[BioC] Nimblegen data: oligo, affy, limma?

Fri Sep 1 14:32:06 CEST 2006

Quoting Sean Davis <sdavis2 at mail.nih.gov>:

> On Friday 01 September 2006 06:47, J.delasHeras at ed.ac.uk wrote:
>> I have been using limma for a little while, for the analysis of
>> 2-colour cDNA arrays.
>>
>> I am going to get some data from Nimblegen pretty soon. This will be on
>> their promoter arrays, hybridised with some ChIP samples. I understand
>> it follows a similar format to Affymetrix, but they use 2-colour hybs.
>>
>> I'm wondering as to the best way to analyse these. I checked the BioC
>> archive for "nimblegen" and I got a couple of things to consider, but I
>> am still not sure.
>>
>> Initially I thought that 'affy' would be the way to go, but since it's
>> a 2-colour hyb, I suppose that 'limma' could handle it. In fact, from
>> limma I could choose to treat the data as single channel arrays if I
>> had to, and I am already familiar with limma. The thing I have to
>> consider is which method to use to normalise the data. I read that Loess might
>> not be such a good idea for this type of data (ChIP on promoter
>> arrays), and perhaps Aquantile would be best. I don't know. I'll have
>> to check further. Any pointers greatly appreciated.
>>
>> Then I found that there's a package somewhere (didn't see it in BioC)
>> called 'oligo' that seems to support Nimblegen data, so I would like to
>> look into that too. Again, any comments as to how useful this is,
>> especially in comparison with limma, would be great.
>
> Nimblegen arrays are actually more similar to two-color arrays, in many
> respects (at least for the chip-chip application).  The manufacturing process
> is similar to Affy (light-directed synthesis), but for the chip-chip
> application, you can think of them as two-color arrays.  The file formats for
> nimblegen are also similar to many two-color platforms (tab-delimited text)
> and so can be easily read using read.table() and the like.  As for
> normalization, that will depend on the analysis method that you are using, to
> some extent.  Do you need a "between-array" normalization or not?  Do you
> require that a "center" for each array be specified for the purposes of
> analysis?  Finally, the actual biologic situation may make a difference.
> Pull-downs of things like histone markers tend to produce very strong signals,
> while pull-downs of early developmental transcription factors in
> differentiated cells produce only a few signals, and those are not strong.
>
> As for the oligo package, it is available in the development archive
> (bioc-1.9).
>
> Sean

Thanks for that Sean.

Yes, the more I think about it, the more sense it makes to treat them as
straight 2-colour arrays (for expression arrays I guess I would take
advantage of the Affy-style design).

I'm not sure if I need between array normalisation. I will have to play 
with the data and explore to see what needs to be done.
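
As a rough sketch of that kind of exploration: the snippet below computes
M (log-ratio) and A (average log-intensity) values per array and then
forces the A-values of all arrays onto a common distribution while leaving
M untouched, which is the idea behind an "Aquantile"-style normalisation.
It is plain Python for illustration only, not limma's code; the function
names and the example intensities are made up, and ties are ranked in an
arbitrary (stable) order rather than averaged.

```python
import math
import statistics


def ma_values(red, green):
    """Return (M, A) lists for one array from raw red/green intensities."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def quantile_normalise(columns):
    """Give every column the same empirical distribution.

    The target distribution is the mean of the sorted columns; each value
    is replaced by the target value of the same rank. All columns must
    have the same length.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    target = [statistics.fmean([col[i] for col in sorted_cols]) for i in range(n)]
    normalised = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        new_col = [0.0] * n
        for rank, idx in enumerate(order):
            new_col[idx] = target[rank]
        normalised.append(new_col)
    return normalised


if __name__ == "__main__":
    # Hypothetical intensities for two arrays (three probes each).
    arrays = [
        ([400.0, 1200.0, 300.0], [100.0, 300.0, 310.0]),
        ([900.0, 2500.0, 650.0], [220.0, 640.0, 700.0]),
    ]
    ms, a_cols = zip(*(ma_values(r, g) for r, g in arrays))
    a_norm = quantile_normalise(list(a_cols))
    for m, a_before, a_after in zip(ms, a_cols, a_norm):
        print("median A before:", round(statistics.median(a_before), 3),
              "after:", round(statistics.median(a_after), 3))
```

If the per-array A distributions already look alike before this step,
that would be one hint that between-array normalisation is not needed.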

The IPs are from proteins that will be nowhere near as abundant as histones,
but not too rare (at least that's what I expect!), but I should have
quite a lot of "near background" signals.

A center does not need to be specified, but it may make things easier. 
I don't really need to compare levels between regions, it's more a 
question of finding regions where there's substantial presence of my 
proteins, compared to the general population. I expect these proteins 
to correlate to some extent with silenced genes. I have expression data 
from other experiments to verify how this correlates (or not). The idea 
is that at least we should not find them at the promoter of active 
genes.
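
To make "substantial presence compared to the general population" a bit
more concrete, here is one possible sketch: score each probe's log-ratio
against the array's own median and MAD, and flag the probes that sit well
above the bulk. This is only an illustration in plain Python, not a method
taken from limma or oligo; the function names, the cut-off of 3 and the
example values are all assumptions.

```python
import statistics


def robust_z(m_values):
    """Robust z-scores: distance from the median in units of scaled MAD."""
    med = statistics.median(m_values)
    mad = statistics.median([abs(m - med) for m in m_values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        raise ValueError("MAD is zero; cannot scale the log-ratios")
    return [(m - med) / scale for m in m_values]


def enriched_probes(m_values, threshold=3.0):
    """Indices of probes whose log-ratio is well above the general population."""
    return [i for i, z in enumerate(robust_z(m_values)) if z > threshold]


if __name__ == "__main__":
    # Hypothetical log2(IP/input) values: mostly near background, one strong probe.
    m = [0.1, -0.2, 0.0, 0.2, -0.1, 3.0, 0.05]
    print(enriched_probes(m))
```

Using the median and MAD rather than the mean and standard deviation keeps
a handful of strongly bound probes from inflating the background estimate,
which matters when most probes are expected to be near background.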

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK