[BioC] Using ReadAffy with custom CDFs on tiling array data

Naira Naouar nanao at psb.ugent.be
Mon Jul 28 15:57:53 CEST 2008


At the moment, no CDFs are available for tiling arrays except the ones 
provided by Manhong Dai.
You will see that one CDF is available per tiling array and per 
database used as the reference for genome annotation. Depending on 
which database you trust more for gene annotation, you choose 
the single CDF that you need for your analysis (with those CDFs you 
will be able to perform RMA, ...).
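
As a minimal sketch of how such a CDF could be used once its package is 
installed: ReadAffy() in the affy package accepts a cdfname argument that 
overrides the CDF name stored in the CEL files. The helper name 
load_with_custom_cdf and the CDF name "mycustomcdf" are placeholders of 
mine, not real package names.

```r
# Sketch only: assumes the affy package and the custom CDF package are installed.
load_with_custom_cdf <- function(cel_files, cdf_name) {
  # cdfname replaces the CDF name recorded in the CEL headers
  raw <- affy::ReadAffy(filenames = cel_files, cdfname = cdf_name)
  # RMA: background-correct, normalize and summarize per probe set
  affy::rma(raw)
}

# Hypothetical usage ("mycustomcdf" is a placeholder CDF name):
# eset <- load_with_custom_cdf(affy::list.celfiles(full.names = TRUE), "mycustomcdf")
```

Since the same cdfname is applied to every file in the call, all CEL files 
passed together should come from the same array type.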

Personally, I have been working on the Arabidopsis thaliana 1.0R tiling 
array and I have produced my own CDF for this array. The way I did it is 
explained here: 

Basically, I started from all the probes, which I aligned to the genome, 
and I eliminated the probes that were not of interest to me (keeping only 
the unique exonic probes for each annotated gene).
My CDF contains more genes than the one proposed by Manhong Dai (I am 
not 100% sure about the way he selected the correct probes for each 
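
The filtering idea described above (keep only probes that align once and 
fall inside an exon of an annotated gene) can be illustrated in base R. 
This is a toy sketch, not my actual pipeline: the table "hits" and its 
columns are invented for the example.

```r
# Hypothetical alignment table: one row per (probe, genomic hit).
hits <- data.frame(
  probe   = c("p1", "p2", "p2", "p3", "p4", "p5"),
  gene    = c("geneA", "geneA", "geneB", "geneB", NA, "geneB"),
  in_exon = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Count genomic hits per probe; a "unique" probe aligns exactly once.
n_hits <- table(hits$probe)
unique_probes <- names(n_hits)[n_hits == 1]

# Keep unique probes that lie in an exon of an annotated gene.
keep <- hits[hits$probe %in% unique_probes & hits$in_exon & !is.na(hits$gene), ]

# Group the surviving probes into probe sets, one per gene.
probe_sets <- split(keep$probe, keep$gene)
print(probe_sets)
```

Here p2 is dropped because it aligns twice, and p4 and p5 are dropped 
because they are not exonic, leaving one probe each for geneA and geneB.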

My last comment for the moment is that it will be very difficult to 
analyse all your arrays together.
You will find that storing them takes a lot of memory.
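
One possible workaround, given only as a sketch, is to process the CEL 
files in smaller batches. The helper split_into_batches, the batch size 
and the file names below are assumptions of mine. Note that arrays 
normalized in separate batches are not directly comparable across batches.

```r
# Split a vector of file names into consecutive batches of at most batch_size.
split_into_batches <- function(files, batch_size) {
  batch_id <- ceiling(seq_along(files) / batch_size)
  unname(split(files, batch_id))
}

# Placeholder file names standing in for the real CEL files.
cel_files <- sprintf("array_%04d.CEL", 1:3452)
batches <- split_into_batches(cel_files, 500)

# Each batch could then be handed to affy, for example (not run here;
# "mycustomcdf" is a placeholder CDF name):
# eset <- affy::justRMA(filenames = batches[[1]], cdfname = "mycustomcdf")
```

justRMA() goes from CEL files straight to expression values, which may 
need less memory than ReadAffy() followed by rma().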

If I can be of any other help, please let me know,

Arkady wrote:
> A couple of questions herein.
> Background: I'm trying to load the CEL files for the Affy whole-genome
> tiling arrays. I have lots and lots of bzipped2 CEL files (3452 of
> them). They seem to ask for Wgc_Universal_fe1 as the cdf, and this
> package does not appear to be available through Bioconductor,
> according to getCDF(cleancdfname("Wgc_Universal_fe1")).
> According to some papers I've found, newer custom CDFs are better. So
> I tried using some from UMich, but again, they don't appear to be
> available in the repository (at least for human tiling 1.0R and 2.0R).
> Finally, I downloaded all of the probe and CDF data from UMich and
> installed it manually, both the probe and cdf packages. That appeared
> to work, and I can load a single CEL file.
> Unfortunately, this has left me with several questions.
> 1. The CEL files contain the names of the original CDFs. How do I
> translate those to the names of the custom CDFs? Is there some way to
> establish a mapping?
> 2. How do I deal with multiple CDFs for a single experiment? Do I load
> each of my 3452 files separately, specifying the CDF each time?
> 3. What about the probe packages? Is there a unified package that
> contains both pieces (CDF and probes) of information?
> 4. Why aren't the CDFs for the human tiling arrays made available
> through Bioconductor?
> Thanks again.
> Cheers,
> John Woods

Naira Naouar 

Tel:+32 (0)9 331 38 63
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
nanao at psb.ugent.be                         http://www.psb.ugent.be
