[BioC] General question about library files

James W. MacDonald jmacdon at med.umich.edu
Mon Aug 18 15:10:07 CEST 2008


Hi John,

John O. Woods wrote:
> Hi everyone,
> 
> This is more of a general question. I'm fairly new to array analysis
> (jumping right into the deep end here, looking at whole-genome tiling
> arrays), and I'm having trouble sorting out in my head exactly what
> data is stored in each Affy filetype.
> 
> It seems obvious that the CEL files contain the raw intensities from
> the arrays themselves. Still, I'm not sure how these CELs are
> organized--is it one CEL per chip? How do I know which metadata files
> match with a specific CEL?

Yes, one CEL file contains the data from one chip. You can get the
header information from a CEL file using readCelHeader() in the
affxparser package:

library(affxparser)
headerinfo <- readCelHeader(celfilename)
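
As a rough sketch of how the header can tell you which library files go
with a CEL file: the list returned by readCelHeader() should, as far as I
know, include a chiptype element naming the array design, plus rows and
cols for the grid size. The helper chip_type_of() and the path
"example.CEL" are placeholders made up for illustration, not part of
affxparser.

```r
# Hypothetical helper (not part of affxparser): pull the array design
# name out of a CEL header list.
chip_type_of <- function(header) {
  header$chiptype
}

celfilename <- "example.CEL"  # placeholder path; substitute your own file

# Only touch the file if affxparser is installed and the file exists.
if (requireNamespace("affxparser", quietly = TRUE) && file.exists(celfilename)) {
  headerinfo <- affxparser::readCelHeader(celfilename)
  cat("Chip type:", chip_type_of(headerinfo), "\n")
  cat("Grid size:", headerinfo$rows, "x", headerinfo$cols, "\n")
}
```

The chip type string is what you would then match against the names of
the design files you have on hand.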


> 
> I also see that the BPMAP files contain design information for the
> arrays. What I'm less clear on is why these have genome builds in the
> names. For example, I got NCBIv36 bpmaps from Harvard, but Affy makes
> an earlier build available (v34, I think). The probes are, of course,
> the same (right?). Thus, does it matter to Bioconductor which build
> I'm using?

Well, the probes themselves are the same, but they are mapped to the
genome based on whatever build you are using. Since the genome assembly
is still being revised, the mapping from probe to genome location may
change from build to build.
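
To make that concrete, here is a small base-R sketch that compares probe
positions under two builds and lists the probes whose location differs.
All probe names and coordinates are invented for illustration; real
positions would come from the respective bpmap files.

```r
# Toy probe positions under two genome builds (all values invented).
build_a <- data.frame(probe = c("p1", "p2", "p3"),
                      chr   = c("chr1", "chr1", "chr2"),
                      pos   = c(1000L, 5000L, 7500L),
                      stringsAsFactors = FALSE)
build_b <- data.frame(probe = c("p1", "p2", "p3"),
                      chr   = c("chr1", "chr1", "chr2"),
                      pos   = c(1000L, 5120L, 7500L),
                      stringsAsFactors = FALSE)

# Join on probe ID, then keep probes whose chromosome or position changed.
both  <- merge(build_a, build_b, by = "probe", suffixes = c(".a", ".b"))
moved <- both[both$chr.a != both$chr.b | both$pos.a != both$pos.b, ]
print(moved$probe)
```

The same probe sequence sits at the same spot on the chip either way;
only the genomic coordinate attached to it differs between the tables.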

> 
> I'm much less clear on CIFs and CDFs. How do these differ, and what
> information do they contain? Affymetrix provides only very vague
> descriptions on its website: "The CDF file describes the layout for an
> Affymetrix GeneChip array." Gee, thanks. How does that differ from a
> BPMAP? Why do the makePdInfoBuilder code samples use CIFs instead of
> CDFs?

The CDF is an older file format that Affy appears to be migrating away
from. It really only gives mappings from (x, y) coordinates to probeset
ID, whereas the BPMAP and CLF files contain more information. Since Affy
doesn't support the CDF file format for a lot of the new chips,
makePdInfoBuilder uses the supported format.
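
A toy illustration of the difference, in base R: the first table carries
only what a CDF-style layout gives you, (x, y) to probeset ID, while the
second adds the kind of genomic annotation a BPMAP is meant to carry. The
tables, column names, and the helper probeset_at() are all made up for
this sketch and do not reflect the actual file layouts.

```r
# CDF-style layout: each (x, y) cell on the chip belongs to a probeset.
cdf_like <- data.frame(x = c(0L, 1L, 0L, 1L),
                       y = c(0L, 0L, 1L, 1L),
                       probeset = c("ps_A", "ps_A", "ps_B", "ps_B"),
                       stringsAsFactors = FALSE)

# BPMAP-style layout: the same cells, plus a genomic location per probe
# (coordinates invented for illustration).
bpmap_like <- data.frame(x = c(0L, 1L, 0L, 1L),
                         y = c(0L, 0L, 1L, 1L),
                         chr = c("chr1", "chr1", "chr1", "chr2"),
                         pos = c(1000L, 1035L, 1070L, 500L),
                         stringsAsFactors = FALSE)

# Hypothetical helper: which probeset owns the cell at (x, y)?
probeset_at <- function(layout, x, y) {
  layout$probeset[layout$x == x & layout$y == y]
}

print(probeset_at(cdf_like, 1L, 1L))
# Columns present in the BPMAP-style table but not the CDF-style one.
extra_cols <- setdiff(names(bpmap_like), names(cdf_like))
print(extra_cols)
```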

Best,

Jim



> 
> I've been looking for a good resource to help me get a handle on this
> stuff. I see lots of tutorials and stuff for analyzing microarrays,
> but little for tiling arrays (yay cutting edge). Anyone have any
> pointers?
> 
> Thanks so much for the help.
> 
> Cheers,
> John Woods

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662


