[BioC] makecdfenv package bug

James W. MacDonald jmacdon at uw.edu
Tue Jun 3 18:56:44 CEST 2014


Hi Danny,

Depends on what you think is 'easy'. ;-D

Note that the celfiles are read in row by row. This includes all the 
poly-A 'landing lights' that the scanner uses to figure out how to align 
the camera. Also note that there is a function in affy called 
read.probematrix(), which just reads the data from multiple celfiles 
into a matrix, where the row of the matrix corresponds to the 'index' 
location of the probe on the array.

The mapping file you reference has this sort of data:

AT1G01010_at    1       +       3783    1193    1062    NA
AT1G01010_at    1       +       3888    980     824     NA
AT1G01010_at    1       +       4015    927     176     NA
AT1G01010_at    1       +       4195    1542    695     NA
AT1G01010_at    1       +       4525    1527    762     NA
AT1G01010_at    1       +       4712    760     830     NA
AT1G01010_at    1       +       4789    1239    6       NA
AT1G01010_at    1       +       4860    626     38      NA
AT1G01010_at    1       +       5009    1021    7       NA

Where the columns are (in order):

Probe ID
Chr
Strand
Start
X
Y
Probeset name (which is NA, as there are no probesets)
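
For what it's worth, here is a minimal sketch of reading a table with that layout in base R. The column names are my own labels (the file itself may not have a header line, which is an assumption here), and the three rows are copied from the excerpt above rather than read from the real mapping.txt.gz:

```r
# Three rows copied from the excerpt above; for the real file you would pass
# the path to mapping.txt.gz as the first argument instead of `text =`.
rows <- "AT1G01010_at 1 + 3783 1193 1062 NA
AT1G01010_at 1 + 3888 980 824 NA
AT1G01010_at 1 + 4015 927 176 NA"

# Column names are hypothetical labels, chosen to match the list above.
mapping <- read.table(
  text = rows,
  header = FALSE,
  col.names = c("probe_id", "chr", "strand", "start", "x", "y", "probeset"),
  colClasses = c("character", "character", "character",
                 "integer", "integer", "integer", "character"),
  na.strings = "NA"
)
str(mapping)
```

Keeping chr as character rather than integer is a deliberate choice, since chromosome labels such as "C" or "M" would otherwise fail to parse.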

So you have the (x,y) coordinates of the probes on the chip, and where 
the probes are in the genome, but when you read the data in you just 
have the index position. So you need to convert the (x,y) coordinates to 
the index positions.

There is a function in affy called xy2indices() that you can use to do 
the conversion. All you need to know is the number of columns in the 
array, which you can get from read.celfile.header() in affyio.
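
To make the arithmetic concrete, here is a rough base-R sketch of the conversion. It assumes the (x,y) coordinates are 0-based and that cells are stored row by row, so treat it as an illustration and check it against xy2indices() on your own data. The helper name xy_to_index and the column count of 1612 are made up for the example:

```r
# Hypothetical helper: turn 0-based (x, y) chip coordinates into the 1-based
# row index of intensities stored row by row. `ncol_chip` is the number of
# columns on the array, as reported in the CEL header.
xy_to_index <- function(x, y, ncol_chip) {
  stopifnot(all(x >= 0), all(x < ncol_chip), all(y >= 0))
  y * ncol_chip + x + 1
}

ncol_chip <- 1612  # made-up value, for illustration only
xy_to_index(x = c(0, 1, 0), y = c(0, 0, 1), ncol_chip = ncol_chip)
# returns 1 2 1613
```

The first cell of the first row gets index 1, its right-hand neighbour gets 2, and the first cell of the second row gets ncol_chip + 1.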

So you could hypothetically read in the data using read.probematrix(), 
normalize using (probably easiest) normalizeBetweenArrays() from limma, 
convert the (x,y) probe locations to indices, merge the intensities with 
the mapping data on those indices, and then if you want to be really 
cool, put all that into a GRanges object so you can use things like Gviz 
to make sweet plots.
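
The merge step might look something like the sketch below. Everything in it is a toy: the 4 x 3 chip, the probe names, and the intensities are invented, and the reading and normalization steps are left out entirely. It only shows how an index computed from (x,y) can be used to pull the matching intensity rows and line them up along the genome:

```r
# Toy inputs: both objects are invented to show the join, not real data.
ncol_chip <- 4
mapping <- data.frame(
  probe_id = c("p1", "p2", "p3"),
  chr      = c("1", "1", "1"),
  start    = c(300L, 100L, 200L),
  x        = c(1L, 3L, 0L),
  y        = c(0L, 1L, 2L)
)
# One row per chip position (4 x 3 chip = 12 cells), one column per array.
intensities <- matrix(seq_len(24) * 10, nrow = 12,
                      dimnames = list(NULL, c("array1", "array2")))

# Row-by-row index convention (0-based x and y are an assumption).
mapping$index <- mapping$y * ncol_chip + mapping$x + 1

# Pull out the intensity rows that belong to mapped probes, bind them to
# the genomic annotation, then order along the genome.
probes <- cbind(mapping, intensities[mapping$index, , drop = FALSE])
probes <- probes[order(probes$chr, probes$start), ]
probes
```

A data frame shaped like this (chromosome, start, one column per array) is the sort of thing you could then turn into a GRanges object.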

Best,

Jim



On 6/3/2014 11:07 AM, Danny Arends wrote:
> Hey James,
>
> Thanks for your answer, I'll look into your suggestions...
>
> However just to be sure, is there an 'easy' hack to get the probes out
> of the CDF file and match them to the CEL file information?
>
> I have found the following files available that describe the array:
> cdf.gz
> desc.txt.gz
> mapping.txt.gz
> probe_tab.txt.gz
>
> Just getting the probe locations or sequences is enough, then I could
> start the analysis myself
> (either by mapping probes to the reference using blast, or using the
> supplied locations),
>
> I was hoping that I could use the Affy package for normalization of
> probe intensities, etc
>
> Gr,
> Danny
>
>
>
> 2014-06-02 21:49 GMT+02:00 James W. MacDonald <jmacdon at uw.edu>:
>
>     Hi Danny,
>
>
>     In general, you don't use the makecdfenv/affy pipeline for tiling
>     arrays, as there aren't (to my knowledge) any probesets. Instead,
>     there are just probes, tiled along the genome.
>
>     The affy package is predicated upon the idea that a set of probes
>     are all grouped into a probeset, which is intended to measure the
>     expression of a transcript. Since the tiling arrays are completely
>     different, the two don't really mix.
>
>     Normally I would point you to the oligo package, but you need to
>     build a pdInfoPackage, which expects a bpmap file, not a cdf. In
>     addition, I tried to read in the cdf that you can get from GEO using
>     readCdfUnits() from affxparser, and it consistently segfaulted, so
>     there might be a problem with the cdf itself.
>
>     Looking around, it appears you might be better served by aroma
>     (http://www.aroma-project.org/), which is supposed to handle tiling
>     arrays (but since aroma uses affxparser, it may hit the same
>     problem).
>
>     Or you could try Affy's software:
>
>     http://www.affymetrix.com/estore/partners_programs/programs/developer/TilingArrayTools/index.affx
>
>     Best,
>
>     Jim
>
>
>
>
>     On 6/2/2014 2:55 PM, Danny Arends wrote:
>
>         Hey,
>
>         I hit a bug trying to create a custom cdf environment, which I
>         need to analyse some affy arrays:
>
>         Both functions give the same error:
>
>           > make.cdf.package("GPL16303_TilingatSNPtilx520433_At_TAIRG.cdf",
>                              species="Arabidopsis_Thaliana")
>         Reading CDF file.
>         Creating CDF environment
>         Wait for about 0 dots
>         Error in assign(x[i], value[[i2]], envir = envir, inherits = inherits) :
>             invalid first argument
>
>           > env <- make.cdf.env("GPL16303_TilingatSNPtilx520433_At_TAIRG.cdf")
>         Reading CDF file.
>         Creating CDF environment
>         Wait for about 0 dots
>         Error in assign(x[i], value[[i2]], envir = envir, inherits = inherits) :
>             invalid first argument
>
>         My R version:
>           > version
>         platform       x86_64-pc-linux-gnu
>         arch           x86_64
>         os             linux-gnu
>         system         x86_64, linux-gnu
>         status
>         major          3
>         minor          1.0
>         year           2014
>         month          04
>         day            10
>         svn rev        65387
>         language       R
>         version.string R version 3.1.0 (2014-04-10)
>         nickname       Spring Dance
>
>         Is there any fix for this? I really want to look into my array
>         data...
>
>         Gr,
>         Danny Arends
>
>
>     --
>     James W. MacDonald, M.S.
>     Biostatistician
>     University of Washington
>     Environmental and Occupational Health Sciences
>     4225 Roosevelt Way NE, # 100
>     Seattle WA 98105-6099
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


