[BioC] Cannot read in CEL files with XPS

cstrato cstrato at aon.at
Sun Dec 7 20:53:29 CET 2008


Dear Chris,

Maybe the following information can help you solve your problems:

This is my setup:
A dual-boot MacBook Pro, 2GB RAM, running Windows XP SP2 where I have 
installed the following binary versions:
- R-2.8.0-win32.exe
- root_v5.18.00.win32.vc80.msi
- xps_1.2.1.zip
Note that root_v5.18.00 is necessary since Bioconductor has compiled xps 
with this version.

You can run xps either from RGui or from Rterm:
When using RGui you should set "verbose=FALSE" in all functions, since 
you will not see any messages anyhow. I would recommend using Rterm with 
"verbose=TRUE", at least initially, to get a feeling for what xps does; 
see the examples below.
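
The choice of "verbose" can also be derived from the front end instead of being set by hand. This is only a sketch: the helper name use_verbose is made up for illustration, and it relies on base R's .Platform$GUI reporting "Rgui" for the Windows GUI.

```r
# Pick the verbose setting from the front end R is running in.
# .Platform$GUI is "Rgui" under the Windows GUI and "RTerm" under Rterm.
# use_verbose() is a hypothetical helper, not part of xps.
use_verbose <- function(gui = .Platform$GUI) {
  !identical(gui, "Rgui")
}

verbose <- use_verbose()
```

The resulting value could then be passed as verbose=verbose to the xps functions used in the sessions that follow.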


1. Import schemes:
Since xps uses the original Affymetrix CDF, PGF and annotation files, 
you have to import these files first. Here is my Rterm session for doing 
this for HG-U133_Plus_2:
 > library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa

 > libdir <- "C:/home/Affy/libraryfiles"
 > anndir <- "C:/home/Affy/Annotation"
 > scmdir <- "C:/home/Rabbitus/CRAN/Workspaces/Schemes"
 > scheme.hgu133p2.na27 <- import.expr.scheme("Scheme_HGU133p2_na27",
 +     filedir=scmdir,
 +     paste(libdir,"HG-U133_Plus_2.cdf",sep="/"),
 +     paste(libdir,"HG-U133-PLUS_probe.tab",sep="/"),
 +     paste(anndir,"Version08Nov/HG-U133_Plus_2.na27.annot.csv",sep="/"))
Creating new file 
<C:/home/Rabbitus/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root>...
Importing <C:/home/Affy/libraryfiles/HG-U133_Plus_2.cdf> as 
<HG-U133_Plus_2.scm>...
   <1354896> records imported...Finished
   PM/MM statistics:
      5 cells with minimum number of PM/MM pairs: 8
      1 cells with maximum number of PM/MM pairs: 69
New dataset <HG-U133_Plus_2> is added to Content...
Importing <C:/home/Affy/libraryfiles/HG-U133-PLUS_probe.tab> as 
<HG-U133_Plus_2.prb>...
Warning: The following header columns are missing:
<Serial Order>
   <604258> records read...Finished
   <1354896> records imported...Finished
   probe info:
      GC content: minimum GC is <3>  maximum GC is <22>
      Melting Tm: minimum Tm is <51>  maximum Tm is <89>
Importing 
<C:/home/Affy/Annotation/Version08Nov/HG-U133_Plus_2.na27.annot.csv> as 
<HG-U133_Plus_2.ann>...
Warning: The following header columns are missing:
<Protein Families>
<Protein Domains>
   Number of annotated transcripts is <54675>.
Warning: Number of transcripts with ambigous annotation is <336>
   <54675> records imported...Finished
 >

I would recommend importing all necessary schemes and saving them in a 
common system directory. You need not save this R session, since you can 
access every scheme in later R sessions with the function root.scheme().
Note that with xps_1.2.1 it is no longer necessary to delete the first 
12 lines from the annotation file. All warnings can be ignored; they are 
caused by changes in the Affymetrix annotation files.
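
Before importing a scheme it may help to confirm that the three Affymetrix input files are really where the paths point. The following is a minimal sketch in base R, using the directories from the session above. The labels "scheme", "probe" and "annot" are only illustrative names for the three files.

```r
# Directories as used in the session above.
libdir <- "C:/home/Affy/libraryfiles"
anndir <- "C:/home/Affy/Annotation"

# The three Affymetrix input files for HG-U133_Plus_2.
# The labels on the left are illustrative, not xps argument names.
inputs <- c(
  scheme = file.path(libdir, "HG-U133_Plus_2.cdf"),
  probe  = file.path(libdir, "HG-U133-PLUS_probe.tab"),
  annot  = file.path(anndir, "Version08Nov", "HG-U133_Plus_2.na27.annot.csv")
)

# Labels of the inputs that are not present on this machine.
missing_inputs <- names(inputs)[!file.exists(inputs)]
if (length(missing_inputs) > 0) {
  message("Missing input files: ", paste(missing_inputs, collapse = ", "))
}
```

If missing_inputs is empty, the paths can be handed to import.expr.scheme() as shown in the session.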


2. Import CEL-files:
To show you that xps can easily handle many CEL-files, I have imported 
all 53 CEL-files from the Affymetrix human tissue/mix dataset.

Here is the output for RGui:
 > library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa
   
 > scmdir <- "E:/CRAN/Workspaces/Schemes"
 > celdir <- "E:/ChipData/Exon/HuMixture"
 > datdir <- "E:/CRAN/Workspaces/ROOTData"
 > scheme.u133p2 <- 
root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
 > Sys.time()
[1] "2008-12-07 14:47:20 CET"
 > data.mix <- import.data(scheme.u133p2, "HuMixAllU133P2", 
filedir=datdir, celdir=celdir, verbose=FALSE)
 > Sys.time()
[1] "2008-12-07 14:53:45 CET"
 >
As you see, importing 53 CEL-files takes about 6.5 min.
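
The elapsed time can be computed from the two Sys.time() values printed above, which works out to 6 min 25 s. This is a small base R sketch; the variable names are illustrative, and the fixed time zone is used only to parse the printed wall-clock times.

```r
# Timestamps printed by Sys.time() before and after import.data().
# Both are wall-clock times in the same zone, so only the difference matters.
start  <- as.POSIXct("2008-12-07 14:47:20", tz = "UTC")
finish <- as.POSIXct("2008-12-07 14:53:45", tz = "UTC")

# Total time in minutes, and average seconds per CEL-file (53 files).
elapsed  <- difftime(finish, start, units = "mins")
per_file <- as.numeric(elapsed, units = "secs") / 53

print(elapsed)
print(round(per_file, 1))
```

In a live session the two values returned by Sys.time() can be subtracted directly, without parsing strings.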

Here is the (partial) output when using Rterm:
 > library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa

 > scmdir <- "E:/CRAN/Workspaces/Schemes"
 > celdir <- "E:/ChipData/Exon/HuMixture"
 > datdir <- "E:/CRAN/Workspaces/ROOTData"
 > scheme.u133p2 <- 
root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
 > data.mix <- import.data(scheme.u133p2, "HuMixAllU133P2", 
filedir=datdir, celdir=celdir, verbose=TRUE)
Opening file <E:/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root> in 
<READ> mode...
Creating new file <E:/CRAN/Workspaces/ROOTData/HuTissuesU133P2_cel.root>...
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_breast_A.CEL> as 
<u1332plus_ivt_breast_A.cel>...
   <1354896> records imported...
   hybridization statistics:
      4 cells with minimal intensity 32
      1 cells with maximal intensity 16261
New dataset <DataSet> is added to Content...
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_breast_B.CEL> as 
<u1332plus_ivt_breast_B.cel>...
   <1354896> records imported...
   hybridization statistics:
      1 cells with minimal intensity 24
      1 cells with maximal intensity 20496
...
...
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_thyroid_B.CEL> as 
<u1332plus_ivt_thyroid_B.cel>...
   <1354896> records imported...
   hybridization statistics:
      1 cells with minimal intensity 29
      1 cells with maximal intensity 47017
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_thyroid_C.CEL> as 
<u1332plus_ivt_thyroid_C.cel>...
   <1354896> records imported...
   hybridization statistics:
      1 cells with minimal intensity 24
      2 cells with maximal intensity 65534
 >

As you can see, Rterm shows the progress status and some 
statistical information. Since CEL-files often have long and strange 
names, I would recommend using the parameter "celnames" in function 
import.data() to assign new names. Once again, you need not save the R 
session, since you can access the data in later R sessions using function 
root.data().
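
Shorter names can be derived from the file names themselves. A minimal sketch follows, using four of the file names from the log above. The prefix being stripped and the variable names are illustrative assumptions. The commented call only indicates where the result would go, and the "celfiles" argument shown there should be checked against ?import.data.

```r
# A few CEL-file names taken from the import log above.
celfiles <- c("u1332plus_ivt_breast_A.CEL",
              "u1332plus_ivt_breast_B.CEL",
              "u1332plus_ivt_thyroid_B.CEL",
              "u1332plus_ivt_thyroid_C.CEL")

# Drop the extension, then the common "u1332plus_ivt_" prefix.
celnames <- sub("^u1332plus_ivt_", "", tools::file_path_sans_ext(celfiles))
print(celnames)

# Sketch of the intended use (not run here; needs xps, the scheme and the data):
# data.mix <- import.data(scheme.u133p2, "HuMixAllU133P2", filedir=datdir,
#                         celdir=celdir, celfiles=celfiles, celnames=celnames)
```

The new names must stay in the same order as the files they replace, so it is safest to build both vectors together as above.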


3. RMA normalization:
RMA normalization of all 53 CEL-files takes about 1 hr.

Here is the RGui session:
 > library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa
   
 > scmdir <- "E:/CRAN/Workspaces/Schemes"
 > scheme.u133p2 <- 
root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
 > datdir <- "E:/CRAN/Workspaces/ROOTData"
 > data.u133p2 <- root.data(scheme.u133p2, 
paste(datdir,"HuMixAllU133P2_cel.root",sep="/"))
 > Sys.time()
[1] "2008-12-07 14:59:12 CET"
 > data.rma <- 
rma(data.u133p2,"MixAllU133P2RMA",tmpdir="",background="pmonly",normalize=TRUE,verbose=FALSE)
 > Sys.time()
[1] "2008-12-07 15:55:25 CET"
 >

In comparison, here is the (partial) Rterm session:
 > library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa

 > scmdir <- "E:/CRAN/Workspaces/Schemes"
 > scheme.u133p2 <- 
root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
 > datdir <- "E:/CRAN/Workspaces/ROOTData"
 > data.u133p2 <- root.data(scheme.u133p2, 
paste(datdir,"HuMixAllU133P2_cel.root",sep="/"))
 > Sys.time()
[1] "2008-12-07 13:32:35 CET"
 > data.rma <- 
rma(data.u133p2,"MixAllU133P2RMA",tmpdir="",background="pmonly",normalize=TRUE,verbose=TRUE)
Creating new file 
<E:/CRAN/Workspaces/Exon/hutissues/u133p2/MixAllU133P2RMA.root>...
Opening file <E:/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root> in 
<READ> mode...
Opening file <E:/CRAN/Workspaces/ROOTData/HuMixAllU133P2_cel.root> in 
<READ> mode...
Preprocessing data using method <preprocess>...
   Background correcting raw data...
      calculating background for <u1332plus_ivt_breast_A.cel>...
      background statistics:
         750638 cells with minimal intensity 0
         1468 cells with maximal intensity 69.3196
      calculating background for <u1332plus_ivt_breast_B.cel>...
      background statistics:
         750638 cells with minimal intensity 0
         1334 cells with maximal intensity 68.3009
...
...
      calculating background for <u1332plus_ivt_thyroid_B.cel>...
      background statistics:
         750638 cells with minimal intensity 0
         295 cells with maximal intensity 65.6557
      calculating background for <u1332plus_ivt_thyroid_C.cel>...
      background statistics:
         750638 cells with minimal intensity 0
         1 cells with maximal intensity 74.3142
   Normalizing raw data...
      normalizing data using method <quantile>...
         finished filling <53> arrays.
         finished filling <53> trees.
   Converting raw data to expression levels...
      summarizing with <medianpolish>...
      calculating expression for <54675> of <54684> units...Finished.
      expression statistics:
         minimal expression level is <2.65147>
         maximal expression level is <15470.9>
   preprocessing finished.
Opening file <E:/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root> in 
<READ> mode...
Opening file 
<E:/CRAN/Workspaces/Exon/hutissues/u133p2/MixAllU133P2RMA.root> in 
<READ> mode...
Exporting data from tree <*> to file 
<E:/CRAN/Workspaces/Exon/hutissues/u133p2/MixAllU133P2RMA.txt>...
Reading entries from <HG-U133_Plus_2.ann> ...Finished
<54675> of <54675> records exported.
 > Sys.time()
[1] "2008-12-07 14:35:09 CET"
 >

Once again, in Rterm you see the progress status and get some 
statistical information. I consider it helpful to see the progress 
information, especially when computation takes a long time.
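
The later-session steps shown above (root.scheme(), root.data(), rma()) can be collected into one script that is sourced from Rterm. This is only a sketch under the assumption that the scheme and data files from steps 1 and 2 exist at the paths used in this mail. The guard variable can_run is an illustrative addition, so that the script does nothing harmful when xps or the files are missing.

```r
# Paths as used in the sessions above.
scmdir <- "E:/CRAN/Workspaces/Schemes"
datdir <- "E:/CRAN/Workspaces/ROOTData"
scheme_file <- file.path(scmdir, "Scheme_HGU133p2_na27.root")
data_file   <- file.path(datdir, "HuMixAllU133P2_cel.root")

# Only run when xps is installed and both ROOT files exist.
can_run <- requireNamespace("xps", quietly = TRUE) &&
  all(file.exists(c(scheme_file, data_file)))

if (can_run) {
  library(xps)
  # Re-open the scheme and the imported CEL data, then run RMA
  # with the same arguments as in the Rterm session above.
  scheme.u133p2 <- root.scheme(scheme_file)
  data.u133p2   <- root.data(scheme.u133p2, data_file)
  data.rma <- rma(data.u133p2, "MixAllU133P2RMA", tmpdir = "",
                  background = "pmonly", normalize = TRUE, verbose = TRUE)
} else {
  message("xps or the ROOT files are not available; skipping RMA.")
}
```

With verbose=TRUE the progress messages appear in the console while the script runs.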

I hope this demonstration shows you how to use xps successfully and 
helps you solve your problems.

Best regards
Christian



cstrato wrote:
> Dear Chris
>
> This is strange, could you please give your sessionInfo(), which 
> version of xps,  which version of ROOT, which version of R, WinXP or 
> Vista?
>
> Could you please give the complete code for creating the scheme.
> I am not sure if it is a good idea to save the "hgu133plu2.root" file 
> in the package directory, I would propose to create a directory 
> "schemes" somewhere else, e.g. "McMasters/schemes".
>
> Furthermore, could you please set "verbose=TRUE" in the methods and 
> start R from the Command Console. Then you will see the progress 
> messages. Could you please send me this output, so that I can check 
> the result?
>
> Handling 40 CEL-files should not be a problem, one user of xps 
> reported that he could successfully handle 500 CEL-files on his 
> Windows machine.
>
> Best regards
> Christian
> _._._._._._._._._._._._._._._._._._
> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
> V.i.e.n.n.a           A.u.s.t.r.i.a
> e.m.a.i.l:        cstrato at aon.at
> _._._._._._._._._._._._._._._._._._
>
>
>
> Christopher N Barnes wrote:
>> All,
>>  
>> I am new to xps and am having trouble reading in the cel files.
>>  
>> I got the 3 correct files from affymetrix and created a scheme 
>> removing the first 12 lines from the annotation  file (fix 1)
>>  
>>  
>> I then read in my scheme:
>> hgu133plus2<-root.scheme(paste(.path.package("xps"),"schemes/hgu133plus2.root",
>>    sep="/"))
>>  
>> and then try to read in the CEL files.
>> celdir2<-"C:/McMasters/test"
>> data.test3<-import.data(hgu133plus2,"tmp2",celdir=celdir2, 
>> verbose=FALSE)
>> It worked once and now causes R to crash.  I am trying to read in 
>> 40 CEL files with 50,000+ genes on a 4G machine.
>>  
>> Does anyone have any suggestions for another method to read a large 
>> number of CEL files?  If I try using ReadAffy() to read them in, I 
>> don't have the space to allocate.
>> Thanks for the  Help,
>>  
>> Chris Barnes
>> PhD student University of Louisville
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>


