[BioC] Subsetting in xps

Michael Walter michael.walter at med.uni-tuebingen.de
Mon Nov 9 13:28:36 CET 2009


Dear Christian,

That was a great deal of help. I have not tried all the tips yet, but I surely will. Thank you very much,

Michael

> -----Original Message-----
> From: "cstrato" <cstrato at aon.at>
> Sent: 06.11.09 19:42:14
> To: Michael Walter <michael.walter at med.uni-tuebingen.de>
> CC: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Subsetting in xps


> Dear Michael,
> 
> In the following I will try to describe some options which may be 
> helpful in cases where you cannot load all feature data due to memory 
> limitations:
> 
> 1. The option to load only a subset of feature data using:
>  > data <- attachMask(data.xps)
>  > data <- attachInten(data.xps, treenames=unlist(fname.tree[1:3]))
> allows you to do some QC with, for example, boxplot(), hist() and pmplot().
> However, running RMA will still use all data, since rma() is 
> independent of the imported data.
> 
> 2. Once you have imported all CEL files into a ROOT data file, you can 
> create a subset as follows (see ?root.data):
> # load only a subset from a ROOT data file
>  > rootfile <- paste(getwd(), paste(project, "_cel.root", sep=""), sep="/")
>  > subdata.xps <- root.data(scheme.HuGene10, rootfile=rootfile, 
> celnames=c("Name2.cel","Name7.cel","Name9.cel"))
> Now you can run RMA using this subset only:
>  > data.rma <- rma(subdata.xps,...)
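The subsetting step in option 2 can be sketched as follows. The helper `subset_celnames()` is hypothetical and not part of xps; it only checks that the requested names exist before they are handed to `root.data()`. The names used here are illustrative, and the assumption that the tree names returned by `treeNames()` can be passed unchanged as `celnames` should be checked against ?root.data. The xps calls are left as comments because they need an existing ROOT data file.

```r
# Hypothetical helper (not part of xps): select a subset of tree names
# and fail early if a requested name is missing.
subset_celnames <- function(all_names, wanted) {
  all_names <- unlist(all_names)
  missing <- setdiff(wanted, all_names)
  if (length(missing) > 0) {
    stop("not in ROOT file: ", paste(missing, collapse = ", "))
  }
  all_names[all_names %in% wanted]   # keep the order of all_names
}

# Illustrative names; with xps loaded they would come from treeNames(data.xps)
all_names <- list("Name1.cel", "Name2.cel", "Name7.cel", "Name9.cel")
sel <- subset_celnames(all_names, c("Name9.cel", "Name2.cel"))
print(sel)

# With xps and an existing ROOT data file (calls taken from the message above):
# subdata.xps <- root.data(scheme.HuGene10, rootfile=rootfile, celnames=sel)
# data.rma    <- rma(subdata.xps, ...)
```

Checking the names first is a design choice of this sketch: a misspelled name then fails in R with a readable message instead of surfacing later during the ROOT call.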
> 
> 3. You can also do some QC without importing the feature data:
> # density plot:
>  > root.density(data.xps)
> 
> # you can also save the density plot for each chip using
>  > for (tree in treeNames(data.xps)) {
>  >    root.density(data.xps, treename=tree, canvasname=tree, save.as="png")
>  > }
> 
> # image plot:
>  > root.image(data.xps, treename="MyName.cel")
> # save image automatically
>  > root.image(data.xps, treename="MyName.cel", logbase="log2", 
> canvasname="Image_MyName_log2", save.as="png")
> 
> # profile plot (similar to boxplot)
>  > root.profile(data.xps)
> However, the profile plots may also have some memory limitations. In 
> this case you can create profile plots from subsets only by using 
> parameter "treename".
> 
> Maybe one more note: I am not sure if you really want to use long 
> filenames such as "M9R_001c01_1_(HuGene-1_0-st-v1).CEL". The function 
> "import.data" has a parameter "celnames" which you could use to assign 
> alternative names, e.g. celnames=c("M9R_001c01_1",...). You 
> can still access the original filenames using:
>  > filenames <- rawCELName(data.xps)
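A minimal sketch of how shorter names could be derived from the long CEL filenames, using only base R. The regular expression assumes that every filename ends in a parenthesised array type followed by `.CEL`, as in the example above; the variable names are illustrative.

```r
# Long filenames as in the original post
filenames <- c("M9R_001c01_1_(HuGene-1_0-st-v1).CEL",
               "M9R_001c02_1_(HuGene-1_0-st-v1).CEL")

# Strip the trailing "_(<array type>).CEL" part
celnames <- sub("_\\(.*\\)\\.CEL$", "", filenames)
print(celnames)

# With xps, the short names would be passed via the "celnames" parameter of
# import.data(); its remaining arguments are not shown here.
```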
> 
> You can also find many code examples for whole genome and exon arrays in 
> the files "script4xps.R" and "script4exon.R" located in the package 
> directory "xps/examples".
> 
> Please let me know if this information could answer your questions.
> 
> Best regards
> Christian
> _._._._._._._._._._._._._._._._._._
> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
> V.i.e.n.n.a           A.u.s.t.r.i.a
> e.m.a.i.l:        cstrato at aon.at
> _._._._._._._._._._._._._._._._._._
> 
> 
> Michael Walter wrote:
> > Dear all,
> >
> > I'm using xps to read Affy gene array CEL files. If the number of arrays exceeds 15, I can no longer look at the feature data due to memory limitations. So I'd like to generate a ROOT tree for the entire study and then look at the values of a subset for QC.
> >
> > I can generate a data tree set with the intensities of only a fraction of the arrays by specifying the sample names (please see code and sessionInfo below). However, when I subsequently run RMA on the newly generated tree, the resulting data frame contains all samples from the initial ROOT tree. The code here is a small example with 4 arrays for demo purposes. If anyone has a suggestion on how to obtain the normalized signals for my subset, I would be very happy.
> >
> > Kind regards,
> >
> > Michael
> >
> >
> >
> > Welcome to Bioconductor
> >
> >   Vignettes contain introductory material. To view, type
> >   'openVignette()'. To cite Bioconductor, see
> >   'citation("Biobase")' and for packages 'citation(pkgname)'.
> >
> >   
> >> library(RColorBrewer)
> >> library(xps)
> >
> > Welcome to xps version 1.4.6 
> >     an R wrapper for XPS - eXpression Profiling System
> >     (c) Copyright 2001-2009 by Christian Stratowa
> >
> > Attaching package: 'xps'
> >
> >         The following object(s) are masked from package:Biobase :
> >
> >          exprs,
> >          exprs<-,
> >          se.exprs 
> >
> >> project = "M9R_001"
> >> celfile = getwd()
> >> filenames = list.files(path=celfile)
> >> filenames = filenames[grep(".CEL", filenames)]
> >> filenames
> > [1] "M9R_001c01_1_(HuGene-1_0-st-v1).CEL" "M9R_001c02_1_(HuGene-1_0-st-v1).CEL"
> > [3] "M9R_001c03_1_(HuGene-1_0-st-v1).CEL" "M9R_001c04_1_(HuGene-1_0-st-v1).CEL"
> >
> >> scheme.HuGene10 <- root.scheme(paste("X:/affy/QC_Scripts/xps/schemes","Scheme_HuGene10stv1r4_na27_2.root",sep="/"))
> >> data.xps <- root.data(scheme.HuGene10, 
> > + paste(getwd(), paste(project, "_cel.root", sep=""), sep="/"))
> >> fname.tree = data.xps@treenames
> >> data <- attachMask(data.xps)
> >> data <- attachInten(data.xps, treenames=unlist(fname.tree[1:3]))
> >> head(data@data)
> >   X Y M9R_001c01_1_(HuGene-1_0-st-v1).cel_MEAN
> > 1 0 0                                     6745
> > 2 1 0                                      124
> > 3 2 0                                     6719
> > 4 3 0                                       90
> > 5 4 0                                       61
> > 6 5 0                                       89
> >   M9R_001c02_1_(HuGene-1_0-st-v1).cel_MEAN
> > 1                                     8246
> > 2                                      127
> > 3                                     8231
> > 4                                      122
> > 5                                       61
> > 6                                      127
> >   M9R_001c03_1_(HuGene-1_0-st-v1).cel_MEAN
> > 1                                     6190
> > 2                                      112
> > 3                                     5958
> > 4                                       72
> > 5                                       68
> > 6                                      116
> >
> >> dim(data@data)
> > [1] 1102500       5
> >
> >> data.rma <- rma(data, "tmpdt_dataRMA", background="antigenomic", normalize=T, 
> > + exonlevel=c(402492,402492,402492), verbose = FALSE)
> >> expr.rma <- validData(data.rma)
> >> dim(expr.rma)
> > [1] 33025     4
> >
> >> sessionInfo()
> >
> > R version 2.9.0 (2009-04-17) 
> > i386-pc-mingw32 
> >
> > locale:
> > LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base     
> >
> > other attached packages:
> > [1] xps_1.4.6          RColorBrewer_1.0-2 Biobase_2.4.1      ROC_1.18.0        
> >   
> >
> >
> >
> 

-- 
MFT Services
University of Tuebingen
Calwerstr. 7
72076  Tübingen/GERMANY

Tel.: +49 (0) 7071 29 83210
Fax: +49 (0) 7071 29 5228
