[BioC] Subsetting in xps
Michael Walter
michael.walter at med.uni-tuebingen.de
Mon Nov 9 13:28:36 CET 2009
Dear Christian,
That was a great deal of help. I have not tried all the tips yet, but I surely will. Thank you very much,
Michael
> -----Original Message-----
> From: "cstrato" <cstrato at aon.at>
> Sent: 06.11.09 19:42:14
> To: Michael Walter <michael.walter at med.uni-tuebingen.de>
> CC: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Subsetting in xps
> Dear Michael,
>
> Below I describe some options that may help in cases where you cannot
> load all feature data due to memory limitations:
>
> 1. You can load only a subset of the feature data using:
> > data <- attachMask(data.xps)
> > data <- attachInten(data.xps, treenames=unlist(fname.tree[1:3]))
> This allows you to do some QC with e.g. boxplot(), hist() and pmplot().
> However, RMA will still use all data, since rma() works independently
> of which data have been attached in R.
>
> 2. Once you have imported all CEL files into a ROOT data file, you can
> create a subset as follows (see ?root.data):
> # load only a subset from a ROOT data file
> > rootfile <- paste(getwd(), paste(project, "_cel.root", sep=""), sep="/")
> > subdata.xps <- root.data(scheme.HuGene10, rootfile=rootfile,
> celnames=c("Name2.cel","Name7.cel","Name9.cel"))
> Now you can run RMA using this subset only:
> > data.rma <- rma(subdata.xps,...)
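The subset selection in option 2 can be prepared in plain R before calling xps. The following is only a sketch: the tree names are an assumption (CEL name plus ".cel", as the column names in the head() output of the original post suggest), and "subset.cels" is a name made up for illustration. The xps calls are left commented out because they need the scheme object and the ROOT data file from the session.

```r
# Project name taken from the session in the original post; adjust as needed.
project  <- "M9R_001"
rootfile <- file.path(getwd(), paste0(project, "_cel.root"))

# Assumed tree names stored in the ROOT data file (hypothetical values
# modelled on the column names shown by head() in the original post).
treenames <- paste0("M9R_001c0", 1:4, "_1_(HuGene-1_0-st-v1).cel")

# Keep only the arrays that should enter this RMA run.
subset.cels <- treenames[1:3]

# With xps loaded and scheme.HuGene10 available, the subset would be used as:
# subdata.xps <- root.data(scheme.HuGene10, rootfile=rootfile,
#                          celnames=subset.cels)
# data.rma <- rma(subdata.xps,...)
```

Selecting by index from a vector of tree names avoids retyping long names by hand, which matters when the names contain parentheses and version strings.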
>
> 3. You can also do some QC without importing the feature data:
> # density plot:
> > root.density(data.xps)
>
> # you can also save the density plot for each chip using
> > for (tree in treeNames(data.xps)) {
> > root.density(data.xps, treename=tree, canvasname=tree, save.as="png")
> > }
>
> # image plot:
> > root.image(data.xps, treename="MyName.cel")
> # save image automatically
> > root.image(data.xps, treename="MyName.cel", logbase="log2",
> canvasname="Image_MyName_log2", save.as="png")
>
> # profile plot (similar to boxplot)
> > root.profile(data.xps)
> However, the profile plots may also run into memory limitations. In
> that case you can create profile plots for subsets only, using the
> parameter "treename".
>
> One more note: I am not sure you really want to use long filenames
> such as "M9R_001c01_1_(HuGene-1_0-st-v1).CEL". The function
> "import.data" has a parameter "celnames" which you can use to supply
> alternative names, e.g. celnames=c("M9R_001c01_1",...). You can still
> access the original filenames using:
> > filenames <- rawCELName(data.xps)
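Short names of the kind suggested above can be derived from the long filenames with base R. This is a minimal sketch, assuming every filename ends in "_(<array type>).CEL" as in the original post; the regular expression is an illustration, not part of xps.

```r
# Long CEL filenames as listed in the original post.
filenames <- paste0("M9R_001c0", 1:4, "_1_(HuGene-1_0-st-v1).CEL")

# Strip the trailing "_(...).CEL" part to obtain short names.
celnames <- sub("_\\(.*\\)\\.CEL$", "", filenames)

# The short names must stay unique, otherwise arrays could not be told apart.
stopifnot(!anyDuplicated(celnames))

print(celnames)
```

The resulting vector could then be passed as the "celnames" argument of "import.data", in the same order as the CEL files.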
>
> You can also find many code examples for whole genome and exon arrays in
> the files "script4xps.R" and "script4exon.R" located in the package
> directory "xps/examples".
>
> Please let me know whether this information answers your questions.
>
> Best regards
> Christian
> _._._._._._._._._._._._._._._._._._
> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
> V.i.e.n.n.a A.u.s.t.r.i.a
> e.m.a.i.l: cstrato at aon.at
> _._._._._._._._._._._._._._._._._._
>
>
> Michael Walter wrote:
> > Dear all,
> >
> > I'm using xps to read Affy gene array CEL files. If the number of arrays exceeds 15, I can no longer look at the feature data due to memory limitations. So I'd like to generate a ROOT tree for the entire study and then look at the values of a subset for QC. I can generate a data tree set with the intensities of only a fraction of the arrays by specifying the sample names (please see code and sessionInfo below). However, when I subsequently run RMA on the newly generated tree set, the resulting data frame contains all samples from the initial ROOT tree. The code here is a small example with 4 arrays for demo purposes. If anyone has a suggestion on how to obtain the normalized signals for my subset, I would be very happy.
> >
> > Kind regards,
> >
> > Michael
> >
> >
> >
> > Welcome to Bioconductor
> >
> > Vignettes contain introductory material. To view, type
> > 'openVignette()'. To cite Bioconductor, see
> > 'citation("Biobase")' and for packages 'citation(pkgname)'.
> >
> >
> >> library(RColorBrewer)
> >> library(xps)
> >
> > Welcome to xps version 1.4.6
> > an R wrapper for XPS - eXpression Profiling System
> > (c) Copyright 2001-2009 by Christian Stratowa
> >
> > Attaching package: 'xps'
> >
> > The following object(s) are masked from package:Biobase :
> >
> > exprs,
> > exprs<-,
> > se.exprs
> >
> >> project = "M9R_001"
> >> celfile = getwd()
> >> filenames = list.files(path=celfile)
> >> filenames = filenames[grep(".CEL", filenames)]
> >> filenames
> > [1] "M9R_001c01_1_(HuGene-1_0-st-v1).CEL" "M9R_001c02_1_(HuGene-1_0-st-v1).CEL"
> > [3] "M9R_001c03_1_(HuGene-1_0-st-v1).CEL" "M9R_001c04_1_(HuGene-1_0-st-v1).CEL"
> >
> >> scheme.HuGene10 <- root.scheme(paste("X:/affy/QC_Scripts/xps/schemes","Scheme_HuGene10stv1r4_na27_2.root",sep="/"))
> >> data.xps <- root.data(scheme.HuGene10,
> > + paste(getwd(), paste(project, "_cel.root", sep=""), sep="/"))
> >> fname.tree = data.xps@treenames
> >> data <- attachMask(data.xps)
> >> data <- attachInten(data.xps, treenames=unlist(fname.tree[1:3]))
> >> head(data@data)
> > X Y M9R_001c01_1_(HuGene-1_0-st-v1).cel_MEAN
> > 1 0 0 6745
> > 2 1 0 124
> > 3 2 0 6719
> > 4 3 0 90
> > 5 4 0 61
> > 6 5 0 89
> >
> > M9R_001c02_1_(HuGene-1_0-st-v1).cel_MEAN
> > 1 8246
> > 2 127
> > 3 8231
> > 4 122
> > 5 61
> > 6 127
> >
> > M9R_001c03_1_(HuGene-1_0-st-v1).cel_MEAN
> > 1 6190
> > 2 112
> > 3 5958
> > 4 72
> > 5 68
> > 6 116
> >
> >
> >> dim(data@data)
> > [1] 1102500 5
> >
> >> data.rma <- rma(data, "tmpdt_dataRMA", background="antigenomic", normalize=T,
> > + exonlevel=c(402492,402492,402492), verbose = FALSE)
> >> expr.rma <- validData(data.rma)
> >> dim(expr.rma)
> > [1] 33025 4
> >
> >> sessionInfo()
> > R version 2.9.0 (2009-04-17)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > other attached packages:
> > [1] xps_1.4.6 RColorBrewer_1.0-2 Biobase_2.4.1 ROC_1.18.0
> >
> >
> >
> >
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
MFT Services
University of Tuebingen
Calwerstr. 7
72076 Tübingen/GERMANY
Tel.: +49 (0) 7071 29 83210
Fax. + 49 (0) 7071 29 5228