[BioC] Subsetting in xps

cstrato cstrato at aon.at
Fri Nov 6 17:02:48 CET 2009


Dear Michael,

In the following I will try to describe some options which may be 
helpful in cases where you cannot load all feature data due to memory 
limitations:

1, The option to load only a subset of feature data using:
 > data <- attachMask(data.xps)
 > data <- attachInten(data.xps, treenames=unlist(fname.tree[1:3]))
allows you to do some QC, e.g. boxplot(), hist() and pmplot().
However, rma() will still use all data, since running RMA is 
independent of the imported feature data.

2, When you have imported all CEL files into a root data file you can 
create a subset as follows (see ?root.data):
# load only a subset from a ROOT data file
 > rootfile <- paste(getwd(), paste(project, "_cel.root", sep=""), sep="/")
 > subdata.xps <- root.data(scheme.HuGene10, rootfile=rootfile, 
celnames=c("Name2.cel","Name7.cel","Name9.cel"))
Now you can run RMA using this subset only:
 > data.rma <- rma(subdata.xps,...)
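
As a side note, the path to the ROOT data file can also be built with file.path(), which avoids the nested paste() calls. This is only a base R sketch; the value of "project" is taken from the session quoted below, and nothing here depends on xps itself:

```r
# Project name as used in the quoted session
project <- "M9R_001"

# file.path() joins path components with "/" on all platforms
rootfile <- file.path(getwd(), paste0(project, "_cel.root"))
print(rootfile)
```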

3, You can also do some QC without the need to import the feature data:
# density plot:
 > root.density(data.xps)

# you can also save the density plot for each chip using
 > for (tree in treeNames(data.xps)) {
 >    root.density(data.xps, treename=tree, canvasname=tree, save.as="png")
 > }

# image plot:
 > root.image(data.xps, treename="MyName.cel")
# save image automatically
 > root.image(data.xps, treename="MyName.cel", logbase="log2", 
canvasname="Image_MyName_log2", save.as="png")

# profile plot (similar to boxplot)
 > root.profile(data.xps)
However, profile plots may also run into memory limitations. In 
that case you can create profile plots for a subset of the arrays only, 
using the parameter "treename".

Maybe one more note: I am not sure if you really want to use long 
filenames such as "M9R_001c01_1_(HuGene-1_0-st-v1).CEL". The function 
"import.data" has a parameter "celnames" which you can use to supply 
alternative names, e.g. celnames=c("M9R_001c01_1",...). You 
can still access the original filenames using:
 > filenames <- rawCELName(data.xps)
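
One possible way to derive such short names from the long filenames is sketched below. It uses base R only and assumes that every filename ends in "_(<array type>).CEL", as in the session quoted below; the resulting vector is what you would then pass as "celnames" to import.data:

```r
# Long CEL filenames, copied from the quoted session
filenames <- c("M9R_001c01_1_(HuGene-1_0-st-v1).CEL",
               "M9R_001c02_1_(HuGene-1_0-st-v1).CEL",
               "M9R_001c03_1_(HuGene-1_0-st-v1).CEL",
               "M9R_001c04_1_(HuGene-1_0-st-v1).CEL")

# Strip the trailing "_(...).CEL" part; the pattern is anchored at the end
# of the string and the parentheses and dot are escaped
celnames <- sub("_\\(.*\\)\\.CEL$", "", filenames, ignore.case = TRUE)

# Shortened names must stay unique, otherwise arrays could not be told apart
stopifnot(anyDuplicated(celnames) == 0)

print(celnames)
```

If your filenames follow a different pattern, the regular expression needs to be adapted accordingly.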

You can also find many code examples for whole genome and exon arrays in 
the files "script4xps.R" and "script4exon.R" located in the package 
directory "xps/examples".

Please let me know if this information answers your questions.

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
V.i.e.n.n.a           A.u.s.t.r.i.a
e.m.a.i.l:        cstrato at aon.at
_._._._._._._._._._._._._._._._._._


Michael Walter wrote:
> Dear all,
>
> I'm using xps to read Affy gene array CEL files. If the number of arrays exceeds 15, I can no longer look at the feature data due to memory limitations. So I'd like to generate a root tree for the entire study and then look at the values for a subset for QC. I can generate a data tree set with the intensities of only a fraction of the arrays by specifying the sample names (please see code and sessionInfo below). However, when I subsequently run RMA on the newly generated tree, the resulting data frame contains all samples from the initial root tree. The code here is a small example with 4 arrays for demo purposes. If anyone has a suggestion on how to obtain the normalized signals from my subset, I would be very happy.
>
> Kind regards,
>
> Michael
>
>
>
> Welcome to Bioconductor
>
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
>
>
>> library(RColorBrewer)
>> library(xps)
>
> Welcome to xps version 1.4.6 
>     an R wrapper for XPS - eXpression Profiling System
>     (c) Copyright 2001-2009 by Christian Stratowa
>
> Attaching package: 'xps'
>
>         The following object(s) are masked from package:Biobase :
>
>          exprs,
>          exprs<-,
>          se.exprs 
>
>> project = "M9R_001"
>> celfile = getwd()
>> filenames = list.files(path=celfile)
>> filenames = filenames[grep(".CEL", filenames)]
>> filenames
> [1] "M9R_001c01_1_(HuGene-1_0-st-v1).CEL" "M9R_001c02_1_(HuGene-1_0-st-v1).CEL"
> [3] "M9R_001c03_1_(HuGene-1_0-st-v1).CEL" "M9R_001c04_1_(HuGene-1_0-st-v1).CEL"
>> scheme.HuGene10 <- root.scheme(paste("X:/affy/QC_Scripts/xps/schemes","Scheme_HuGene10stv1r4_na27_2.root",sep="/"))
>> data.xps <- root.data(scheme.HuGene10, 
> + paste(getwd(), paste(project, "_cel.root", sep=""), sep="/"))
>> fname.tree = data.xps@treenames
>> data <- attachMask(data.xps)
>> data <- attachInten(data.xps, treenames=unlist(fname.tree[1:3]))
>> head(data@data)
>   X Y M9R_001c01_1_(HuGene-1_0-st-v1).cel_MEAN
> 1 0 0                                     6745
> 2 1 0                                      124
> 3 2 0                                     6719
> 4 3 0                                       90
> 5 4 0                                       61
> 6 5 0                                       89
>
>   M9R_001c02_1_(HuGene-1_0-st-v1).cel_MEAN
> 1                                     8246
> 2                                      127
> 3                                     8231
> 4                                      122
> 5                                       61
> 6                                      127
>
>   M9R_001c03_1_(HuGene-1_0-st-v1).cel_MEAN
> 1                                     6190
> 2                                      112
> 3                                     5958
> 4                                       72
> 5                                       68
> 6                                      116
>
>> dim(data@data)
> [1] 1102500       5
>> data.rma <- rma(data, "tmpdt_dataRMA", background="antigenomic", normalize=T, 
> + exonlevel=c(402492,402492,402492), verbose = FALSE)
>> expr.rma <- validData(data.rma)
>> dim(expr.rma)
> [1] 33025     4
>> sessionInfo()
> R version 2.9.0 (2009-04-17) 
> i386-pc-mingw32 
>
> locale:
> LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
>
> other attached packages:
> [1] xps_1.4.6          RColorBrewer_1.0-2 Biobase_2.4.1      ROC_1.18.0        