[BioC] ExpressionSet subsetting problem

Martin Morgan mtmorgan at fhcrc.org
Fri Apr 11 18:18:15 CEST 2008


Hi IAIN --

IAIN GALLAGHER <iaingallagher at btopenworld.com> writes:

> Hi Everyone.
>
> I'm having a problem subsetting an ExpressionSet. After reading my
> CEL files in and summarizing with MAS5 I assign a new
> AnnotatedDataFrame to describe the data. This is a tab-delimited
> text file in the following format:
[snip]        
> pheno <- read.AnnotatedDataFrame('covdesc.txt', sep='\t')
> phenoData(mas_data) <- pheno

The problem is probably here: the rows (samples) of your new
AnnotatedDataFrame are in a different order from the columns
(samples) of mas_data. Try validObject(mas_data). Here's a
reproducible example:

> library(Biobase)
> data(sample.ExpressionSet)
> obj <- sample.ExpressionSet
> pd <- phenoData(obj)
> newPd <- pd[sample(sampleNames(pd)),]
> phenoData(obj) <- newPd
> validObject(obj)
Error in validObject(obj) : 
  invalid class "ExpressionSet" object: sampleNames differ between assayData and phenoData

Given newPd, to make sure the assignment is correct I would reorder
its rows to match the sample names of obj before assigning:

> data(sample.ExpressionSet)
> obj <- sample.ExpressionSet
> phenoData(obj) <- newPd[sampleNames(obj),]
> validObject(obj)

The reason for this dangerous behavior (phenoData<- does not itself
check that the sample names agree) is that it is sometimes necessary
to create transiently invalid objects while transforming one
ExpressionSet into another.
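To see why a positional index such as panc_index picks the wrong
samples, here is a minimal base-R sketch. It assumes a plain matrix
standing in for the assayData and a data.frame standing in for the
phenoData; the sample names and values are hypothetical toy data, not
taken from your files. It shows the mismatch, a simple alignment
check, and the same reorder-by-name fix as newPd[sampleNames(obj),].

```r
# Hypothetical toy data: an expression matrix whose columns are in
# CEL-file order, and a phenotype table whose rows are in a different
# order (as if read from a covdesc file).
expr <- matrix(1:12, nrow = 2,
               dimnames = list(c("probe1", "probe2"),
                               c("A.CEL", "B.CEL", "C.CEL",
                                 "D.CEL", "E.CEL", "F.CEL")))
pheno <- data.frame(
    Site = c("Pancreas", "Liver", "Pancreas", "Liver", "Liver", "Pancreas"),
    row.names = c("C.CEL", "A.CEL", "F.CEL", "B.CEL", "D.CEL", "E.CEL"))

# Positions computed from the phenotype table (1, 3, 6) ...
idx <- which(pheno$Site == "Pancreas")
# ... mean samples C, F, E in 'pheno' but columns A, C, F in 'expr'.
wrong <- colnames(expr[, idx])

# Alignment check: same samples in the same order? (FALSE here)
aligned <- identical(rownames(pheno), colnames(expr))

# Fix: reorder phenotype rows by name to match the expression columns
# before computing any positional index. drop = FALSE keeps a
# one-column data.frame from collapsing to a vector.
pheno <- pheno[colnames(expr), , drop = FALSE]
idx <- which(pheno$Site == "Pancreas")
right <- colnames(expr[, idx])
```

After the reorder, the index selects C.CEL, E.CEL and F.CEL, the
samples actually annotated as Pancreas.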

Martin

> This seems to go well.
>
> I now create an index to pull out only those subjects with 'Pancreas' under 'Site'.
>
> panc_index <- which(phenoData(mas_data)$Site == 'Pancreas')
>
> This returns a vector of numbers
>
> 1  3  4 15 23 28 29
>
> Now I subset my data with this
>
> kept_data <- mas_data[,panc_index]
>
> This is where I'm running into problems
>
>> head(exprs(kept_data))
>             F100.CEL   F105.CEL   F106.CEL   F45.CEL    F57.CEL    F97.CEL
> 1007_s_at 1853.75910 2834.19034 1865.65600 869.44930 1307.60507 2006.37103
> 1053_at    811.05343  517.32617  519.08446 490.94832  582.09189  544.34508
> 117_at      78.34070   26.91147   93.21263 129.14469  241.32762   31.05214
> 121_at     419.79056  494.92934  685.06496 478.36533  661.30741  591.22300
> 1255_g_at   84.53744   18.25635   76.71271  44.79287   69.42122   99.33932
> 1294_at    329.38568  447.23030  529.64516 369.30509  487.00975  339.38840
>              F99.CEL
> 1007_s_at 1168.56112
> 1053_at    425.16363
> 117_at      18.87988
> 121_at     511.47964
> 1255_g_at   54.36606
> 1294_at    372.36992
>
> This looks OK, but whilst subjects 1, 3 and 4 are pulled out correctly (F100, F105 and F106 respectively), the next two are not: F45 is sample number 14, not 15, and F57 is sample number 22, not 23. The last two samples (F97 and F99) are pulled out correctly.
>
> Could anyone explain why this is? I'd be most grateful.
>
> Thanks
>
> iain
>
>> sessionInfo()
> R version 2.6.2 (2008-02-08) 
> i486-pc-linux-gnu 
>
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils     datasets 
> [8] methods   base     
>
> other attached packages:
>  [1] simpleaffy_2.14.05   gcrma_2.10.0         matchprobes_1.10.0  
>  [4] genefilter_1.16.0    survival_2.34        hgu133plus2cdf_2.0.0
>  [7] affy_1.16.0          preprocessCore_1.0.0 affyio_1.6.1        
> [10] Biobase_1.16.2      
>
> loaded via a namespace (and not attached):
> [1] annotate_1.16.1     AnnotationDbi_1.0.6 DBI_0.2-4          
> [4] rcompgen_0.1-17     RSQLite_0.6-7  
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
