[BioC] Subsetting expression sets for mass spec data - second ask

Tue Nov 27 23:29:15 CET 2012

On 11/27/2012 09:32 AM, Tim Triche, Jr. wrote:
> would it be the worst thing in the world to add rownames() and colnames()
> support for eSet-derived objects in Biobase?
>
> setMethod("rownames",
>            signature=signature(x="eSet"),
>            function(x) featureNames(x))
>
> setMethod("colnames",
>            signature=signature(x="eSet"),
>            function(x) sampleNames(x))

Getters and setters for rownames() and colnames(), and for dimnames(), are now in Biobase devel 2.19.1.
Martin
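A minimal sketch of the accessors under discussion follows. It assumes a Biobase release that includes the dimnames() methods for eSet mentioned above. The toy matrix and its m/z and sample labels are hypothetical. featureNames() and sampleNames() are the long-standing eSet accessors, and rownames() and colnames() should return the same vectors once dimnames() is defined for eSet.

```r
## Toy expression matrix: rows are m/z values, columns are samples
## (hypothetical values, for illustration only)
m <- matrix(c(1.2, 0.8, 2.5, 1.1, 0.9, 2.7), nrow = 3,
            dimnames = list(c("300.033", "300.356", "308.497"),
                            c("s1", "s2")))

if (requireNamespace("Biobase", quietly = TRUE)) {
  suppressPackageStartupMessages(library(Biobase))
  es <- new("ExpressionSet", exprs = m)

  ## Long-standing eSet accessors
  print(featureNames(es))
  print(sampleNames(es))

  ## With dimnames() defined for eSet, base rownames()/colnames() follow suit
  print(rownames(es))
  print(colnames(es))
}
```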

> SummarizedExperiments already have these sort of "do what I mean"
> semantics... one reason I like using them.
>
>
>
> On Tue, Nov 27, 2012 at 3:08 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>> On Mon, Nov 26, 2012 at 10:55 PM, McGee, Monnie <mmcgee at mail.smu.edu> wrote:
>>
>>>
>>> Dear BioC Users,
>>>
>>> I would like to subset a mass spectrometry data set to the biomarkers that
>>> were chosen as important. I followed the code in the PROcess vignette to
>>> obtain the biomarkers as follows:
>>>
>>> testNorm is a normalized intensity matrix (rows are m/z values) for 253 samples:
>>>> bmkfile <- paste(getwd(), "testbiomarker.csv", sep = "/")
>>>> testBio = pk2bmkr(peakfile, testNorm, bmkfile)
>>>> mzs = as.numeric(rownames(testNorm))
>>>> bks = getMzs(testBio) ## Should be "important" biomarkers for the Mass Spec data
>>>> bks
>>>   [1]  308.497  350.487  378.092  396.084  676.031 3994.780 4597.540 7046.840 7965.760 8128.160 8351.810 9184.330
>>>
>>> I created the expression set in the following way:
>>>> treat = ifelse(colnames(testNorm) < 300,"Control","Cancer")
>>>> treatdf = as.data.frame(treat)
>>>> rownames(treatdf)=colnames(testNorm)
>>>> pdt = new("AnnotatedDataFrame",treatdf)
>>>> mzdf = as.data.frame(rownames(testNorm))
>>>> rownames(mzdf)=rownames(testNorm)
>>>> mzfeat = new("AnnotatedDataFrame",mzdf)
>>>> testES = new("ExpressionSet",exprs=testNorm,phenoData=pdt,featureData=mzfeat)
>>>> varLabels(testES)
>>> [1] "treat"
>>>> table(pData(testES))
>>>   Cancer Control
>>>      162      91
>>>> featureData(testES)
>>> An object of class "AnnotatedDataFrame"
>>>    featureNames: 300.033 300.356 ... 19995.5 (13297 total)
>>>    varLabels: V1
>>>    varMetadata: labelDescription
>>>
>>> Figuring out how to obtain the eSet took at least an hour. By the way, the
>>> purpose of the eSet is to obtain an object that can be passed to an MLearn
>>> function for classification, such as:
>>> dldFS = MLearn(treat ~ ., testES2, dldaI, trainInd), where testES2 is the
>>> eSet containing only the information for the important biomarkers. Clearly,
>>> I can't run MLearn (especially with CV) with all 13K features in testES.
>>> Therefore, I would like to run MLearn using the biomarkers to determine
>>> whether these biomarkers actually discriminate between the cancer and
>>> control samples. And, yes, this is the Petricoin ovarian cancer data set,
>>> for those of you who know your Mass Spec data.
>>>
>>> Now I have an eSet with the rows labeled by the mass-to-charge ratios and
>>> the columns labeled by the samples. I would like to obtain a subset of
>>> testES using the 12 biomarkers (bks) found above. Ideally, the following
>>> would work:
>>>> testES2 =  testES[featureData(testES) == bks,]
>>>
>>>
>> Hi, Monnie.
>>
>> Try using featureNames() instead of featureData().  The featureData()
>> method returns an AnnotatedDataFrame.  You just want a vector of names, it
>> appears, so featureNames() is the method you should use.
>>
>> Sean
>>
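A minimal sketch of Sean's suggestion follows. It assumes that the feature names are the m/z values stored as character strings, as in the featureData() output above. The vectors `fn` and `wanted` are hypothetical stand-ins for featureNames(testES) and a set of m/z values to keep. Note that %in% matches exactly, so it only selects features whose names equal the requested values.

```r
## Hypothetical stand-in for featureNames(testES): m/z values as character
fn <- c("300.033", "300.356", "308.497", "350.487")

## Hypothetical m/z values to keep; as.character() turns them into the
## same text form used for the feature names
wanted <- c(308.497, 350.487)

## featureNames() gives a plain character vector, so %in% works on it,
## whereas featureData() returns an AnnotatedDataFrame and does not
keep <- fn %in% as.character(wanted)
print(keep)

## With an ExpressionSet, the same logical index subsets the rows
if (requireNamespace("Biobase", quietly = TRUE)) {
  suppressPackageStartupMessages(library(Biobase))
  m <- matrix(seq_len(8), nrow = 4, dimnames = list(fn, c("s1", "s2")))
  es <- new("ExpressionSet", exprs = m)
  es2 <- es[featureNames(es) %in% as.character(wanted), ]
  print(featureNames(es2))
}
```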
>>
>>
>>> But I get the following error:
>>> Error in testES[featureData(testES) == bks, ] :
>>>    error in evaluating the argument 'i' in selecting a method for function
>>> '[': Error in featureData(testES) == bks :
>>>    comparison (1) is possible only for atomic and list types
>>>
>>> I tried making bks a character vector, but to no avail.  I also tried the
>>> following:
>>>> testES2 =  testES[featureData(testES) %in% bks,]  ## (where bks is a character vector or not)
>>> Error in testES[featureData(testES) %in% bks, ] :
>>>    error in evaluating the argument 'i' in selecting a method for function
>>> '[': Error in match(x, table, nomatch = 0L) :
>>>    'match' requires vector arguments
>>>
>>> Part of the problem is probably that I am not using the correct syntax for
>>> subsetting an eSet on the basis of featureData. Another part is that the
>>> biomarkers do not have exact matches in featureData(testES), because they
>>> were obtained using a peak-finding algorithm that is supposed to align
>>> peaks across all 253 samples. So, how do I obtain the m/z ratios for the
>>> important features (the biomarkers) from this eSet?
>>> Is there another (saner) way to use the biomarkers in a classification
>>> algorithm in order to determine the misclassification rate with this
>>> particular set of biomarkers?
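Because the biomarker m/z values need not coincide exactly with any feature name, one possible approach is to match each biomarker to its nearest feature m/z value and keep only matches within a tolerance. This approach was not proposed in the thread and is not a PROcess function. A minimal base-R sketch follows. The `mzs` values are made up. The `bks` values are four of those printed above. The 0.1% relative tolerance is an arbitrary assumption.

```r
## Hypothetical stand-in for as.numeric(featureNames(testES)) (made-up values),
## assumed to be in the same order as the rows of the ExpressionSet
mzs <- c(300.033, 300.356, 308.41, 350.52, 378.3, 396.2)

## Four of the biomarker m/z values returned by getMzs(testBio) above
bks <- c(308.497, 350.487, 396.084, 4597.540)

## Index of the closest feature m/z for each biomarker
nearest <- vapply(bks, function(b) which.min(abs(mzs - b)), integer(1))

## Keep a match only if it lies within a relative tolerance (0.1% here, an
## arbitrary choice), so a biomarker with no nearby feature is dropped
tol <- 1e-3
ok <- abs(mzs[nearest] - bks) / bks <= tol
idx <- unique(nearest[ok])

print(mzs[idx])
## With the real data, testES[idx, ] would then give the reduced ExpressionSet
```

In this toy example, the first three biomarkers match features 3, 4 and 6. The last biomarker (4597.540) has no feature nearby, so the tolerance check drops it instead of silently pairing it with 396.2.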
>>>
>>> And, finally, the session info:
>>>> sessionInfo()
>>> R version 2.15.1 (2012-06-22)
>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>>  [1] tools     grid      splines   stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>>  [1] PROcess_1.32.0        Icens_1.28.0          survival_2.36-14      flowStats_1.14.0      flowWorkspace_1.2.0
>>>  [6] hexbin_1.26.0         IDPmisc_1.1.16        flowViz_1.20.0        XML_3.95-0            RBGL_1.32.1
>>> [11] graph_1.34.0          Cairo_1.5-2           cluster_1.14.2        mvoutlier_1.9.8       sgeostat_1.0-24
>>> [16] robCompositions_1.6.0 car_2.0-15            nnet_7.3-4            compositions_1.20-1   energy_1.4-0
>>> [21] MASS_7.3-21           boot_1.3-5            tensorA_0.36          rgl_0.92.892          fda_2.3.2
>>> [26] RCurl_1.95-0.1.2      bitops_1.0-4.1        Matrix_1.0-9          lattice_0.20-10       zoo_1.7-9
>>> [31] flowCore_1.22.3       rrcov_1.3-02          pcaPP_1.9-48          mvtnorm_0.9-9992      robustbase_0.9-4
>>> [36] Biobase_2.16.0        BiocGenerics_0.2.0
>>> [36] Biobase_2.16.0        BiocGenerics_0.2.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] feature_1.2.8       KernSmooth_2.23-8   ks_1.8.10           latticeExtra_0.6-24 RColorBrewer_1.0-5
>>> [6] stats4_2.15.1
>>>
>>>
>>> Thank you!
>>> Monnie
>>>
>>> Monnie McGee, PhD
>>> Associate Professor
>>> Statistical Science
>>> Southern Methodist University
>>> Office: 214-768-2462
>>> Fax: 214-768-4035
>>> Website: http://faculty.smu.edu/mmcgee
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>
>
>

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793