[BioC] selecting/filtering probesets from exprSet object prior to diff. exp. anal.

James W. MacDonald jmacdon at med.umich.edu
Wed Nov 23 20:45:09 CET 2011


Hi Mark,

On 11/23/2011 2:28 PM, Mark Baumeister wrote:
> Thanks a lot, James for your help.
> That seems pretty straightforward.
> "That said, both ExpressionSets and MArrayLM objects (the output from 
> eBayes()) can be subset using the conventional square-bracket 
> functions in R. So for example, you could remove the first ten 
> probesets from your fit2 object thusly:
> fit2 <- fit2[-c(1:10),]
> or you could create an indicator of TRUE/FALSE, based on some metric
> ind <- fit2$p.value < 0.25
> fit2 <- fit2[ind,]
> The same thing can be done to the ExpressionSet object as well."
> If I know the probe IDs for the probes I want to select or exclude 
> from the MArrayLM object (i.e. fit2) before producing the topTable() list,
> can I also use the probe IDs somehow to select or exclude from the 
> MArrayLM object?

Sure. Note that you can extract the probeset IDs from the ExpressionSet 
object using the featureNames() extractor, and then use either which() 
or %in% to build an index that you can subset with.
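
A minimal sketch of the difference between the two, using a short 
hand-typed vector in place of real featureNames() output (the names 
'feature.ids' and 'wanted' are just for illustration):

```r
# Stand-in for featureNames(eset); a few IDs typed in for illustration
feature.ids <- c("1007_s_at", "1053_at", "117_at", "121_at", "1255_g_at")
wanted <- c("117_at", "1255_g_at")

# %in% gives a logical vector, one entry per feature
ind.logical <- feature.ids %in% wanted

# which() turns that into integer positions
ind.integer <- which(feature.ids %in% wanted)

feature.ids[ind.logical]    # select the listed IDs
feature.ids[!ind.logical]   # exclude them (logical negation)
feature.ids[-ind.integer]   # exclude them (negative positions)
```

The logical form is usually the safer one for excluding: if none of the 
IDs match, which() returns an empty vector and x[-integer(0)] selects 
nothing at all rather than everything.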

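Say you have a character vector called 'probes' with all the probeset 
IDs in it, and fit2 still has the same rows, in the same order, as eset.

ind <- featureNames(eset) %in% probes
fit2 <- fit2[!ind,]   # exclude them; use fit2[ind,] to select them instead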
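
If you want to try this end to end, here is a rough sketch on simulated 
data. The assumptions are mine: a random matrix stands in for the 
ExpressionSet (lmFit() accepts a plain matrix), the probe IDs are made 
up, and the IDs are taken from the row names of the fit's coefficient 
matrix instead of featureNames(eset), so the index is built from the 
same object that gets subset.

```r
library(limma)

set.seed(1)
# Simulated log-expression values: 200 made-up probesets x 8 arrays
probe.ids <- paste0("probe_", 1:200)
samples <- c(paste0("T", 1:5), paste0("N", 1:3))
expr <- matrix(rnorm(200 * 8), nrow = 200,
               dimnames = list(probe.ids, samples))

# One design column per group, as in the original question
group <- factor(c(rep("Tumor", 5), rep("Normal", 3)),
                levels = c("Normal", "Tumor"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit on all probesets, so eBayes() estimates its prior from the full set
fit <- lmFit(expr, design)
cont.matrix <- makeContrasts(NormalvsTumor = Tumor - Normal, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont.matrix))

# Hypothetical list of probesets to exclude
probes <- c("probe_3", "probe_17", "probe_42")

# Build the index from the fit's own row names, then drop those rows
ind <- rownames(fit2$coefficients) %in% probes
fit2.sub <- fit2[!ind, ]

topTable(fit2.sub, number = 10, adjust.method = "BH")
```

Subsetting after eBayes() leaves the moderated t-statistics as they 
were computed on the full set; only the multiplicity adjustment in 
topTable() is done over the probesets that remain.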

Best,

Jim




> Mark
> On Wed, Nov 23, 2011 at 11:01 AM, James W. MacDonald 
> <jmacdon at med.umich.edu> wrote:
>
>     Hi Mark,
>
>
>     On 11/23/2011 1:00 PM, Mark Baumeister wrote:
>
>         Hi all,
>
>         I am new to this list and have a question (below) about
>         selecting/filtering probesets from an exprSet object prior to
>         diff. exp. anal.
>
>         I'm also new to Bioconductor and am currently learning
>         preprocessing of microarray data (i.e. raw CEL files from the
>         Affymetrix HG-U133A array) and then working with the
>         normalized exprSet object to detect differential gene
>         expression of tumor (ovarian) samples compared with normal
>         samples.  I am currently working with a set of ~33 tumor
>         samples and ~7 normal samples.
>
>         Because my machine is 32 bit and cannot handle that much
>         memory allocation, for the preprocessing I am using a program
>         called RMAExpress to produce the normalized exprSet object.
>         With the exprSet object (which I am calling "eset") I am then
>         using Bioconductor for the differential gene expression
>         analysis.
>
>         To start I have been creating a design matrix (as below),
>         which I name "design", for the linear modeling steps from the
>         limma package.
>
>             Normal Tumor
>         T1       0     1
>         T2       0     1
>         T3       0     1
>         T5       0     1
>         T7       0     1
>         N1       1     0
>         T8       0     1
>         T9       0     1
>         T10      0     1
>         T11      0     1
>         N2       1     0
>         T12      0     1
>         T13      0     1
>         T14      0     1
>         T15      0     1
>         N3       1     0
>
>
>
>         and then I am using the following code to produce a linear
>         model, a contrast matrix, and a list of differentially
>         expressed genes.
>
>
>         fit <- lmFit(eset, design)
>         cont.matrix <- makeContrasts(NormalvsTumor=Tumor-Normal,
>                                      levels=design)
>         fit2 <- contrasts.fit(fit, cont.matrix)
>         fit2 <- eBayes(fit2)
>         topTable(fit2, number=100, adjust="BH") # use BH method
>
>         My question is this:
>         Is there a way to select or exclude certain probesets that I
>         want or don't want to be included in the linear model before
>         I produce the list (topTable) of differentially expressed
>         genes?
>
>
>     There are ways to do this, but note that the eBayes() step above
>     is estimating a prior for the probeset variance that uses all
>     probesets on the array. If you selectively remove some probesets
>     (say, all the low-variance probesets), you will be biasing the
>     prior, which may have unintended effects.
>
>     That said, both ExpressionSets and MArrayLM objects (the output
>     from eBayes()) can be subset using the conventional square-bracket
>     functions in R. So for example, you could remove the first ten
>     probesets from your fit2 object thusly:
>
>     fit2 <- fit2[-c(1:10),]
>
>     or you could create an indicator of TRUE/FALSE, based on some metric
>
>     ind <- fit2$p.value < 0.25
>
>     fit2 <- fit2[ind,]
>
>     The same thing can be done to the ExpressionSet object as well.
>
>     Best,
>
>     Jim
>
>
>
>
>         I have looked at the genefilter function but have not found
>         specific examples of how to do what I want.
>
>
>         Thanks in advance,
>         -M
>
> -- 
> Mark Baumeister
>
> http://sites.google.com/site/lfmmab/

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


