[BioC] selecting/filtering probesets from exprSet object prior to diff. exp. anal.
James W. MacDonald
jmacdon at med.umich.edu
Wed Nov 23 20:01:46 CET 2011
On 11/23/2011 1:00 PM, Mark Baumeister wrote:
> Hi all,
> I am new to this list and have a question (below) related to -
> selecting/filtering probesets from exprSet object prior to diff. exp. anal.
> I'm also new to Bioconductor and am currently learning preprocessing of
> microarray data (i.e. raw CEL files from the Affymetrix HG-U133A array) and
> then working with the normalized exprSet object to detect differential gene
> expression of (ovarian) samples compared with normal samples. I am currently
> working with a set of ~33 tumor samples and ~7 normal samples.
> Because my machine is 32-bit and cannot handle that much memory
> for the preprocessing, I am using a program called RMAExpress to produce the
> normalized exprSet object. With the exprSet object (which I am calling "eset")
> I am then using Bioconductor for the differential gene expression analysis.
> To start, I have been creating a design matrix (as below),
> which I name "design", for the linear modeling steps
> that come with the limma package.
>      Normal Tumor
> T1        0     1
> T2        0     1
> T3        0     1
> T5        0     1
> T7        0     1
> N1        1     0
> T8        0     1
> T9        0     1
> T10       0     1
> T11       0     1
> N2        1     0
> T12       0     1
> T13       0     1
> T14       0     1
> T15       0     1
> N3        1     0
> and then I am using the following code to produce a linear model, a
> contrast matrix, and a list of differentially expressed genes.
> fit <- lmFit(eset, design)
> cont.matrix <- makeContrasts(NormalvsTumor=Tumor-Normal, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, number=100, adjust="BH") # use BH method
> My question is this,
> Is there a way to select or exclude certain probesets that I want or don't
> want to be included in the linear model before I produce the list
> (topTable) of differentially expressed genes?
There are ways to do this, but note that the eBayes() step above
estimates a prior for the probeset variance using all probesets on
the array. If you selectively remove some probesets (say, all the
low-variance probesets), you will be biasing the prior, which may have
unintended effects on your results.

That said, both ExpressionSets and MArrayLM objects (the output from
eBayes()) can be subset using the conventional square-bracket functions
in R. So for example, you could remove the first ten probesets from your
fit2 object thusly:
fit2 <- fit2[-c(1:10),]
or you could create an indicator of TRUE/FALSE, based on some metric:
ind <- fit2$p.value[, "NormalvsTumor"] < 0.25
fit2 <- fit2[ind,]
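
As a rough sketch of that order of operations (fit and run eBayes() on everything, then drop rows before making the table), something like the following should work. The data here are simulated, and the probeset IDs and the `unwanted` vector are made up purely for illustration:

```r
library(limma)

set.seed(1)
# Simulated log-expression values: 200 hypothetical probesets on
# 4 tumor and 3 normal arrays (not real data)
expr <- matrix(rnorm(200 * 7), nrow = 200,
               dimnames = list(paste0("probe", 1:200),
                               c("T1", "T2", "T3", "T4", "N1", "N2", "N3")))
group <- factor(c(rep("Tumor", 4), rep("Normal", 3)),
                levels = c("Normal", "Tumor"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit and moderate with every probeset, so the variance prior is
# estimated from the whole array
fit <- lmFit(expr, design)
cont.matrix <- makeContrasts(NormalvsTumor = Tumor - Normal, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont.matrix))

# Hypothetical probesets to leave out of the reported table
unwanted <- c("probe3", "probe17", "probe42")
keep <- !(rownames(fit2$coefficients) %in% unwanted)
fit2.sub <- fit2[keep, ]

# topTable() now ranks and adjusts over the retained probesets only
tab <- topTable(fit2.sub, coef = "NormalvsTumor", number = 10,
                adjust.method = "BH")
```

Note that the BH adjustment in topTable() is computed over whatever rows are left in the object you pass in, so the adjusted p-values will differ from those of the unfiltered fit.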
The same thing can be done to the ExpressionSet object as well.
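
For the ExpressionSet case, a minimal sketch might look like this. It assumes the Biobase package, and again uses a small simulated object with made-up probeset IDs rather than your eset:

```r
library(Biobase)

set.seed(1)
# Small simulated ExpressionSet: 50 hypothetical probesets, 4 arrays
expr <- matrix(rnorm(50 * 4), nrow = 50,
               dimnames = list(paste0("probe", 1:50),
                               c("T1", "T2", "N1", "N2")))
eset <- ExpressionSet(assayData = expr)

# Drop the first ten probesets by position
eset.a <- eset[-(1:10), ]

# Keep only named probesets (hypothetical IDs)
wanted <- c("probe11", "probe20", "probe35")
eset.b <- eset[featureNames(eset) %in% wanted, ]

# Keep probesets passing a TRUE/FALSE indicator; the cutoff is arbitrary
ind <- rowMeans(exprs(eset)) > 0
eset.c <- eset[ind, ]
```

Rows index probesets and columns index samples, so eset[, 1:2] would instead keep the first two arrays.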
> I have looked at the genefilter function but have not found specific
> examples of how to do what I want.
> Thanks in advance,
James W. MacDonald, M.S.
University of Michigan
Department of Human Genetics
1241 E. Catherine St.
Ann Arbor MI 48109-5618