[BioC] question regarding differential expression
James W. MacDonald
jmacdon at med.umich.edu
Fri Sep 24 15:40:15 CEST 2010
On 9/23/2010 4:45 PM, Jack Luo wrote:
> This is a conceptual question related to microarray, instead of the usage of
> any Bioconductor package. I apologize if this bothers anyone.
> I am struggling to understand the concept of differential expression in
> terms of its resources (whether it is technical or biological). Suppose I
> have an experiment with two groups (healthy vs. disease) and try to find
> some differentially expressed genes, take two genes for example, both of
> them are differentially expressed (DE) between healthy and disease.
> Gene A has present detection call for all the samples under study (but the
> detection call p-value in the healthy group is in the order of 1e-2 ~ 1e-3,
> the detection call p-value in the disease group is much more significant
> (say, 1e-10)).
> Gene B has 50% present call in healthy while 100% present call in cancer.
First let's backtrack and talk about P/M/A calls, and what they mean.
The statistic underlying these calls tests whether the PM probes in a
given probeset, taken in aggregate, appear to be different from the
corresponding MM probes. Others will disagree, but I think it is
incorrect to assume that an absent call means that the transcript being
measured is absent. What it really means is that we cannot say that the
PM probes are binding more transcript than the MM probes.
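To make that concrete, here is a rough sketch of how a MAS5-style
detection call can be computed for one probeset. It is a simplified
illustration, not the Affymetrix or Bioconductor implementation: it
ignores saturated probes, the function names and intensities are made
up, and the thresholds (tau = 0.015, alpha1 = 0.04, alpha2 = 0.06) are
assumed to be the usual defaults.

```python
from itertools import product


def signed_rank_pvalue(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value for H1: median(diffs) > 0."""
    d = [x for x in diffs if x != 0]  # zero differences carry no sign information
    n = len(d)
    if n == 0:
        return 1.0
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    observed = sum(r for r, x in zip(ranks, d) if x > 0)
    # Enumerate all 2^n sign patterns (fine for the ~11-20 pairs in a probeset).
    at_least = 0
    for signs in product((0, 1), repeat=n):
        if sum(r for r, s in zip(ranks, signs) if s) >= observed:
            at_least += 1
    return at_least / 2 ** n


def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return (call, p_value) for one probeset; thresholds are assumed defaults."""
    # Discrimination score per probe pair: near 1 if PM >> MM, negative if MM > PM.
    scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]
    p_value = signed_rank_pvalue([r - tau for r in scores])
    if p_value < alpha1:
        return "P", p_value
    if p_value < alpha2:
        return "M", p_value
    return "A", p_value


# Hypothetical intensities for an 11-pair probeset.
pm = [1200, 1500, 900, 1800, 1100, 1300, 1600, 1000, 1400, 1700, 1250]
mm = [300, 400, 350, 500, 280, 310, 420, 330, 390, 450, 320]
print(detection_call(pm, mm))
```

Note that the p-value only says whether PM exceeds MM across the
probeset. It is not a direct measurement of how much transcript is
there.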
If you make the assumption that the MM probes do a good job of measuring
background, then an absent call really does mean the transcript is
absent. However, a large percentage of MM probes have higher
fluorescence readings than the corresponding PM probe (it varies by
chip, but is usually > 30%; you can check with your own data to verify).
In addition, the MM probe intensity will increase with increasing
amounts of transcript, so the MM probes are picking up signal as well as
background. These are two of the reasons that Affy has abandoned the use
of MM probes (more real estate on the chip being a third), and why very
few people use MAS5 for computing expression values any more.
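If you want to check the MM > PM proportion on your own chips, the
calculation is just a count over probe pairs. A minimal sketch, assuming
you have already extracted paired PM and MM intensities into two
equal-length lists (the function name and the numbers below are
invented for illustration):

```python
def fraction_mm_exceeds_pm(pm, mm):
    """Fraction of probe pairs in which the MM intensity is above the PM intensity."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must contain the same number of probes")
    if not pm:
        raise ValueError("need at least one probe pair")
    return sum(1 for p, m in zip(pm, mm) if m > p) / len(pm)


# Hypothetical intensities for five probe pairs.
example_pm = [100, 250, 80, 400, 60]
example_mm = [120, 90, 95, 150, 30]
print(fraction_mm_exceeds_pm(example_pm, example_mm))
```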
So I would personally caution you against interpreting these p-values as
indicating presence or absence of the transcript.
As to your question, technical and biological variability are completely
confounded here, so you have to set up your experiments in such a way
that the contribution from technical variability is minimized. For
instance, if you do all controls one day and diseased the next, you
cannot possibly tell if any differences were due to biology or to
technical differences. However, if you randomize sample types over days
processed, then the technical variability (which still exists, and is
confounded with biological variability) will tend to appear as noise
and be captured by the residual term.
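One way to set up that kind of randomization is to shuffle the samples
within each group and deal them out across processing days, so every day
gets a mix of healthy and diseased samples. The sketch below is only an
illustration under that assumption; the function name, sample labels and
number of days are hypothetical.

```python
import random


def assign_processing_days(samples_by_group, n_days, seed=None):
    """Spread each group's samples evenly, in random order, over n_days."""
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    schedule = {day: [] for day in range(1, n_days + 1)}
    for group, samples in samples_by_group.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # Deal the shuffled samples out like cards so no day holds one group only.
        for i, sample in enumerate(shuffled):
            schedule[i % n_days + 1].append((group, sample))
    for day in schedule:
        rng.shuffle(schedule[day])  # also randomize run order within a day
    return schedule


# Hypothetical study: 6 healthy and 6 disease samples processed over 3 days.
groups = {
    "healthy": ["H1", "H2", "H3", "H4", "H5", "H6"],
    "disease": ["D1", "D2", "D3", "D4", "D5", "D6"],
}
for day, runs in assign_processing_days(groups, n_days=3, seed=1).items():
    print(day, runs)
```

With a layout like this, a day effect hits both groups about equally,
so it inflates the noise rather than masquerading as a group difference.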
Also, in my opinion there isn't any difference between the two
situations (assuming I understand situation B correctly). What I think
you are asking is this: are there any substantive differences between a
situation where a gene is apparently unexpressed in sample A but
expressed to a certain degree in sample B, and a situation where a gene
is expressed in both samples, but at a two-fold (or greater) level in B?
In my opinion, there is no difference between those scenarios. In each
situation, the gene is expressed at a much lower level in one sample
versus the other. The relative levels are unimportant, as the absolute
accuracy of our measuring device is not good.
> My question is what's the correct interpretation in terms of whether the
> differential expression is due to technical or biological? Are they both DE
> due to technical, or A is DE due to biological and B is due to technical, or
> they are both DE due to biological?
> Thanks a bunch,
James W. MacDonald, M.S.
University of Michigan
Department of Human Genetics
1241 E. Catherine St.
Ann Arbor MI 48109-5618