[BioC] question regarding differential expression

James W. MacDonald jmacdon at med.umich.edu
Fri Sep 24 15:40:15 CEST 2010


Hi Jack,

On 9/23/2010 4:45 PM, Jack Luo wrote:
> Hi,
>
> This is a conceptual question related to microarray, instead of the usage of
> any Bioconductor package. I apologize if this bothers anyone.
>
> I am struggling to understand the concept of differential expression in
> terms of its resources (whether it is technical or biological). Suppose I
> have an experiment with two groups (healthy vs. disease) and try to find
> some differentially expressed genes, take two genes for example, both of
> them are differentially expressed (DE) between healthy and disease.
>
> Gene A has present detection call for all the samples under study (but the
> detection call p-value in the healthy group is in the order of 1e-2 ~ 1e-3,
> the detection call p-value in the disease group is much more significant
> (say, 1e-10)).
> Gene B has 50% present call in healthy while 100% present call in cancer.

First let's backtrack and talk about P/M/A calls, and what they mean. 
The statistics underlying these calls test whether the PM probes in a 
given probeset, taken in aggregate, appear to differ from the 
corresponding MM probes. Others will disagree, but I think it is 
incorrect to assume that an absent call means that the transcript being 
measured is absent. What it really means is that we cannot say that the 
PM probes are binding more transcript than the MM probes.
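
To make this concrete, here is a minimal Python sketch of the kind of test that sits behind a detection call. It assumes the commonly described MAS5-style procedure: a discrimination score (PM - MM)/(PM + MM) for each probe pair, compared against a small threshold tau with a one-sided Wilcoxon signed-rank test. The function names, the toy intensities, and the values used for tau and the two p-value cutoffs are assumptions for illustration, not the output or API of any particular package.

```python
from itertools import product


def signed_rank_pvalue(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value for H1: median(diffs) > 0.

    Enumerates every sign pattern, so it is only practical for the small
    number of probe pairs found in a single probeset.
    """
    diffs = [d for d in diffs if d != 0]  # zero differences carry no sign
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    observed = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null hypothesis every sign pattern is equally likely.
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= observed
    )
    return hits / 2 ** n


def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return (call, p_value) for one probeset.

    tau, alpha1 and alpha2 are assumed illustrative defaults.
    """
    scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]
    p_value = signed_rank_pvalue([s - tau for s in scores])
    if p_value < alpha1:
        call = "P"
    elif p_value < alpha2:
        call = "M"
    else:
        call = "A"
    return call, p_value


# Made-up intensities for an 11-pair probeset.
pm = [300, 320, 310, 290, 305, 315, 298, 302, 311, 307, 295]
mm = [100, 110, 105, 95, 120, 98, 102, 99, 101, 97, 103]
print(detection_call(pm, mm))   # PM clearly above MM
print(detection_call(mm, pm))   # roles swapped: PM never above MM
```

Note that the test only asks whether PM exceeds MM by more than tau. An "A" call from the second example says nothing about whether transcript is actually there, only that the PM probes cannot be distinguished from the MM probes.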

If you make the assumption that the MM probes do a good job of measuring 
background, then an absent call really does mean the transcript is 
absent. However, a large percentage of MM probes have higher 
fluorescence readings than the corresponding PM probe (it varies by 
chip, but is usually > 30%; you can check with your data to verify). In 
addition, the MM probe intensity will increase with increasing amounts 
of transcript. These are two of the reasons that Affy has abandoned the 
use of MM probes (more real estate on the chip being a third), and why 
very few people use MAS5 for computing expression values any more.
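
The check on your own data can be sketched in a few lines of Python. This assumes you have already exported the paired PM and MM intensities for one chip into two equal-length lists (in Bioconductor the affy package's pm() and mm() accessors return these from an AffyBatch). The function name and the numbers below are made up for illustration.

```python
def fraction_mm_exceeds_pm(pm, mm):
    """Fraction of probe pairs in which the MM intensity is above the PM intensity."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must contain the same number of probes")
    if not pm:
        raise ValueError("no probe pairs supplied")
    return sum(m > p for p, m in zip(pm, mm)) / len(pm)


# Made-up intensities for four probe pairs; two of the four have MM > PM.
pm = [100, 200, 300, 400]
mm = [150, 100, 350, 100]
print(fraction_mm_exceeds_pm(pm, mm))
```

Running this per chip gives one number per array, which makes it easy to see how much the fraction varies across your experiment.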

So I would personally caution you against interpreting these p-values as 
indicating presence or absence of the transcript.

As to your question, technical and biological variability are completely 
confounded here, so you have to set up your experiments in such a way 
that the contribution from technical variability is minimized. For 
instance, if you do all controls one day and diseased the next, you 
cannot possibly tell if any differences were due to biology or to 
technical differences. However, if you randomize sample types over days 
processed, then the technical variability (which still exists, and is 
confounded with biological variability), will tend to appear as noise, 
and be captured by the residual term.
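
One way to set up such a design is sketched below in Python. It uses a blocked variant of randomization: samples are shuffled within each group and then dealt across processing days, so every day receives a near-equal share of healthy and diseased samples. The function name, sample IDs, and the two-day layout are hypothetical.

```python
import random


def randomize_over_days(samples, n_days, seed=None):
    """Assign samples to processing days, balancing groups across days.

    samples: dict mapping sample ID -> group label.
    Returns a dict mapping day number (1..n_days) -> list of sample IDs
    in randomized processing order.
    """
    rng = random.Random(seed)
    schedule = {day: [] for day in range(1, n_days + 1)}
    by_group = {}
    for sample_id, group in samples.items():
        by_group.setdefault(group, []).append(sample_id)
    for group in sorted(by_group):
        members = sorted(by_group[group])
        rng.shuffle(members)
        # Deal the shuffled members round-robin so no day is all one group.
        for i, sample_id in enumerate(members):
            schedule[1 + i % n_days].append(sample_id)
    for day in schedule:
        rng.shuffle(schedule[day])  # also randomize the order within a day
    return schedule


# Hypothetical experiment: six healthy and six diseased samples, two days.
samples = {f"H{i}": "healthy" for i in range(1, 7)}
samples.update({f"D{i}": "disease" for i in range(1, 7)})
for day, ids in randomize_over_days(samples, n_days=2, seed=1).items():
    print(day, ids)
```

With this layout a day-to-day shift in the lab affects both groups about equally, so it inflates the residual variance instead of masquerading as a group difference.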

Also, in my opinion there isn't any difference between the two 
situations (assuming I understand situation B correctly). What I think 
you are asking is this: are there any substantive differences between a 
situation where a gene is apparently unexpressed in sample A but 
expressed to a certain degree in sample B, and a situation where a gene 
is expressed in both samples, but at a two-fold (or greater) level in B 
vs A?

In my opinion, there is no difference between those scenarios. In each 
situation, the gene is expressed at a much lower level in one sample 
versus the other. The relative levels are unimportant, as the absolute 
accuracy of our measuring device is not good.

Best,

Jim


>
> My question is what's the correct interpretation in terms of whether the
> differential expression is due to technical or biological? Are they both DE
> due to technical, or A is DE due to biological and B is due to technical, or
> they are both DE due to biological?
>
> Thanks a bunch,
>
> -Jack

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826