[BioC] How to pool subgroups for makeContrasts() and subsequent limma analysis?

Wed Feb 6 18:14:56 CET 2013

Hi René,

On 2/6/2013 11:29 AM, René wrote:
>>> Hi René,
>>>
>>>
>>> You are almost there. Note that you want the mean of the three groups,
>>> not the sum. So
>>>
>>> makeContrasts((B1 + B2 + B3)/3 - A, levels = design)  # 'design' stands for your design matrix
>>>
>>> will e.g., do the comparison of B vs A.
>>>
>>> Best,
>>>
>>> Jim
> Dear James,
>
> I performed the pooled analysis as you suggested and compared the results to a
> pure B - A comparison (no subgroups specified). Interestingly, both analyses
> give different results (497 vs 15 genes with log2FC >= 1 and p < 0.05).
> Could you explain this huge difference?

Assuming that by a 'pure' B-A comparison you mean that you redefined 
your design matrix to have only three columns (A, B, C) and then did the 
B-A comparison, the difference is simple to explain. I would also guess 
that the C-A comparison gives different results as well, depending on 
how you define your design matrix.

Note that the test statistic for a contrast has the difference between 
the means of the two groups in the numerator and a measure of intra-group 
variability in the denominator. So in heuristic terms, the numerator says how 
different the groups are, and the denominator tells you if that 
difference is 'large' or not, by comparing to the within group 
variability. So if the groups are really 'tight' then a small difference 
in means might result in a significant test, but if the groups are 
really variable then the mean differences have to be pretty big as well 
to achieve significance.
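
To make the heuristic concrete, here is a minimal sketch in base R with 
made-up numbers (not René's data). It uses an ordinary two-sample t-test 
rather than limma's moderated t-statistic, so treat it only as an 
illustration of the numerator/denominator idea: both pairs of groups 
differ by the same amount in their means, but only the 'tight' pair 
gives a significant test.

```r
# Hypothetical values: same difference in means, different spread.
a_tight <- c(9.9, 10.0, 10.1)
b_tight <- c(10.9, 11.0, 11.1)
a_loose <- c(8, 10, 12)
b_loose <- c(9, 11, 13)

# Numerator: the difference in group means is identical in both cases.
diff_tight <- mean(b_tight) - mean(a_tight)
diff_loose <- mean(b_loose) - mean(a_loose)

# Ordinary pooled-variance t-tests; the denominator reflects the
# within-group variability.
tt_tight <- t.test(b_tight, a_tight, var.equal = TRUE)
tt_loose <- t.test(b_loose, a_loose, var.equal = TRUE)

print(c(diff_tight = diff_tight, diff_loose = diff_loose))
print(c(t_tight = unname(tt_tight$statistic),
        t_loose = unname(tt_loose$statistic)))
print(c(p_tight = tt_tight$p.value, p_loose = tt_loose$p.value))
```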

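How you define your groups has essentially no bearing on the numerator, 
because the difference of B-A is the same if you do B-A or if you do 
(B1+B2+B3)/3-A (exactly the same when B1, B2, and B3 have equal numbers 
of samples; with unequal sizes the pooled B mean is weighted by subgroup 
size, so the two can differ somewhat). However, the denominator may well 
be quite different, depending on the B1, B2, and B3 groups.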
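
A small sketch of the numerator point, again in base R with made-up 
values for a single gene. The data frame `d`, the column names, and the 
helper `contrast_estimates()` are hypothetical, introduced only for this 
illustration. With three samples in every subgroup the averaged-subgroup 
estimate and the pooled estimate agree; after dropping two B3 samples 
they no longer do.

```r
# Hypothetical log2 expression values for one gene, three samples per subgroup.
d <- data.frame(
  y = c(5.0, 5.1, 4.9,   # A
        5.9, 6.0, 6.1,   # B1
        6.9, 7.0, 7.1,   # B2
        7.9, 8.0, 8.1,   # B3
        5.4, 5.5, 5.6),  # C
  subgroup = factor(rep(c("A", "B1", "B2", "B3", "C"), each = 3))
)
# Pooled grouping: B1, B2 and B3 all become "B".
d$group <- factor(substr(as.character(d$subgroup), 1, 1))

# Hypothetical helper: B-vs-A estimate under both parameterisations.
contrast_estimates <- function(d) {
  b_sub <- coef(lm(y ~ 0 + subgroup, data = d))
  b_grp <- coef(lm(y ~ 0 + group, data = d))
  b_mean <- mean(b_sub[c("subgroupB1", "subgroupB2", "subgroupB3")])
  c(averaged = unname(b_mean - b_sub["subgroupA"]),
    pooled   = unname(b_grp["groupB"] - b_grp["groupA"]))
}

balanced   <- contrast_estimates(d)
unbalanced <- contrast_estimates(d[-c(11, 12), ])  # drop two B3 samples

print(rbind(balanced = balanced, unbalanced = unbalanced))
```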

In the instance where you did (B1+B2+B3)/3-A, the intra-group 
variability for the denominator is based on the variability within the 
A, B1, B2, B3, and C groups. So if all the B-type groups are pretty 
tight, then you will likely get more differentially expressed genes.

If you do the 'pure' B-A comparison, then the denominator is based on 
the intra-group variability of the A,B,C groups. If the B1, B2, B3 
groups are pretty tight, but not really similar, then the combined B 
group will be highly variable, so your denominator will tend to be 
larger, resulting in fewer differentially expressed genes. Since the 
same residual variance estimate goes into the denominator of every 
contrast from that fit, I would imagine the C-A comparison has fewer 
genes as well.
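
The two situations can be sketched for a single gene in base R. The 
values, the data frame `d`, and the helper `contrast_t()` are all 
hypothetical, and the sketch uses ordinary linear-model t-statistics; 
limma fits the same kind of model per gene but additionally moderates 
the variances in eBayes(), which is left out here. The subgroups are 
tight but sit at different levels, which is the case described above.

```r
# Hypothetical log2 expression values for one gene, three samples per subgroup.
d <- data.frame(
  y = c(5.0, 5.1, 4.9,   # A
        5.9, 6.0, 6.1,   # B1
        6.9, 7.0, 7.1,   # B2
        7.9, 8.0, 8.1,   # B3
        5.4, 5.5, 5.6),  # C
  subgroup = factor(rep(c("A", "B1", "B2", "B3", "C"), each = 3))
)
d$group <- factor(substr(as.character(d$subgroup), 1, 1))

# Hypothetical helper: estimate, residual SD and ordinary t-statistic
# for a contrast with weights w on the model coefficients.
contrast_t <- function(fit, w) {
  est <- sum(w * coef(fit))
  se  <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
  c(estimate = est, sigma = summary(fit)$sigma, t = est / se)
}

fit_sub <- lm(y ~ 0 + subgroup, data = d)  # coefficients: A, B1, B2, B3, C
fit_grp <- lm(y ~ 0 + group, data = d)     # coefficients: A, B, C

# B vs A under both designs.
b_sub <- contrast_t(fit_sub, c(-1, 1/3, 1/3, 1/3, 0))
b_grp <- contrast_t(fit_grp, c(-1, 1, 0))

# C vs A under both designs: same groups, but a different residual SD.
c_sub <- contrast_t(fit_sub, c(-1, 0, 0, 0, 1))
c_grp <- contrast_t(fit_grp, c(-1, 0, 1))

print(rbind(BvsA_subgroups = b_sub, BvsA_pooled = b_grp,
            CvsA_subgroups = c_sub, CvsA_pooled = c_grp))
```

In this toy example the estimates are identical between the two designs, 
while the residual SD is about 0.1 with subgroups and about 0.71 with the 
pooled B group, so the t-statistics for both B-A and C-A shrink 
considerably in the pooled design.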

Does that help?

Best,

Jim

>
> Best regards,
> René
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099