[BioC] limma, voom, weights: factorial versus nested interaction results

Gordon K Smyth smyth at wehi.EDU.AU
Sun Jan 6 06:08:05 CET 2013


Dear Reiner,

Well, I don't honestly know yet which of edgeR or voom are better in what 
circumstances.  I feel that I can't know this until I have pushed both 
methods as far as they can go, something that I am still doing, and then 
have tried them both out on a wide variety of datasets.

I started the edgeR project way back in 2004, before RNA-Seq technology 
even existed.  My thought then was that non-normal methods would be 
essential.  And I think that it remains true that edgeR has a clear edge 
over other methods for the SAGE data that was available then.

edgeR performs very well, but has not solved all our data analysis 
problems for RNA-Seq data, not yet anyway.

voom has surprised me by performing much better than I expected.  It 
solves the biggest problem with normal-based methods by estimating the 
mean-variance relationship adaptively.  It holds its size correctly in 
almost any circumstance, scales to very large datasets, can adaptively 
estimate the amount of smoothing necessary, and gives immediate access to 
all the downstream limma pipeline including gene set testing.

My feeling the moment is that edgeR is superior for small counts and but 
that voom is safer and more reliable for noisy heterogeneous data.  Only 
edgeR can estimate the biological coefficient of variation (as we defined 
this in our 2012 paper).  But we are actively working on both methods, and 
are open to what we find.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Fri, 4 Jan 2013, Reiner Schulz wrote:

> Thank you very much, Gordon.
>
> Out of curiosity, why did you develop the voom method when there is
> edgeR? Are there good, perhaps data-dependent, reasons for using one
> over the other?
>
> Best, Reiner
>
> On 03/01/13 23:59, Gordon K Smyth wrote:
>> Dear Reiner,
>>
>> You don't mention contrasts.fit(), but I assume that you are using that
>> function to get equivalent quantities from the two design matrices.
>>
>> If I am guessing your problem correctly, then the issue is that
>> contrasts.fit() gives results that are not exact in the presence of voom
>> weights.  See the warning note on the help page for contrasts.fit.
>>
>> You will get the exact correct t-statistic if the contrast you want is a
>> coefficient of your design matrix.  You should also get exact correct
>> results when using the one way layout design formula ~0+cond, but you
>> may get an approximate test statistic if you use a contrast from the
>> nested formula.  The fold changes are exact in all cases.
>>
>> There was a thread about this a few years ago, see:
>>
>> https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-December/030947.html
>>
>>
>> Basically I have promised to change contrasts.fit() to give exact
>> results in all cases, even though this will make it much slower in some
>> cases.  I have not so far found the time to do it, but will definitely
>> do so by the time we publish the voom method.
>>
>> Best wishes
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>>
>>
>>
>>> Date: Wed, 2 Jan 2013 13:31:58 +0100
>>> From: Reiner Schulz <reiner.schulz at kcl.ac.uk>
>>> To: <bioconductor at r-project.org>
>>> Subject: [BioC] limma, voom,    weights: factorial versus nested
>>>     interaction results
>>>
>>> Dear all,
>>>
>>> Am analysing an RNA-seq experiment with a two-way factorial design, and
>>> am observing different results for two contrasts of interest depending
>>> on whether I use a factorial model matrix or a nested interaction model
>>> matrix. I only observe this if the data are pre-processed w/ voom:
>>>> counts.voom= voom( ...
>>> i.e., when there are observational level weights passed to lmFit. If I
>>> only pass counts.voom$E (no observational level weights) to lmFit, the
>>> results for the two contrasts of interest are identical for the
>>> factorial and nested interaction model matrices.
>>> Is this to be expected, and if so, what is the explanation?
>>>
>>> Much appreciated, Reiner
>>>
>>> PS: some more details ...
>>>
>>>> strain
>>> [1] MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT WT  WT  WT  WT
>>> WT  WT
>>> [20] WT  WT  WT  WT  WT  WT  WT
>>> Levels: WT MUT
>>>> treat
>>> [1] U  T2 T1 U  T2 T1 U  U  T2 T1 U  T2 T1 U  T2 T1 U  T2 T1 U  U  T2
>>> T1 U  T2
>>> [26] T1
>>> Levels: U T1 T2
>>>
>>>> cond= factor( paste( sep='.', strain, treat), lev=c('WT.U', 'MUT.U',
>>> 'WT.T1', 'MUT.T1', 'WT.T2', 'MUT.T2'))
>>>> design.matrix= model.matrix( ~0 + cond)
>>>> design.matrix
>>>   WT.U MUT.U WT.T1 MUT.T1 WT.T2 MUT.T2
>>> 1  0    1     0     0      0      0
>>> 2  0    0     0     0      0      1
>>> 3  0    0     0     1      0      0
>>> 4  0    1     0     0      0      0
>>> 5  0    0     0     0      0      1
>>> 6  0    0     0     1      0      0
>>> 7  0    1     0     0      0      0
>>> 8  0    1     0     0      0      0
>>> 9  0    0     0     0      0      1
>>> 10 0    0     0     1      0      0
>>> 11 0    1     0     0      0      0
>>> 12 0    0     0     0      0      1
>>> 13 0    0     0     1      0      0
>>> 14 1    0     0     0      0      0
>>> 15 0    0     0     0      1      0
>>> 16 0    0     1     0      0      0
>>> 17 1    0     0     0      0      0
>>> 18 0    0     0     0      1      0
>>> 19 0    0     1     0      0      0
>>> 20 1    0     0     0      0      0
>>> 21 1    0     0     0      0      0
>>> 22 0    0     0     0      1      0
>>> 23 0    0     1     0      0      0
>>> 24 1    0     0     0      0      0
>>> 25 0    0     0     0      1      0
>>> 26 0    0     1     0      0      0
>>>
>>>> design.matrix.alt= model.matrix( ~strain + strain:treat)
>>>> design.matrix.alt
>>>   Intercept MUT WT:T1 MUT:T1 WT:T2 MUT:T2
>>> 1  1         1   0     0      0     0
>>> 2  1         1   0     0      0     1
>>> 3  1         1   0     1      0     0
>>> 4  1         1   0     0      0     0
>>> 5  1         1   0     0      0     1
>>> 6  1         1   0     1      0     0
>>> 7  1         1   0     0      0     0
>>> 8  1         1   0     0      0     0
>>> 9  1         1   0     0      0     1
>>> 10 1         1   0     1      0     0
>>> 11 1         1   0     0      0     0
>>> 12 1         1   0     0      0     1
>>> 13 1         1   0     1      0     0
>>> 14 1         0   0     0      0     0
>>> 15 1         0   0     0      1     0
>>> 16 1         0   1     0      0     0
>>> 17 1         0   0     0      0     0
>>> 18 1         0   0     0      1     0
>>> 19 1         0   1     0      0     0
>>> 20 1         0   0     0      0     0
>>> 21 1         0   0     0      0     0
>>> 22 1         0   0     0      1     0
>>> 23 1         0   1     0      0     0
>>> 24 1         0   0     0      0     0
>>> 25 1         0   0     0      1     0
>>> 26 1         0   1     0      0     0
>>>
>>> Observed equalities (as expected):
>>> MUT == MUT.U - WT.U
>>> WT:T1 == WT.T1 - WT.U
>>> MUT:T1 == MUT.T1 - MUT.U
>>> WT:T2 == WT.T2 - WT.U
>>> MUT:T2 == MUT.T2 - MUT.U
>>> MUT:T1 - WT:T1 == (MUT.T1 - MUT.U) - (WT.T1 - WT.U)
>>> MUT:T2 - WT:T2 == (MUT.T2 - MUT.U) - (WT.T2 - WT.U)
>>>
>>> Observed inequalities (only w/ voom pre-processed data; unexpected):
>>> WT:T1 - WT:T2 != (WT.T1 - WT.U) - (WT.T2 - WT.U)
>>> MUT:T1 - MUT:T2 != (MUT.T1 - MUT.U) - (MUT.T2 - MUT.U)
>>>
>>> --
>>>
>>
>
> -- 
> Reiner Schulz, Ph.D.
> Senior Lecturer in Bioinformatics and Epigenomics
> Dept. of Medical and Molecular Genetics
> King's College London
> Guy's Hospital, 8th floor Tower Wing
> London SE1 9RT
> https://atlas.genetics.kcl.ac.uk/~rschulz
>
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list