[BioC] Very low P-values in limma

Thu Oct 29 07:50:30 CET 2009

Dear Paul,

Is it possible that you haven't quoted your professor verbatim?  I ask
because these comments don't make sense as they stand.  I really don't know
what he might mean by real p-values or the assumption of zero measurement
error.
Measurement error obviously can't be zero.  Nor can there be an infinite 
number of replicates.  None of the alternative analysis methods we have 
discussed makes either of these assumptions.  I wasn't the one arguing for 
averaging within-array replicates, but if that method did assume what you 
say, then it would have to be an invalid method.

On the other hand, your professor is quite right to say that within-array 
replicates measure technical rather than biological variability.  In a 
univariate analysis, one would simply average the technical replicates. 
This would give a summary response variable, with a variance made up of
both biological and technical components, with replicates that you could 
reasonably treat as independent.
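
To make the variance argument above concrete, here is a minimal Python
sketch (not limma code).  It assumes a simple two-component model, which is
an illustrative assumption: each array's value for a gene has a
between-array variance `var_between` (biological plus any between-array
technical variation) and a within-array technical variance `var_within`,
and `n_dups` within-array replicates are averaged.  The function name and
the numbers are hypothetical.

```python
def variance_of_array_average(var_between, var_within, n_dups):
    """Variance of one array's averaged value for a gene.

    Assumed model (illustrative only):
        observation = array effect + within-array technical error
    Averaging n_dups within-array replicates divides only the
    within-array component by n_dups.
    """
    if n_dups < 1:
        raise ValueError("n_dups must be at least 1")
    return var_between + var_within / n_dups


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    for n_dups in (1, 2, 4, 100):
        print(n_dups, variance_of_array_average(1.0, 0.5, n_dups))
```

Under this model, averaging more within-array replicates shrinks only the
within-array term.  The variance of the averaged value approaches
`var_between`, not zero, so the averages can still be compared across
arrays in the usual way.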

In a genewise microarray analysis, averaging the within-array replicates
has a disadvantage in that it fails to penalize (lower the rank of) genes
which have high within-array variability.  If biological variability is
high compared to technical, and you have enough array replicates to get a
decent estimate of between-array variability, then averaging the
within-array replicates is likely still the way to go, just as in a 
univariate analysis.  On the other hand, if technical variability (within 
and between arrays) is relatively large compared to biological, and the 
number of array replicates is very small, then the information in the 
within-array variances can be too valuable to ignore. 
duplicateCorrelation uses the fact that the between-array variance has a 
technical as well as a biological component, and the between and within 
technical components tend to be associated across probes for many 
microarray platforms.  It is this last assumption which allows us to make 
use of within-array standard deviations when making inferences about 
between-sample comparisons.
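
As a rough illustration of how little extra information highly correlated
within-array replicates carry, the sketch below applies the textbook
formula for the effective number of independent observations among n
equally correlated replicates, n / (1 + (n - 1) * rho).  This is an
assumption made for illustration; it is not taken from the limma source and
may not be the exact weighting limma uses.  The function name is
hypothetical; ndups=4 and cor=0.81 are the values quoted further down in
this thread.

```python
def effective_replicates(n_dups, rho):
    """Effective number of independent observations among n_dups
    replicates that share a common pairwise correlation rho.

    Textbook equicorrelation result, used here as an assumption:
        n_eff = n / (1 + (n - 1) * rho)
    """
    if n_dups < 1:
        raise ValueError("n_dups must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this sketch")
    return n_dups / (1.0 + (n_dups - 1) * rho)


if __name__ == "__main__":
    n_eff = effective_replicates(4, 0.81)
    # Information contributed by each replicate beyond the first.
    extra_per_replicate = (n_eff - 1.0) / 3
    print(round(n_eff, 3), round(extra_per_replicate, 3))
```

With these inputs the four within-array replicates count for roughly 1.17
independent observations under this formula, so each replicate beyond the
first adds only a few percent of an independent array.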

If your priority is to get reliable p-values, and you think you have 
enough array replication to do this, then average the within-array 
replicates.  If your array replication is limited, technical variability 
is high, and your priority is to rank the genes, then duplicateCorrelation 
may help.  I would add that microarray p-values should always be taken 
with a grain of salt, as it's impossible to verify all assumptions in 
small experiments, and it's useful instead to think in terms of 
independent verification of the results.

This is really as far as I want to debate it.  Obviously it's your 
analysis and you should use your own judgement.  As a maths graduate
student, you would be able to read the published duplicateCorrelation
paper if you want to check the reasoning in detail.

Best wishes
Gordon

On Wed, 28 Oct 2009, Paul Geeleher wrote:

> Dear list,
>
> The following are the words of a professor in my department:
>
> I still don't get why the 'real' p-values could be better than
> p-values you get with the assumption of zero measurement error. By
> averaging over within array replicates you are not ignoring the within
> array replicates, instead you are acting as though there were
> infinitely many of them, so that the standard error of the expression
> level within array is zero. Stats is about making inferences about
> populations from finite samples. The population you are making
> inferences about is the population of all late-stage breast cancers.
> The data are from 7 individuals. The within-array replicates give an
> indication of measurement error of the expression levels but don't
> give you a handle on the variability of the quantity of interest in
> the population.
>
> Paul
>
> On Sat, Oct 24, 2009 at 2:44 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>
>>
>> On Sat, 24 Oct 2009, Gordon K Smyth wrote:
>>
>>> Dear Paul,
>>>
>>> Given your consensus correlation value, limma is treating your within-array
>>> replicates as worth about 1/3 as much as replicates on independent arrays
>>> (because 1-0.81^2 is about 1/3).
>>
>> Sorry, my maths is wrong.  The effective weight of the within-array
>> replicates is quite a bit less than 1/3, given ndups=4 and cor=0.81.
>>
>> Best wishes
>> Gordon
>>
>
>
>
> -- 
> Paul Geeleher
> School of Mathematics, Statistics and Applied Mathematics
> National University of Ireland
> Galway
> Ireland
>