[BioC] Paired two-color design

Gordon K Smyth smyth at wehi.EDU.AU
Wed Aug 29 11:08:00 CEST 2012


Dear January,

You are asking me about the analysis of technical replicates described on 
pages 42-43 of the limma User's Guide (current release version).

These pages describe an analysis approach in which differential expression 
is assessed against technical variation only.  This approach is far from 
ideal, because it is obviously preferable to evaluate DE relative to 
biological variation.  I wrote this approach in order to give users 
something that would produce results for awkward designs when a 
completely rigorous statistical analysis was impossible or impractical. 
The idea was that the gene ranking might still be useful even if the 
p-values were too optimistic.

I agree that these pages did give the impression that this approach was a 
valid alternative to other analyses, rather than a last resort when 
nothing else would work, which is what it was.  A few months ago, I 
decided to remove this approach from the User's Guide entirely.  You will 
see that it is gone from the User's Guide in the development version of 
limma.

On Wed, 29 Aug 2012, January Weiner wrote:

> Dear Gordon, thank you for your answer.
>
> Gordon K Smyth wrote:
>> I would analyse it like this:
>
> That makes total sense; however, I have one more question (sorry!
> and thank you for your patience).
>
> In the meantime, I have taken an alternative approach, as described
> in the second part of the chapter on technical replicates in the limma
> guide (http://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf,
> page 42): fit for each biological replicate separately, then create a
> contrast corresponding to the average of these and subtract the
> control:
>
> design <- cbind(ctrl = c(1, -1, rep(0, 6)),
>                 e1   = c(0, 0, 1, -1, rep(0, 4)),
>                 e2   = c(rep(0, 4), 1, -1, 0, 0),
>                 e3   = c(rep(0, 6), 1, -1))
> cmtx <- makeContrasts("(e1+e2+e3)/3 - ctrl", levels = design)
> fit <- lmFit(MA, design)
> fit <- contrasts.fit(fit, cmtx)
> fit <- eBayes(fit)
>
> If I understand the text of the limma guide correctly, these are
> alternative approaches and should give at least similar results.
>
> And yes, the estimated logFC are exactly the same.  However, the
> p-values are very different.  Using duplicateCorrelation causes a bunch
> of genes to become statistically non-significant (not the other way
> round). Either duplicateCorrelation is less sensitive or more
> specific, and I wonder which is the case. Unfortunately, this bunch of
> genes changes the results of the functional analysis.

The approach you describe estimates the residual standard deviations 
entirely from the technical replicates, and hence will underestimate the 
biological variability, and hence will over-estimate the statistical 
significance of the results.  The duplicateCorrelation approach takes the 
biological variation into account, and hence is statistically more 
rigorous.

Rather than using the purely technical analysis, I would prefer that you 
handle the technical replicates via duplicateCorrelation and simply 
choose a more lenient FDR cutoff when making your gene list for 
functional analysis.
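
To make this concrete, here is a minimal sketch of what I have in mind, 
assuming that your eight arrays form the four dye-swap pairs implied by 
your design matrix (the control pair first, then three treated biological 
replicates) and that MA is your normalised MAList.  The blocking vector 
and coefficient names below are illustrative rather than taken from your 
actual targets file; the key point is that the three treated pairs share 
one coefficient, so the biological replicates contribute to the residual 
variance.

library(limma)

## Assumed layout: arrays 1-2 are the control dye-swap pair; arrays 3-8
## are three treated biological replicates, each hybridised as a dye swap.
block <- c(1, 1, 2, 2, 3, 3, 4, 4)

## One coefficient for the control comparison and one shared by the three
## treated pairs, so biological variation enters the residual.
design <- cbind(ctrl = c(1, -1, rep(0, 6)),
                trt  = c(0, 0, rep(c(1, -1), 3)))

corfit <- duplicateCorrelation(MA, design, ndups = 1, block = block)
fit <- lmFit(MA, design, block = block,
             correlation = corfit$consensus.correlation)
fit <- contrasts.fit(fit, makeContrasts(trt - ctrl, levels = design))
fit <- eBayes(fit)
topTable(fit, coef = 1)

The estimated log-fold-changes should agree with those from your contrast 
of the four separate coefficients, as you observed, but the standard 
errors now reflect the variation between the treated biological 
replicates as well as the correlation within each dye-swap pair.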

> Also, maybe I'm lost, but after reading and thinking I don't see why I
> can't use intraspotCorrelation here.  I'm not saying I can; in fact, this
> gives me vastly different results that I don't really trust (given the
> later results of the functional analysis).  But I just don't see the
> problem, since the correlations are calculated within the arrays.

For technical reasons, limma does not allow you to use both 
intraspotCorrelation and duplicateCorrelation in the same analysis.

> I have got quite used to applying that, since I'm often confronted with the
> following problem:
>
> Cy3 Cy5
> A0  B0
> B0  A0
> A1  B1
> B1  A1
>
> where the job is to compare A1 with A0 and B1 with B0 (the dye swaps
> are technical replicates).  I think that this is an unconnected design,
> and there is no way of doing that with the usual log-ratio model, so the
> channels should be analysed separately.

Yes, that's right.  My current recommendation for this analysis is to use 
the separate channel approach, and to simply pretend that the technical 
replicates are true biological replicates.  This is because there is no 
practical way to fully account for the biological and technical levels of 
variability in the experiment.  The technical and biological replicates 
will contribute equally to the residual standard errors.  Not ideal, but 
probably the best of the choices available.  Keep in mind that the 
p-values will be slightly too low.  This makes it more important than 
usual that the key results are independently validated downstream.
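
For concreteness, here is a minimal sketch of that separate channel 
analysis, assuming the four arrays in your table and that MA is the 
normalised MAList; the target and contrast names are illustrative.

library(limma)

## Assumed targets frame matching the four arrays in the table above.
targets <- data.frame(Cy3 = c("A0", "B0", "A1", "B1"),
                      Cy5 = c("B0", "A0", "B1", "A1"))
targets2 <- targetsA2C(targets)    # reshape to one row per channel

lev <- c("A0", "A1", "B0", "B1")
f <- factor(targets2$Target, levels = lev)
design <- model.matrix(~0 + f)
colnames(design) <- lev

## Estimate the correlation between the two channels of each spot, then
## fit the separate channel linear model.
corfit <- intraspotCorrelation(MA, design)
fit <- lmscFit(MA, design, correlation = corfit$consensus.correlation)

cmtx <- makeContrasts(A1vsA0 = A1 - A0, B1vsB0 = B1 - B0, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cmtx))
topTable(fit2, coef = "A1vsA0")

Here the dye-swap arrays are simply treated as replicates of the same 
targets, which is the "pretend they are biological replicates" step 
described above.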

Best wishes
Gordon

> kind regards,
>
> j.
>
>
> -- 
> -------- Dr. January Weiner 3 --------------------------------------
>
