[BioC] limma: correcting for a correlation between contrasts

Thu Jul 11 14:53:28 CEST 2013

Yes, of course you're right T2-T1 is the simpler (correct) way to 
express that contrast. Muddled thinking on my part.

Your second question gets to the heart of my problem: I know that 
wherever I set the thresholds for determining DE there will be genes 
that are DE for T2 - Ctrl, but not for T1 - Ctrl, but where if the 
magnitude of the T1 response were simply a little stronger they *would* 
be DE. Put another way, if we assume a linear dose response for the 
system to T1, which other data suggests is not an unreasonable 
assumption here, an increase in the T1 concentration would, for many, 
but not all, genes produce the same response as T2 (or vice-versa a 
decrease in T2 concentration would produce the same response as T1). I 
want to get those genes where this is *not* true.

Put one more (final!) way: Imagine T1 and T2 are actually the same 
compound but applied at 1uM and 2uM concentrations and that there is 
linear dose response to the concentration of this compound for all 
genes. Looking for DE will produce lots of genes between those two 
conditions, but fundamentally it is the same response. I want to correct 
for the difference in magnitude and so would hope for *no* reported DE 
in this (imaginary!) experiment.

Rather than go back and tell the experimentalists to tweak their T1/T2 
concentrations until the responses are equivalent, I feel there must be 
a smart way (using some assumptions for sure) to model this in. One 
simple approach that had occurred to me would be to just multiply all 
the T1 logFCs by a common factor that produces a T1 response that 
*overall* is equivalent to the T2 response. I can get that factor by 
simple linear regression of the two responses, but I'm not sure where in 
the limma workflow it makes sense (if at all) to try to shove that 
correction in.

This test code and simulated data seems to do what I want (return 0 DE 
genes in the final analysis). Is it valid to do this?

ctrl     = rnorm(100,10,1)
response = rnorm(100,0,1)

t1   = ctrl + response
t2   = ctrl + response * 2

exprs = matrix(c(ctrl,ctrl+rnorm(100,0,0.1),ctrl+rnorm(100,0,0.1),
                  t1,  t1+rnorm(100,0,0.1),  t1+rnorm(100,0,0.1),
                  t2,  t2+rnorm(100,0,0.1),  
t2+rnorm(100,0,0.1)),ncol=9)

groups = factor(c(rep("C",3),rep("T1",3),rep("T2",3)))

design = model.matrix(~0+groups)
colnames(design) = levels(groups)

contrasts = makeContrasts(T1 - C, T2 - C, levels=design)

fit = lmFit(exprs,design)
fit2 = contrasts.fit(fit,contrasts)
fit3 = eBayes(fit2)

model = lm(fit3$coefficients[,2] ~ fit3$coefficients[,1])

contrasts = makeContrasts(T1 - C, T2 - C, (T1 - 
C)*model$coefficients[2] - (T2 - C), levels=design)

fit = lmFit(exprs,design)
fit2 = contrasts.fit(fit,contrasts)
fit3 = eBayes(fit2)

topTable(fit3, coef=4)

On 11.07.2013 12:54, Gordon K Smyth wrote:
> Dear Alex,
>
> The contrast (Ctrl-T1)-(Ctrl-T2) is just equal to T2-T1, it finds
> genes for which T2 and T1 are not identical.
>
> If you want T2 specific genes, would you not simply look for genes
> that are DE for T2 - Ctrl but not for T1 - Ctrl?
>
> Best wishes
> Gordon
>
>> Date: Thu, 11 Jul 2013 10:18:16 +0100
>> From: Alex Gutteridge <alexg at ruggedtextile.com>
>> To: <bioconductor at r-project.org>
>> Subject: [BioC] limma: correcting for a correlation between 
>> contrasts
>>
>> I have an experimental design (or rather question) that goes beyond 
>> my
>> usual limma knowledge. I am hoping there is some elegant way to 
>> answer
>> it that someone here can point me towards. The experimental design 
>> is
>> straightforward: one control group (Ctrl) and two treatment groups 
>> (T1 &
>> T2) with three replicates each:
>>
>> S1  Ctrl
>> S2  Ctrl
>> S3  Ctrl
>> S4  T1
>> S5  T1
>> S6  T1
>> S7  T2
>> S8  T2
>> S9  T2
>>
>> It is clear from some basic exploratory analysis that the effect of 
>> T2
>> is very similar to T1, but has a slightly reduced response. I.e.
>> plotting log fold changes for the Ctrl - T1 contrast vs the Ctrl - 
>> T2
>> contrast gives a clear linear correlation but with slope <1. This is
>> expected given what we know about the biology of the system, but we
>> suspect there are some genes that will be specific for the T2 
>> treatment.
>> My first instinct is to setup a contrast for limma like: (Ctrl - T1) 
>> -
>> (Ctrl - T2), but because the T2 response is across the board less 
>> than
>> T1 this ends up just highlighting that fact rather than pulling out 
>> the
>> T2 'specific' response.
>>
>> Hopefully this code explains things visually with some simulated 
>> data:
>>
>> T1.response = rnorm(100,0,1)
>> T2.response = T1.response * 0.5 + rnorm(100,0,0.2)
>> T1.response[101:110] = rnorm(10,0,0.2)
>> T2.response[101:110] = rnorm(10,1,0.2)
>>
>> 
>> plot(T1.response,T2.response,xlim=c(-2,2),ylim=c(-2,2),col=c(rep("green",100),rep("black",10)),pch=16)
>>
>> abline(0,1,col="blue")
>> abline(-0.5,1,lty=2,col="blue")
>> abline(0.5,1,lty=2,col="blue")
>>
>> abline(model,col="red")
>> 
>> abline(model$coefficients[1]+0.5,model$coefficients[2],lty=2,col="red")
>> 
>> abline(model$coefficients[1]-0.5,model$coefficients[2],lty=2,col="red")
>>
>> The (Ctrl - T1) - (Ctrl - T2) contrast tends to pull out the T2
>> specific genes (black) which I want, but also those genes outside 
>> the
>> blue tramlines that are simply not responding as strongly to T2 as 
>> T1
>> (which I don't want). In the real data, those genes out-number the
>> really T2 specific genes considerably and dominate in any downstream
>> GSEA type analysis. Is there a way to setup the contrast instead so
>> limma corrects for the correlation between the two datasets and 
>> looks
>> for genes with large residuals from the red (fitted) line rather 
>> than
>> the blue?
>>
>> -- Alex Gutteridge
>
> 
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:10}}