[BioC] limma design and contrast matrix for paired experiment

Gordon K Smyth smyth at wehi.EDU.AU
Mon Jan 14 01:31:12 CET 2013


Yes.

Gordon

On Mon, 14 Jan 2013, David Westergaard wrote:

> Dear Gordon,
>
> Thank you for your reply. Following the example from the edgeR users
> guide, I revised the model:
>
> CellLine <- factor(rep(c('C1','C1','C2','C2'),3),levels=c('C1','C2'))
> Sample <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6),levels=c(1,2,3,4,5,6))
> Treatment <- factor(rep(c('C','T'),6),levels=c('C','T'))
>
> design <- model.matrix( ~ CellLine + CellLine:Sample + CellLine:Treatment   )
>
> I am however, not, a bit unsure of which coefficients I am interested
> in. It seems logical that I should be interested in
> 'CellLineC1:TreatmentT' and 'CellLineC2:TreatmentT' to answer the
> question of 'Which genes are differentially expressed in CellLine 1
> due to treatment X' and 'Which genes are differentially expressed in
> CellLine 2 due to treatment X'. Is that correct?
>
> Best,
> David
>
> 2013/1/13 Gordon K Smyth <smyth at wehi.edu.au>:
>> Dear David,
>>
>> No, your design matrix is incorrect because it ignores the pairing by sample
>> in your experimental design.
>>
>> You can treat Sample as either a fixed or a random factor.  The fixed
>> approach is closely analogous to a classical paired t-test.  For an
>> experiment like yours, the fixed approach is explained in Section 3.5 of the
>> edgeR User's Guide:
>>
>> http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
>>
>> You can follow the same construction of the design matrix even though you
>> are using limma.
>>
>> The random approach is a bit more aggressive.  For an experiment like yours,
>> the random approach is explained in Section 8.7 of the limma User's Guide:
>>
>> http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf
>>
>> I would probably recommend the first approach for your data.  However the
>> second approach is necessary if you want to test for differences between the
>> two cell lines.
>>
>> Best wishes
>> Gordon
>>
>>> Date: Fri, 11 Jan 2013 18:31:23 +0100
>>> From: David Westergaard <david at harsk.dk>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC]  limma design and contrast matrix for paired
>>>         experiment
>>>
>>> Hello,
>>>
>>> I am analysing microarray data performed on two cell cultures, in
>>> which the gene expression were measured before (C) and after treatment
>>> (T), so that the targets look like this:
>>>
>>> Cell-line       Treatment       Sample
>>> Cell 1  C       1
>>> Cell 1  T       1
>>> Cell 2  C       2
>>> Cell 2  T       2
>>> Cell 1  C       3
>>> Cell 1  T       3
>>> Cell 2  C       4
>>> Cell 2  T       4
>>> Cell 1  C       5
>>> Cell 1  T       5
>>> Cell 2  C       6
>>> Cell 2  T       6
>>>
>>> All experiments were performed on single-channel Agilent arrays, with
>>> 4 samples pr. slide. I am interested in determining the differentially
>>> expressed genes between Cell1 before and after treatment, as well as
>>> Cell2 before and after treatment. This is the preliminary code:
>>>
>>> # Load and normalize data
>>> RG <-
>>> read.maimages(targets$FileName,source="agilent.median",green.only=TRUE)
>>> # Assume there is a col called FileName in the targets section
>>> RG <- backgroundCorrect(RG, method="normexp", offset=16)
>>> RGNorm <- normalizeBetweenArrays(RG, method="quantile")
>>> RGNorm.ave <- avereps(RGNorm, ID=RGNorm$genes$ProbeName)
>>>
>>> # Create design
>>> Pairing <-
>>> paste(rep(c('C1-','C1-','C2-','C2-'),3),c(1,1,2,2,1,1,2,2,1,1,2,2),rep(c('C','T'),6),sep='')
>>> pair <- factor(Pairing,levels=unique(Pairing))
>>> design <- model.matrix( ~ 0 + pair )
>>> colnames(design) <- c('C1.C','C1.T','C2.C','C2.T')
>>>
>>> # Fit data
>>> fit <- lmFit(RGNorm.ave, design=design)
>>>
>>> cont.matrix <- makeContrasts(C1 = C1.T-C1.C, C2=C2.T-C2.C, levels=design)
>>> fit2 <- contrasts.fit(fit, cont.matrix)
>>> fit2 <- eBayes(fit2)
>>>
>>>
>>> For the experiment, is the design/contrast matrix a proper choice to
>>> answer the questions of 'Which genes are differentially expressed in
>>> Cell 1' and  'Which genes are differentially expressed in Cell 2'?
>>> Further, should I do any technical correction, such as
>>> duplicateCorrelation or similar? The reason I am asking is that even
>>> at p<=0.01 I am getting a very high number of differentially expressed
>>> probes (4500ish for Cell 1, and 7500ish for Cell 2, respectively), and
>>> I want to make sure this is biological significance, and not some
>>> technical aspect I have missed.
>>>
>>> Thanks in advance.
>>>
>>> Best,
>>> David
>>>
>>>
>>>> sessionInfo()
>>>
>>> R version 2.14.1 (2011-12-22)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] statmod_1.4.16 limma_3.10.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tcltk_2.14.1 tools_2.14.1
>>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list