[BioC] Limma: doing multiple paired t-tests in one go...

Gordon Smyth smyth at wehi.EDU.AU
Sat Dec 1 02:35:38 CET 2007


Dear Philip,

You are seeing problems where really there are none.

The advice from the limma User's Guide guided you correctly. Your 
limma session below is perfectly correct. Jim's advice was perfectly 
correct and helpful.

The only problem is that you're insisting on treating the simple 
paired t-test as a reference method which you believe you have to 
reproduce. The aim of linear modelling in limma is to give you the 
best possible test, not to reproduce the test statistic you would 
have got if you only had half as much data.

If all you want is to do is to reproduce ordinary paired t-tests, 
then you have to analyse Diet_1 and Diet_2 separately with separate 
linear models. In that case, there's no point in any further 
discussion. If you are willing to embrace a more general variant of 
the paired t-test which might be more powerful, then read on.

The tests you get from linear modelling will differ from simple 
paired t-tests because the residual standard deviation is pooled 
across all your arrays. Your first 4 arrays allow you to test 
Treatment_Diet_1 vs Control_Diet_1 with 1 residual df. Your second 
set of 4 arrays allow you to test Treatment_Diet_2 vs Control_Diet_2, 
also with 1 residual df. When you put all 8 arrays into your linear 
model, the two residual df are pooled so that you get t-tests for 
Treatment_Diet_1 vs Control_Diet_1 and for Treatment_Diet_2 vs 
Control_Diet_2 on 2 residual df. This is usually a more powerful approach.

The subtlety to your experiment is that you want to conduct two 
disconnected tests. The usual model formula in R are not tuned to 
this situation. The parametrisations you've been using will give you 
the correct results (because limma will figure it out, and will drop 
superfluous coefficients as necessary), but they do tend to obscure 
the true situation. Here is a simpler parametrisation which makes the 
situation more explicit:

   > design <- model.matrix(~0+Sibship)
   > design <- cbind(design,TreatmentvsControl1=c(0,1,0,1,0,0,0,0))
   > design <- cbind(design,TreatmentvsControl2=c(0,0,0,0,0,1,0,1))
   > design
     Sibship101 Sibship103 Sibship113 Sibship114 TreatmentvsControl1 
TreatmentvsControl2
   1          0          0          1          0                   0 
                  0
   2          0          0          1          0                   1 
                  0
   3          0          0          0          1                   0 
                  0
   4          0          0          0          1                   1 
                  0
   5          1          0          0          0                   0 
                  0
   6          1          0          0          0                   0 
                  1
   7          0          1          0          0                   0 
                  0
   8          0          1          0          0                   0 
                  1

This makes it explicit that you there are only 6 independent 
coefficients in your model, not 7 as your previous design matrices 
have had. With this design matrix you don't even need to use 
contrasts. The last two coefficients already represent the two 
comparisons you want to make.

Best wishes
Gordon


>Date: Thu, 29 Nov 2007 16:42:41 +0100
>From: "Groot, Philip de" <philip.degroot at wur.nl>
>Subject: Re: [BioC] Limma: doing multiple paired t-tests in one go...
>To: "James W. MacDonald" <jmacdon at med.umich.edu>
>Cc: bioconductor at stat.math.ethz.ch
>
>Hello Jim,
>
>Thank you for your answer! Unfortunately, it does NOT solve my 
>problem. As a matter of fact: I already tried this myself... Let me 
>show the problem as follows. First, I reproduce what you emailed me:
>
> > sib
>SibShips
>      113      113      114      114      101      101      103      103
>Levels: 101 103 113 114
> > rep
>[1] "Control_Diet_1"   "Treatment_Diet_1" 
>"Control_Diet_1"   "Treatment_Diet_1"
>[5] "Control_Diet_2"   "Treatment_Diet_2" 
>"Control_Diet_2"   "Treatment_Diet_2"
> > design <- model.matrix(~0+rep+sib)
>Warning message:
>In model.matrix.default(~0 + rep + sib) :
>   variable 'rep' converted to a factor
> > design
>   repControl_Diet_1 repControl_Diet_2 repTreatment_Diet_1 repTreatment_Diet_2
>1                 1                 0                   0                   0
>2                 0                 0                   1                   0
>3                 1                 0                   0                   0
>4                 0                 0                   1                   0
>5                 0                 1                   0                   0
>6                 0                 0                   0                   1
>7                 0                 1                   0                   0
>8                 0                 0                   0                   1
>   sib103 sib113 sib114
>1      0      1      0
>2      0      1      0
>3      0      0      1
>4      0      0      1
>5      0      0      0
>6      0      0      0
>7      1      0      0
>8      1      0      0
>attr(,"assign")
>[1] 1 1 1 1 2 2 2
>attr(,"contrasts")
>attr(,"contrasts")$rep
>[1] "contr.treatment"
>attr(,"contrasts")$sib
>[1] "contr.treatment"
>
>
>Secondly, I continue with the required Limma calculations and show 
>you the result for a single probe:
>
> >     fit <- lmFit(x.norm, design)
>Coefficients not estimable: sib114
>
>(Note the error message. I suppose that it can be skipped because I 
>am not interested in this factor anyway).
>
> >     contrast.matrix <- 
> makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1", 
> "repControl_Diet_2-repTreatment_Diet_2"), levels=design)
> > contrast.matrix
>                      Contrasts
>Levels                repControl_Diet_1-repTreatment_Diet_1
>   repControl_Diet_1                                       1
>   repControl_Diet_2                                       0
>   repTreatment_Diet_1                                    -1
>   repTreatment_Diet_2                                     0
>   sib103                                                  0
>   sib113                                                  0
>   sib114                                                  0
>                      Contrasts
>Levels                repControl_Diet_2-repTreatment_Diet_2
>   repControl_Diet_1                                       0
>   repControl_Diet_2                                       1
>   repTreatment_Diet_1                                     0
>   repTreatment_Diet_2                                    -1
>   sib103                                                  0
>   sib113                                                  0
>   sib114                                                  0
> >
>(makes sense if you ask me)
>
> > fit <- contrasts.fit(fit, contrast.matrix)
> > tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
> > pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
> > tstat.ord[1,]
>repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
>                             -0.570853                             -1.724101
> > pvalue.ord[1,]
>repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
>                             0.6256902                             0.2268312
>
>Now let's verify this with a reference method!:
> > y_after <- exprs(x.norm)[1,c(2,4)]
> > y_after
>A96_hA_40_113_1_final.CEL A96_hA_41_114_1_final.CEL
>                  7.182141                  7.480890
> > y_before <- exprs(x.norm)[1,c(1,3)]
> > y_before
>A96_hA_09_113_1_base.CEL A96_hA_10_114_1_base.CEL
>                 7.230783                 7.363846
> > t.verify <- t.test(y_before, y_after, paired=T)
> > print(t.verify)
>         Paired t-test
>data:  y_before and y_after
>t = -0.4128, df = 1, p-value = 0.7507
>alternative hypothesis: true difference in means is not equal to 0
>95 percent confidence interval:
>  -1.086823  1.018420
>sample estimates:
>mean of the differences
>             -0.03420119
>
>This does not agree! However, when I exclude Diet 2 in the 
>ExpressionSet and redo the Limma calculation, it DOES match!!:
> > sib <- pData(x.norm)$sibship[c(1:4)]
> > sib
>SibShips
>      113      113      114      114
>Levels: 101 103 113 114
> > rep <- pData(x.norm)$replicates[c(1:4)]
> > rep
>[1] "Control_Diet_1"   "Treatment_Diet_1" 
>"Control_Diet_1"   "Treatment_Diet_1"
> > design <- model.matrix(~0+rep+sib)
>Warning message:
>In model.matrix.default(~0 + rep + sib) :
>   variable 'rep' converted to a factor
> > design
>   repControl_Diet_1 repTreatment_Diet_1 sib103 sib113 sib114
>1                 1                   0      0      1      0
>2                 0                   1      0      1      0
>3                 1                   0      0      0      1
>4                 0                   1      0      0      1
>attr(,"assign")
>[1] 1 1 2 2 2
>attr(,"contrasts")
>attr(,"contrasts")$rep
>[1] "contr.treatment"
>attr(,"contrasts")$sib
>[1] "contr.treatment"
> >     fit <- lmFit(x.norm[,c(1:4)], design)
>Coefficients not estimable: sib103 sib114
> >     contrast.matrix <- 
> makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1"), 
> levels=design)
> > contrast.matrix
>                      Contrasts
>Levels                repControl_Diet_1-repTreatment_Diet_1
>   repControl_Diet_1                                       1
>   repTreatment_Diet_1                                    -1
>   sib103                                                  0
>   sib113                                                  0
>   sib114                                                  0
> >     fit <- contrasts.fit(fit, contrast.matrix)
> >  tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
> >  pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, 
> lower.tail=FALSE)
> > tstat.ord[1,]
>[1] -0.412843
> > pvalue.ord[1,]
>[1] 0.7507451
>
>Which is in agreement with the t.test-calculation I demonstrated 
>previously! And this is exactly my problem: As soon as more diets 
>are included, I cannot (correctly) fit a model anymore that gives me 
>t-values which are in agreement with my reference method. Why? And 
>more importantly: how to create a linear model (and/or contrasts 
>matrix) that fixes this problem? Any help is highly appreciated!
>
>Kind regards,
>
>Philip
>
>
>
>________________________________
>
>From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
>Sent: Thu 29-11-2007 15:07
>To: Groot, Philip de
>Cc: bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] Limma: doing multiple paired t-tests in one go...
>
>
>
>Hi Philip,
>
>Does this help?
>
>  > sib
>[1] 113 113 114 114 101 101 103 103
>Levels: 101 103 113 114
>  > rep
>[1] Control_Diet_1   Treatment_Diet_1
>[3] Control_Diet_1   Treatment_Diet_1
>[5] Control_Diet_2   Treatment_Diet_2
>[7] Control_Diet_2   Treatment_Diet_2
>4 Levels: Control_Diet_1 ... Treatment_Diet_2
>  > design <- model.matrix(~0+rep+sib)
>  > design
>    repControl_Diet_1 repControl_Diet_2
>1                 1                 0
>2                 0                 0
>3                 1                 0
>4                 0                 0
>5                 0                 1
>6                 0                 0
>7                 0                 1
>8                 0                 0
>    repTreatment_Diet_1 repTreatment_Diet_2 sib103
>1                   0                   0      0
>2                   1                   0      0
>3                   0                   0      0
>4                   1                   0      0
>5                   0                   0      0
>6                   0                   1      0
>7                   0                   0      1
>8                   0                   1      1
>    sib113 sib114
>1      1      0
>2      1      0
>3      0      1
>4      0      1
>5      0      0
>6      0      0
>7      0      0
>8      0      0
>attr(,"assign")
>[1] 1 1 1 1 2 2 2
>attr(,"contrasts")
>attr(,"contrasts")$rep
>[1] "contr.treatment"
>
>attr(,"contrasts")$sib
>[1] "contr.treatment"
>
>Best,
>
>Jim
>
>
>
>
>Groot, Philip de wrote:
> > Hello All,
> >
> > I encountered a problem that I cannot easily solve, most probably 
> because my knowledge of linear models is too restricted. The 
> problem is that I want to do a paired t-test using limma, but that 
> I want to fit multiple comparisons (using different patients!) 
> simultanuously. The reason for this is that all .CEL-files in my 
> experiment are fitted and this fit is used for the eBayes() command 
> to maximize the advantage of using the eBayes approach.
> >
> > I found in the bioconductor mailing list a somewhat related topic:
> >
> > 
> https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.htm 
> l <https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.html>
> >
> > However, my problem is different. Instead of having multiple 
> treatments over the same patients, I have multiple treatments over 
> multiple patients (but still can do a paired t-test because before 
> and after a single treatment is done on the same person).
> >
> > For simplicity, let's assume that I have 2 diets and 2 patients 
> for each diet. My pData(x.norm) looks like this:
> >
> >                           sample       replicates sibship
> > A96_hA_09_113_1_base.CEL       1   Control_Diet_1     113
> > A96_hA_40_113_1_final.CEL      2 Treatment_Diet_1     113
> > A96_hA_10_114_1_base.CEL       3   Control_Diet_1     114
> > A96_hA_41_114_1_final.CEL      4 Treatment_Diet_1     114
> > A96_hA_01_101_2_base.CEL       5   Control_Diet_2     101
> > A96_hA_32_101_2_final.CEL      6 Treatment_Diet_2     101
> > A96_hA_02_103_2_base.CEL       7   Control_Diet_2     103
> > A96_hA_33_103_2_final.CEL      8 Treatment_Diet_2     103
> >
> >  My design matrix (for a paired t-test) is calculated as follows 
> (from the Limma user guide):
> >     Replicates <- factor(pData(x.norm)$replicates)
> >     SibShip <- factor(pData(x.norm)$sibship)
> >     design <- model.matrix(~SibShip+Replicates)
> >
> > And the design matrix looks like this:
> >   (Intercept) SibShip103 SibShip113 SibShip114 ReplicatesControl_Diet_2
> > 1           1          0          1          0                        0
> > 2           1          0          1          0                        0
> > 3           1          0          0          1                        0
> > 4           1          0          0          1                        0
> > 5           1          0          0          0                        1
> > 6           1          0          0          0                        0
> > 7           1          1          0          0                        1
> > 8           1          1          0          0                        0
> >   ReplicatesTreatment_Diet_1 ReplicatesTreatment_Diet_2
> > 1                          0                          0
> > 2                          1                          0
> > 3                          0                          0
> > 4                          1                          0
> > 5                          0                          0
> > 6                          0                          1
> > 7                          0                          0
> > 8                          0                          1
> > attr(,"assign")
> > [1] 0 1 1 1 2 2 2
> > attr(,"contrasts")
> > attr(,"contrasts")$SibShip
> > [1] "contr.treatment"
> > attr(,"contrasts")$Replicates
> > [1] "contr.treatment"
> >
> >
> > As you can image, the comparisons I am interested in are 
> Control_Diet_1-Treatment_Diet_1 and 
> Control_Diet_2-Treatment_Diet_2. I might also be interested in 
> Control_Diet_1-Control_Diet_2 and 
> Treatment_Diet_1-Treatment_Diet_2, and so forth. My problem is that 
> the current design matrix is rather complicated and that multiple 
> interaction effects are somehow included, i.e. I cannot get 
> individual effects by simply subtracting two factors in the design 
> matrix (as I understand it). My question is: how can I create a 
> contrast matrix that gives me the comparisons I am interested in? I 
> am really looking forward to an answer!
> >
> > Kind Regards,
> >
> > Dr. Philip de Groot
> > Wageningen University
> >
> >
>
>--
>James W. MacDonald, M.S.
>Biostatistician
>Affymetrix and cDNA Microarray Core
>University of Michigan Cancer Center
>1500 E. Medical Center Drive
>7410 CCGC
>Ann Arbor MI 48109
>734-647-5623



More information about the Bioconductor mailing list