[BioC] Limma: doing multiple paired t-tests in one go...

Thu Nov 29 17:18:14 CET 2007

I guess maybe we are talking about two things. When you said 'paired' I 
made the (perhaps unwarranted) assumption that you wanted to calculate 
the difference between the diets after controlling for the dependencies 
introduced by the sibship.

I'm not sure why you would want to do things pair-wise, but if you 
really want paired t-tests, then you will have to analyze the data in 
pairs rather than all at once.

Best,

Jim

Groot, Philip de wrote:
> Hello Jim,
>  
> Thank you for your answer! Unfortunately, it does NOT solve my problem. As a matter of fact: I already tried this myself... Let me show the problem as follows. First, I reproduce what you emailed me:
>  
>> sib
> SibShips
>      113      113      114      114      101      101      103      103
> Levels: 101 103 113 114
>> rep
> [1] "Control_Diet_1"   "Treatment_Diet_1" "Control_Diet_1"   "Treatment_Diet_1"
> [5] "Control_Diet_2"   "Treatment_Diet_2" "Control_Diet_2"   "Treatment_Diet_2"
>> design <- model.matrix(~0+rep+sib)
> Warning message:
> In model.matrix.default(~0 + rep + sib) :
>   variable 'rep' converted to a factor
>> design
>   repControl_Diet_1 repControl_Diet_2 repTreatment_Diet_1 repTreatment_Diet_2
> 1                 1                 0                   0                   0
> 2                 0                 0                   1                   0
> 3                 1                 0                   0                   0
> 4                 0                 0                   1                   0
> 5                 0                 1                   0                   0
> 6                 0                 0                   0                   1
> 7                 0                 1                   0                   0
> 8                 0                 0                   0                   1
>   sib103 sib113 sib114
> 1      0      1      0
> 2      0      1      0
> 3      0      0      1
> 4      0      0      1
> 5      0      0      0
> 6      0      0      0
> 7      1      0      0
> 8      1      0      0
> attr(,"assign")
> [1] 1 1 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$rep
> [1] "contr.treatment"
> attr(,"contrasts")$sib
> [1] "contr.treatment"
>  
>  
> Secondly, I continue with the required Limma calculations and show you the result for a single probe:
>  
>>     fit <- lmFit(x.norm, design)
> Coefficients not estimable: sib114
> 
> (Note the error message. I suppose that it can be skipped because I am not interested in this factor anyway).
>  
>>     contrast.matrix <- makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1", "repControl_Diet_2-repTreatment_Diet_2"), levels=design)
>> contrast.matrix
>                      Contrasts
> Levels                repControl_Diet_1-repTreatment_Diet_1
>   repControl_Diet_1                                       1
>   repControl_Diet_2                                       0
>   repTreatment_Diet_1                                    -1
>   repTreatment_Diet_2                                     0
>   sib103                                                  0
>   sib113                                                  0
>   sib114                                                  0
>                      Contrasts
> Levels                repControl_Diet_2-repTreatment_Diet_2
>   repControl_Diet_1                                       0
>   repControl_Diet_2                                       1
>   repTreatment_Diet_1                                     0
>   repTreatment_Diet_2                                    -1
>   sib103                                                  0
>   sib113                                                  0
>   sib114                                                  0
> (makes sense if you ask me)
>  
>> fit <- contrasts.fit(fit, contrast.matrix)
>> tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
>> pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
>> tstat.ord[1,]
> repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
>                             -0.570853                             -1.724101
>> pvalue.ord[1,]
> repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
>                             0.6256902                             0.2268312
> 
> Now let's verify this with a reference method!:
>> y_after <- exprs(x.norm)[1,c(2,4)]
>> y_after
> A96_hA_40_113_1_final.CEL A96_hA_41_114_1_final.CEL
>                  7.182141                  7.480890
>> y_before <- exprs(x.norm)[1,c(1,3)]
>> y_before
> A96_hA_09_113_1_base.CEL A96_hA_10_114_1_base.CEL
>                 7.230783                 7.363846
>> t.verify <- t.test(y_before, y_after, paired=T)
>> print(t.verify)
>         Paired t-test
> data:  y_before and y_after
> t = -0.4128, df = 1, p-value = 0.7507
> alternative hypothesis: true difference in means is not equal to 0
> 95 percent confidence interval:
>  -1.086823  1.018420
> sample estimates:
> mean of the differences
>             -0.03420119
>  
> This does not agree! However, when I exclude Diet 2 in the ExpressionSet and redo the Limma calculation, it DOES match!!:
>> sib <- pData(x.norm)$sibship[c(1:4)]
>> sib
> SibShips
>      113      113      114      114
> Levels: 101 103 113 114
>> rep <- pData(x.norm)$replicates[c(1:4)]
>> rep
> [1] "Control_Diet_1"   "Treatment_Diet_1" "Control_Diet_1"   "Treatment_Diet_1"
>> design <- model.matrix(~0+rep+sib)
> Warning message:
> In model.matrix.default(~0 + rep + sib) :
>   variable 'rep' converted to a factor
>> design
>   repControl_Diet_1 repTreatment_Diet_1 sib103 sib113 sib114
> 1                 1                   0      0      1      0
> 2                 0                   1      0      1      0
> 3                 1                   0      0      0      1
> 4                 0                   1      0      0      1
> attr(,"assign")
> [1] 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$rep
> [1] "contr.treatment"
> attr(,"contrasts")$sib
> [1] "contr.treatment"
>>     fit <- lmFit(x.norm[,c(1:4)], design)
> Coefficients not estimable: sib103 sib114
>>     contrast.matrix <- makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1"), levels=design)
>> contrast.matrix
>                      Contrasts
> Levels                repControl_Diet_1-repTreatment_Diet_1
>   repControl_Diet_1                                       1
>   repTreatment_Diet_1                                    -1
>   sib103                                                  0
>   sib113                                                  0
>   sib114                                                  0
>>     fit <- contrasts.fit(fit, contrast.matrix)
>>  tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
>>  pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
>> tstat.ord[1,]
> [1] -0.412843
>> pvalue.ord[1,]
> [1] 0.7507451
> 
> Which is in agreement with the t.test-calculation I demonstrated previously! And this is exactly my problem: As soon as more diets are included, I cannot (correctly) fit a model anymore that gives me t-values which are in agreement with my reference method. Why? And more importantly: how to create a linear model (and/or contrasts matrix) that fixes this problem? Any help is highly appreciated!
>  
> Kind regards,
>  
> Philip
>  
>  
> 
> ________________________________
> 
> From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
> Sent: Thu 29-11-2007 15:07
> To: Groot, Philip de
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Limma: doing multiple paired t-tests in one go...
> 
> 
> 
> Hi Philip,
> 
> Does this help?
> 
>  > sib
> [1] 113 113 114 114 101 101 103 103
> Levels: 101 103 113 114
>  > rep
> [1] Control_Diet_1   Treatment_Diet_1
> [3] Control_Diet_1   Treatment_Diet_1
> [5] Control_Diet_2   Treatment_Diet_2
> [7] Control_Diet_2   Treatment_Diet_2
> 4 Levels: Control_Diet_1 ... Treatment_Diet_2
>  > design <- model.matrix(~0+rep+sib)
>  > design
>    repControl_Diet_1 repControl_Diet_2
> 1                 1                 0
> 2                 0                 0
> 3                 1                 0
> 4                 0                 0
> 5                 0                 1
> 6                 0                 0
> 7                 0                 1
> 8                 0                 0
>    repTreatment_Diet_1 repTreatment_Diet_2 sib103
> 1                   0                   0      0
> 2                   1                   0      0
> 3                   0                   0      0
> 4                   1                   0      0
> 5                   0                   0      0
> 6                   0                   1      0
> 7                   0                   0      1
> 8                   0                   1      1
>    sib113 sib114
> 1      1      0
> 2      1      0
> 3      0      1
> 4      0      1
> 5      0      0
> 6      0      0
> 7      0      0
> 8      0      0
> attr(,"assign")
> [1] 1 1 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$rep
> [1] "contr.treatment"
> 
> attr(,"contrasts")$sib
> [1] "contr.treatment"
> 
> Best,
> 
> Jim
> 
> 
> 
> 
> Groot, Philip de wrote:
>> Hello All,
>>
>> I encountered a problem that I cannot easily solve, most probably because my knowledge of linear models is too restricted. The problem is that I want to do a paired t-test using limma, but that I want to fit multiple comparisons (using different patients!) simultanuously. The reason for this is that all .CEL-files in my experiment are fitted and this fit is used for the eBayes() command to maximize the advantage of using the eBayes approach.
>>
>> I found in the bioconductor mailing list a somewhat related topic:
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.html <https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.html>
>>
>> However, my problem is different. Instead of having multiple treatments over the same patients, I have multiple treatments over multiple patients (but still can do a paired t-test because before and after a single treatment is done on the same person).
>>
>> For simplicity, let's assume that I have 2 diets and 2 patients for each diet. My pData(x.norm) looks like this:
>>
>>                           sample       replicates sibship
>> A96_hA_09_113_1_base.CEL       1   Control_Diet_1     113
>> A96_hA_40_113_1_final.CEL      2 Treatment_Diet_1     113
>> A96_hA_10_114_1_base.CEL       3   Control_Diet_1     114
>> A96_hA_41_114_1_final.CEL      4 Treatment_Diet_1     114
>> A96_hA_01_101_2_base.CEL       5   Control_Diet_2     101
>> A96_hA_32_101_2_final.CEL      6 Treatment_Diet_2     101
>> A96_hA_02_103_2_base.CEL       7   Control_Diet_2     103
>> A96_hA_33_103_2_final.CEL      8 Treatment_Diet_2     103
>>
>>  My design matrix (for a paired t-test) is calculated as follows (from the Limma user guide):
>>     Replicates <- factor(pData(x.norm)$replicates)
>>     SibShip <- factor(pData(x.norm)$sibship)
>>     design <- model.matrix(~SibShip+Replicates)
>>
>> And the design matrix looks like this:
>>   (Intercept) SibShip103 SibShip113 SibShip114 ReplicatesControl_Diet_2
>> 1           1          0          1          0                        0
>> 2           1          0          1          0                        0
>> 3           1          0          0          1                        0
>> 4           1          0          0          1                        0
>> 5           1          0          0          0                        1
>> 6           1          0          0          0                        0
>> 7           1          1          0          0                        1
>> 8           1          1          0          0                        0
>>   ReplicatesTreatment_Diet_1 ReplicatesTreatment_Diet_2
>> 1                          0                          0
>> 2                          1                          0
>> 3                          0                          0
>> 4                          1                          0
>> 5                          0                          0
>> 6                          0                          1
>> 7                          0                          0
>> 8                          0                          1
>> attr(,"assign")
>> [1] 0 1 1 1 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$SibShip
>> [1] "contr.treatment"
>> attr(,"contrasts")$Replicates
>> [1] "contr.treatment"
>>
>>
>> As you can image, the comparisons I am interested in are Control_Diet_1-Treatment_Diet_1 and Control_Diet_2-Treatment_Diet_2. I might also be interested in Control_Diet_1-Control_Diet_2 and Treatment_Diet_1-Treatment_Diet_2, and so forth. My problem is that the current design matrix is rather complicated and that multiple interaction effects are somehow included, i.e. I cannot get individual effects by simply subtracting two factors in the design matrix (as I understand it). My question is: how can I create a contrast matrix that gives me the comparisons I am interested in? I am really looking forward to an answer!
>>
>> Kind Regards,
>>
>> Dr. Philip de Groot
>> Wageningen University
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623