[BioC] Paired arrays and limma

Wed Mar 12 16:52:58 CET 2008

Hi James

Thanks and thanks for your effort. I understand that mostly but I am
still not sure what I can do setting up a design matrix and a contrasts
matrix. I do not like the idea of losing flexibility so I would like to
be able to do it all through limma.

I want to be able to see the effects between the two diets and on the
disease condition. Diet1 is a control diet and I would like to see the
effects of Diet2. Also the difference between the Control and Disease
conditions by diet.

So if I do something like:

Pairing<-factor(descr[,"Pairing"])
Condition<-factor(descr[,"Condition"])
Diet<-factor(descr[,"Diet"])

design<-model.matrix(~ -1 + Diet + Condition + Pairing)

Is that valid?

If I then run this line I get an error message. Can I ignore the error?
Or has the lmFit failed?

fit<-lmFit(eset, design)

#(I get this error message: Coefficients not estimable: Pairing6
Pairing9 )

Then I am not sure what contrasts I can make to get what I want. Can you
suggest to me what would be some sensible choices? 

For instance, is this a valid contrast to show the difference between
the two diets?
contrast.matrix<-makeContrasts(DietDiet2 - DietDiet1, levels=design)

Help appreciated.

Regards

John Seers

> colnames(design)
 [1] "DietDiet1"        "DietDiet2"        "ConditionDisease"
"Pairing10"        "Pairing11"        "Pairing12"        "Pairing13"

 [8] "Pairing14"        "Pairing15"        "Pairing16"
"Pairing17"        "Pairing18"        "Pairing19"        "Pairing2"

[15] "Pairing20"        "Pairing21"        "Pairing22"
"Pairing23"        "Pairing24"        "Pairing25"        "Pairing26"

[22] "Pairing27"        "Pairing28"        "Pairing3"         "Pairing4"
"Pairing5"         "Pairing6"         "Pairing7"        
[29] "Pairing8"         "Pairing9"        
> 

> Pairing
 [1] 1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9  10 10 11 11
12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20 21 21 22 22 23
[46] 23 24 24 25 25 26 26 27 27 28 28
Levels: 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 3 4
5 6 7 8 9
> 

> Diet
 [1] Diet1 Diet1 Diet1 Diet1 Diet1 Diet1 Diet2 Diet2 Diet2 Diet2 Diet2
Diet2 Diet1 Diet1 Diet1 Diet1 Diet1 Diet1 Diet1 Diet1 Diet1 Diet1
[23] Diet1 Diet1 Diet1 Diet1 Diet2 Diet2 Diet2 Diet2 Diet2 Diet2 Diet2
Diet2 Diet2 Diet2 Diet2 Diet2 Diet2 Diet2 Diet1 Diet1 Diet2 Diet2
[45] Diet1 Diet1 Diet1 Diet1 Diet1 Diet1 Diet2 Diet2 Diet2 Diet2 Diet2
Diet2
Levels: Diet1 Diet2
> 

> 
> Condition
 [1] Control Control Control Control Control Control Control Control
Control Control Control Control Disease Disease Disease Disease Disease
[18] Disease Disease Disease Disease Disease Disease Disease Disease
Disease Disease Disease Disease Disease Disease Disease Disease Disease
[35] Disease Disease Disease Disease Control Control Disease Disease
Disease Disease Disease Disease Control Control Disease Disease Disease
[52] Disease Disease Disease Control Control
Levels: Control Disease
> 

---
Web sites:

www.ifr.ac.uk   
www.foodandhealthnetwork.com

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at med.umich.edu] 
Sent: 12 March 2008 14:59
To: john seers (IFR)
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Paired arrays and limma

Hi John,

john seers (IFR) wrote:
>  
> 
> Hi James
> 
> A fortuitous coincidence you mention my name in a posting when I was 
> reading one of your threads the other day and this prompts me to ask 
> you directly.
> 
> The thread in question is:
> 
>  
> https://stat.ethz.ch/pipermail/bioconductor/2007-November/020291.html
> 
> 
> I followed the thread through and found it did not offer a solution 
> and you finish with:
> 
> "I'm not sure why you would want to do things pair-wise, but if you 
> really want paired t-tests, then you will have to analyze the data in 
> pairs rather than all at once."
> 
> I am using limma on a similar setup and I am not sure how to pair the 
> data. The setup is before and after two diets and a condition control 
> and disease. (There is a section in the limma manual on paired data 
> and a section on factorial designs but I am not sure how to marry
them).
> 
> Can you explain what you mean by "analyzing the data in pairs rather 
> than all at once"?

Well, the poster wanted his data to agree with the results from a paired
t-test, which won't happen if you fit a model to all the data and then
compute contrasts. This is because the denominator of the statistic will
be different in the two cases.

In the former, the denominator will be the standard error of the mean,
which is computed using only the two samples under consideration.

In the latter, it will be the sums of squares of error (or some variant
thereof, depending on the model), which measures the within-group
variance of all the groups in the model, not just the two under
consideration for a given contrast.

I didn't know why he would want to do things pair-wise, as the variance
estimates get better as n goes up, so the linear model approach is often
preferable. You can see that in his example - the t-statistic was larger
when he used all his data than when he just used the paired data.

> 
> My solution so far is to preprocess the data and take the ratio of the
> expression values of the paired arrays (so halving the number of
> columns) and analyzing them in limma. That removes the pairing from
the
> limma analysis. Does that make sense to you?

Well, if by 'take the ratio' you mean 'compute the difference'(we _are_ 
on the log scale?), then this is essentially what you would be doing by 
fitting a batch effect anyway. However, it will limit the types of 
comparisons you can make since you are combining the paired data into 
differences.

Best,

Jim

> 
> Thank you in advance for any time you take.
> 
> Regards
> 
> 
> John Seers 
> 
> 
>  
> ---
>  
> Web sites:
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623