[BioC] Limma Design for Paired Samples Question

Gordon K Smyth smyth at wehi.EDU.AU
Tue Jul 17 10:17:07 CEST 2012


Dear Zeynep,

You are right to be suspicious, because the design of your experiment is 
more complex than your analysis has allowed for.

Your experiment has two levels.  First of all, suppose that you only 
wanted to compared tissue1 (LA) to tissue2 (RA).  In that case you would 
have paired samples, and the comparison would be made within each subject, 
because each person returns both tissues.

However, you also want to compare normal to diseased.  This comparison can 
only be made between subjects, because each person can only be one of 
diseased or normal.

Hence you need a type of analysis that allows you to make comparisons both 
within and between subjects (persons).  This requires that you treat 
person as a random effect.  To do this is limma, you need the following:

   Person <- substring(targets$FileName,3,4)
   Treat <- factor(paste(targets$Condition,targets$TissueType,sep="."))
   design <- model.matrix(~0+Treat)
   colnames(design) <- levels(Treat)
   corfit <- duplicateCorrelation(eset,design,block=Person)
   corfit$consensus

This should give a positive correlation value.  It represents the 
correlation between measurements made on the same person.

   fit <- lmFit(eset,design,block=Person,correlation=corfit$consensus)

Now you can make any comparison you want, for example:

   cm <- makeContrasts(AF.RAvsLA=AF.RA-AF.LA,
           RA.AFvsSR=AF.RA-SR.RA, levels=design)

I have shown just two contrasts, but you could make any number of 
comparisons in this way.  Then

   fit2 <- contrasts.fit(fit, cm)
   fit2 <- eBayes(fit2)

Then

   topTable(fit2, coef="AF.RAvsLA")

will give you genes that are differentially expressed between the RA and 
LA for AF subjects.  Or

   topTable(fit2, coef="RA.AFvsSR")

will find genes that are differentially expressed between the normal and 
diseased subjects in RA tissue.

And so on.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
http://www.statsci.org/smyth

On Mon, 16 Jul 2012, zeynep özkeserli wrote:

> Dear Gordon and All,
>
> I have a question about creating the proper design matrix for my data.
>
> I am using Affymetrix GeneChips and the experimental design is like that:
>
> - We have two groups, which are "normal" and "diseased". For each person in
> each group, we collected tissues from two zones of the same organ. So my
> data looks like this:
>
> - normal
>
> tissue1
> tissue2
>
> - diseased
>
> tissue1
> tissue2
>
> So these are paired samples.
>
> Our questions are as follows:
>
> 1- Which genes are expressed differentially because of being normal or
> diseased. (the effect of being normal and diseased on gene expression)
> 2- Which genes are expressed differentially because of being tissue1 or
> tissue2. (the effect of being tissue type 1 or 2 on gene expression)
> 3- Which genes are expressed differentially in normal group between tissue1
> and tissue2
> 4- Which genes are expressed differentially in diseased group between
> tissue1 and tissue2
>
> For this purpose, I prepared a targets file in which I encoded the samples
> like I indicated. Which looks like:
>
> SlideNumber FileName Condition TissueType
> 1 AF01L.CEL AF LA
> 2 AF01R.CEL AF RA
> 3 AF02L.CEL AF LA
> 4 AF02R.CEL AF RA
> 5 AF03L.CEL AF LA
> 6 AF03R.CEL AF RA
> .
> .
> .
> 42 SR10L.CEL SR LA
> 43 SR10R.CEL SR RA
> 44 SR11L.CEL SR LA
>
>
> There are 45 CEL files, 22 with pairs and one single (would this be a 
> problem? Should I take out this sample?).
>
> So I followed the code in limma users guide section 8.3 Paired Samples,
> which is:
>
> SibShip <- factor(targets$TissueType)
> Treat <- factor(targets$Condition, levels=c("AF","SR"))
> design <- model.matrix(~SibShip+Treat)
> fit <- lmFit(eset, design)
> fit <- eBayes(fit)
>
> With this, I think :) I formed a model with two factors: Condition as the
> main factor, and TissueType as the blocking factor.
> After fitting, my coefficients in MArrayLM is
>
>          (Intercept)   SibShipRA      TreatSR
> 1007_s_at    7.828460  0.15501147  0.232908384
> 1053_at      3.868841 -0.01223435 -0.134041754
> 117_at       3.016848  0.08888230  0.197765562
> 121_at       4.721647 -0.04720335  0.052025910
> 1255_g_at    2.520547 -0.07208771  0.009031277
> .
> .
> .
>
> So;
>
> 1- For my first question, which questions the effect of the main factor 
> "Condition", (saved as Treat <- factor(targets$Condition, 
> levels=c("AF","SR")))
>
> topTable(fit, coef="TreatSR")
>
> gives me the answer. Is it true?
>
> 2- Same for the blocking factor,
>
> topTable(fit, coef="SibShipRA")
>
> gives me the answer for TissueType effect. Is this also true?
>
> From previous correspondences I concluded that I could not get answers 
> for my questions 3 and 4 with this model. And it might be meaningful to 
> add an interaction term to the model. So for 3 and 4, I tried this:
>
> design<-model.matrix(~Sibship + Treat + Sibship:Treat)
>
> (because I thought Treat is varying faster, but I don't know how to 
> check it.)
>
> After applying this,
>
> 1- Results for Treat drastically changed. I got nearly completely 
> different top genes with topTable
> 2- Results for SibShip also changed, not that much, but p values were 
> higher.
> 3- The fold change values for the same genes were different between two 
> models.
>
> So I am suspicious. Am I doing something wrong here, or am I missing
> something?
>
> Any help would be great!
> Thank you in advance,
>
> Zeynep
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}


More information about the Bioconductor mailing list