[BioC] Limma design : What is the "best" method to infer biological replicates during differentially expression analysis

Gordon K Smyth smyth at wehi.EDU.AU
Thu Feb 20 03:09:35 CET 2014


> Date: Tue, 18 Feb 2014 12:05:41 +0100
> From: Santy Marques-Ladeira <santy.marquesladeira at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] Limma design : What is the "best" method to infer
> 	biological replicates during differentially expression analysis
>
> Dear Limma users,
>
> I'm currently working on Affymetrix microarray (Hu-gene 2.1) with only one
> channel. I got 8 experimental stimuli conducted on 7 donors. During the
> experimental process each stimulus was conducted at the same time for all
> the donors so I assume there are no technical replicates, but I need to
> take in account the "donor effect" (or biological replicates).
>
> In the "Limma" documentation there are 2 ways of including this "effect"
> but it is still not clear to me what is the best choice, and the 2 methods
> gave me 86% of correlation for a pairwise comparison test.
>
>
>
> *1/The first method I used was including the donors in my design model :*
>
> # I build a matrix with 2 columns (Stimulus + Donor)
> # Each row is an array
> sample = matrix(0, nrow=55, ncol=2)
> rownames(sample) <- colnames("normalised data I use to perform the
> analysis")
> colnames(sample) <- c("Stimulus", "Donor")
>
> # In the column Stimulus there is directly the name of the stimulus tested
> sample[1:7,1] = "Stim1"
> sample[8:14,1] = "Stim2"
> sample[15:20,1] = "Stim3"
> sample[21:27,1] = "Stim4"
> sample[28:34,1] = "Stim5"
> sample[35:41,1] = "Stim6"
> sample[42:48,1] = "Stim7"
> sample[49:55,1] = "Stim8"
> # In the column Donor there is the number of the donor (from 2 to 8)
> sample[,2] = c(rep(c(2,3,4,5,6,7,8),2),3,4,5,6,7,8,rep(c(2,3,4,5,6,7,8),5))
> targets = as.data.frame(sample)
>
> Stimulus = factor(targets$Stimulus, levels = c("Stim1", "Stim2", "Stim3",
> "Stim4", "Stim5", "Stim6", "Stim7", "Stim8"))
> Donor = factor(targets$Donor, levels = c("2", "3", "4", "5", "6", "7", "8"))
>
> # I use the model.matrix() function to infer the design based on my matrix
> design <- model.matrix(~-1+Stimulus+Donor)
> rownames(design) <- colnames("normalised data I use to perform the
> analysis")
> colnames(design)[1:8] <- c("Stim1", "Stim2", "Stim3", "Stim4", "Stim5",
> "Stim6", "Stim7", "Stim8")
>
> #This gives me :
> #                      Stim1   Stim2   Stim3   ...   ...   ...   Donor3
> Donor4   ...   ...   Donor8
> #experience1        1          0         0
> 0            0                       0
> #experience2        1          0         0
> 1            0                       0
> #experience3        1          0         0
> 0            1                       0
> #...
> #...
> #...
>
> First thing is that because I remove for a stimulus the donor2, this donor
> is removed from the design model (maybe because of dimension for regression
> ?)

The design matrix can't have more columns than

  1 + nlevels(Stimulus)-1 + nlevels(Donor)-1

You have 7 donors, but you can only add 6 new columns to the design matirx 
when you add Donor to the linear model.  By default, R removes the first 
level of the factor and makes the coefficients for the other levels 
relative to the first level.

When you add an new factor to the model, you are estimating differences 
between the levels of the factor, but you can't estimate the overall mean 
more than once.

Gordon

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list