[BioC] different results from limma using the same data set and two different methods

Gordon Smyth smyth at wehi.edu.au
Thu Jul 1 03:56:14 CEST 2004


At 11:40 AM 1/07/2004, xiaocui zhu wrote:
>Hello all, I have a cDNA data set with 16 different conditions, each
>with 3 pairs of dyeSwap measurements (so 6 repeats in total). I ranked
>differentially expressed genes in each condition using two different
>methods in Limma, and the two methods yielded different B values and
>gene ranks. I do not know if this is normal or I did something wrong.

It is normal and fully intentional. The two methods lead to the same 
estimated fold changes (fit$coef) and the same unscaled standard deviations 
(fit$stdev.unscaled) but to different residual standard deviations 
(fit$sigma) and different degrees of freedom (fit$df.residual). This is 
because, if you fit a model to all your arrays at once, the residuals from 
all the arrays are pooled to estimate the gene-wise standard deviations. 
Generally speaking, this is desirable because the extra arrays do provide 
extra information on the gene specific standard deviations and hence using 
all the arrays at once provides more reliable estimators of variability.

Gordon

>Here is how I did it:
>
>METHOD 1: Generate a MAList for each condition, and fit the linear model
>with individual MAList separately. The code looks like the following:
>
>M.condition1<-read.delim("condition1-log2Ratio.txt", header=TRUE)
>A.condition1<-read.delim("condition1-amplitude.txt", header=TRUE)
>MA.condition1<-new("MAList", list(M=M.condition1, A=A.condition1))
>MA.condition1$genes<- read.delim("Genes.txt", header=TRUE)
>
>
>fit.condition1<-lmFit(MA.condition1, design=c(1,-1,1,-1,1,-1))
>efit.condition1<-eBayes(fit.condition1)
>output<-topTable(efit.condition1, number=16200, adjust="fdr")
>write.table(output, file="condition1-ebayes-sep.txt", sep="\t")
>
>
>METHOD 2: Generate a MAList for the entire data set (96 arrays in
>total), and fit the linear model with the MAList. The code looks like
>the following:
>
>
>M.all<-read.delim("all-log2Ratio.txt", header=TRUE)
>A.all<-read.delim("all-Amplitude.txt", header=TRUE)
>MA.all<-new("MAList", list(M=M.all, A=A.all))
>MA.all$genes<-read.delim("Genes.txt", header=TRUE)
>
>treatments <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,
>                        5,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,7,
>                        8,8,8,8,8,8,9,9,9,9,9,9,10,10,10,10,10,10,
>                  11,11,11,11,11,11,12,12,12,12,12,12,13,13,13,13,13,13,
>                 14,14,14,14,14,14,15,15,15,15,15,15,16,16,16,16,16,16))
>design <- model.matrix(~ 0+treatments)
>design<-edit(design) # change the table to reflect the dyeSwap design

Multiplying the rows of the design matrix by a vector of 1s and -1s would 
be simpler and more reproducible than manual editing.

>colnames(design) <- c("condtion1", " condtion2"," condtion3",
>                       "condtion4", " condtion5"," condtion6",
>                       "condtion7", " condtion8"," condtion9",
>                       "condtion10", " condtion11"," condtion12",
>
>                       "condtion13", " condtion14"," condtion15",
>                       "condtion16")
>
>fit <- lmFit(MA.lab, design)
>efit<-eBayes(fit)
># To get the gene rank of condition1:
>output<-topTable(efit, coef=1, number=16200,adjust="fdr")
>write.table(output, file="condition1-ebayes-batch.txt", sep="\t")
>
>
>The rank and the B value of the same gene in the two output files are
>different.
>
>I may have done something incorrectly in METHOD 2, because some genes in
>its output file have a different "A" value when compared to the average
>of the six replicates. On the other hand, all genes in the output file
>of METHOD 1 have the same "M" and "A" value when compared to the average
>M and average A values of the six replicates.
>
>Any help would be greatly appreciated. Eventually I want to compare
>across different conditions using METHOD 2. Since I am not sure if my
>code in METHOD 2 is correct, I have to wait until I figure out what's
>going on.
>
>
>Thanks a lot!
>
>Xiaocui
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor



More information about the Bioconductor mailing list