[BioC] different results from limma using the same data set and two different methods

Thu Jul 1 03:40:13 CEST 2004

Hello all, I have a cDNA data set with 16 different conditions, each
with 3 pairs of dyeSwap measurements (so 6 repeats in total). I ranked
differentially expressed genes in each condition using two different
methods in Limma, and the two methods yielded different B values and
gene ranks. I do not know if this is normal or I did something wrong.
Here is how I did it:

METHOD 1: Generate a MAList for each condition, and fit the linear model
with individual MAList separately. The code looks like the following:

M.condition1<-read.delim("condition1-log2Ratio.txt", header=TRUE)
A.condition1<-read.delim("condition1-amplitude.txt", header=TRUE)
MA.condition1<-new("MAList", list(M=M.condition1, A=A.condition1))
MA.condition1$genes<- read.delim("Genes.txt", header=TRUE)

fit.condition1<-lmFit(MA.condition1, design=c(1,-1,1,-1,1,-1))
efit.condition1<-eBayes(fit.condition1)
output<-topTable(efit.condition1, number=16200, adjust="fdr")
write.table(output, file="condition1-ebayes-sep.txt", sep="\t")

METHOD 2: Generate a MAList for the entire data set (96 arrays in
total), and fit the linear model with the MAList. The code looks like
the following:

M.all<-read.delim("all-log2Ratio.txt", header=TRUE)
A.all<-read.delim("all-Amplitude.txt", header=TRUE)
MA.all<-new("MAList", list(M=M.all, A=A.all))
MA.all$genes<-read.delim("Genes.txt", header=TRUE)

treatments <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,
                       5,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,7,
                       8,8,8,8,8,8,9,9,9,9,9,9,10,10,10,10,10,10,     
                 11,11,11,11,11,11,12,12,12,12,12,12,13,13,13,13,13,13,
                14,14,14,14,14,14,15,15,15,15,15,15,16,16,16,16,16,16))
design <- model.matrix(~ 0+treatments) 
design<-edit(design) # change the table to reflect the dyeSwap design
colnames(design) <- c("condtion1", " condtion2"," condtion3",
                      "condtion4", " condtion5"," condtion6",
                      "condtion7", " condtion8"," condtion9",
                      "condtion10", " condtion11"," condtion12",

                      "condtion13", " condtion14"," condtion15",
                      "condtion16")

fit <- lmFit(MA.lab, design)
efit<-eBayes(fit)
# To get the gene rank of condition1:
output<-topTable(efit, coef=1, number=16200,adjust="fdr")
write.table(output, file="condition1-ebayes-batch.txt", sep="\t")

The rank and the B value of the same gene in the two output files are
different. 

I may have done something incorrectly in METHOD 2, because some genes in
its output file have a different "A" value when compared to the average
of the six replicates. On the other hand, all genes in the output file
of METHOD 1 have the same "M" and "A" value when compared to the average
M and average A values of the six replicates. 

Any help would be greatly appreciated. Eventually I want to compare
across different conditions using METHOD 2. Since I am not sure if my
code in METHOD 2 is correct, I have to wait until I figure out what's
going on.

Thanks a lot!

Xiaocui