[BioC] of limma and superfluous arrays

Tue Jan 29 22:35:41 CET 2008

Dear List,

I'm starting to do limma analyses on a small timecourse loop design  
with 2-color cDNA chips as follows:
	0h vs 6h
	6h vs 24h
	24h vs 0h
Four biological replicates run in one direction around the loop (->),
and four more biological replicates run in the opposite direction for
dye balance (<-).

My targets file begins like this (only the first two sets of three  
listed):
	US22502600_F82_S01.gpr	A_0h	A_24h
	US22502600_F65_S01.gpr	A_24h	A_6h
	US22502600_F153_S01.gpr	A_6h	A_0h
	US22502600_F85_S01.gpr	F_0h	F_6h
	US22502600_F60_S01.gpr	F_24h	F_0h
	US22502600_F72_S01.gpr	F_6h	F_24h
	... with eight such sets of three.

But then I also have some chips -> against our lab's "standard"  
reference RNA:
	US22502600_F67_S01.gpr	A_24h	Ref
	US22502600_F83_S01.gpr	F_24h	Ref
	... and six more

For my limma analysis, I have three options:
	*a*: use only the minimal number of chips (i.e. each loop of three,  
and nothing to connect the loops). In this case, limma is unable to  
estimate one parameter in each small loop (e.g. the 6h timepoint). I  
can still ask how many genes are differentially expressed between 24h and 0h:
		> design.noref = modelMatrix(targets.noref, ref="A_0h")
		> fit.noref = lmFit(MA.noref.p, design.noref)
		> cont.matrix = makeContrasts(
		    T24_T0 = (A_24h+C_24h+F_24h+K_24h+N_24h+Q_24h+R_24h+T_24h
		              -C_0h-F_0h-K_0h-N_0h-Q_0h-R_0h-T_0h)/8,
		    levels=design.noref)
		> fit.noref2 = contrasts.fit(fit.noref, cont.matrix)
		> fit.noref2 = eBayes(fit.noref2)
		> summary(topTable(fit.noref2, n=10000)$adj.P.Val <= 0.05)

	 ---> I get 3668 differentially expressed spots.

	*b*: provide my "24h" vs Ref chips as well
		using ref="Ref" in my design and
		> cont.matrix = makeContrasts(
		    T24_T0 = (A_24h+C_24h+F_24h+K_24h+N_24h+Q_24h+R_24h+T_24h
		              -A_0h-C_0h-F_0h-K_0h-N_0h-Q_0h-R_0h-T_0h)/8,
		    levels=design)

	 ---> I get 3796 differentially expressed spots.

	*c*: use those in *b*, as well as eight additional chips done in  
parallel, that are XXX vs Ref. The XXX samples don't connect to  
anything other than Ref (they're superfluous).

	---> I get 3583 differentially expressed spots.

Searching the archives, several posts mention that providing more
chips gives limma a better estimate of the variance. Thus it makes
sense to provide more, and doing so does find more differentially
expressed genes in *b* than in *a*.
But then would it be defensible to give limma all the chips I did in
that batch? All the chips I've ever done?
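
For reference, the way extra chips can sharpen the variance estimate is usually written as below. This is only a sketch of the empirical Bayes moderated variance behind eBayes; the symbols are introduced here for illustration and do not come from the analysis above.

```latex
\tilde{s}_g^{2} = \frac{d_0 s_0^{2} + d_g s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}
```

Here d_g is the residual degrees of freedom for spot g and s_g^2 its residual variance, d_0 and s_0^2 are the prior values estimated from all spots, and v_g is the unscaled variance of the contrast estimate. Extra chips raise d_g, and the moderated t is referred to a t-distribution on d_0 + d_g degrees of freedom.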

And then I get fewer differentially expressed spots in *c* than in
*b*, which surprises me, because using more chips should make my
estimate of the variance more precise. Comparing *b* with *c* leads me
to conclude either that the chips added in *c* are funky because they
increase the variance estimates, or that the chips in *b* show
artificially low variance.
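
A toy check of the reasoning above, as a minimal sketch in base R rather than limma itself. The two small design matrices are hypothetical stand-ins for *b* and *c* (one loop plus one 24h vs Ref chip, then three extra XXX vs Ref chips that are assumed to share a single XXX coefficient), and the helper names are made up for illustration. Under those assumptions, the extra chips leave the unscaled standard error of the 24h - 0h contrast untouched and only add residual degrees of freedom, so any change in the number of significant spots would have to come through the variance estimates:

```r
# Toy stand-in for *b*: one loop of three chips plus one "24h vs Ref" chip.
# Columns are log-ratios of each timepoint against Ref (hypothetical layout).
design.b <- matrix(c(-1,  0,  1,   # 0h  -> 24h
                      0,  1, -1,   # 24h -> 6h
                      1, -1,  0,   # 6h  -> 0h
                      0,  0,  1),  # Ref -> 24h
                   ncol = 3, byrow = TRUE,
                   dimnames = list(NULL, c("T0", "T6", "T24")))

# Toy stand-in for *c*: three extra "XXX vs Ref" chips, which need one new
# column (assumption: all three share the same XXX coefficient).
design.c <- rbind(cbind(design.b, XXX = 0),
                  cbind(matrix(0, 3, 3), XXX = 1))

# Unscaled standard error of a contrast: sqrt(c' (X'X)^-1 c).
contrast.se.factor <- function(design, contrast) {
  xtx.inv <- solve(crossprod(design))
  sqrt(drop(t(contrast) %*% xtx.inv %*% contrast))
}

# Residual degrees of freedom: number of chips minus rank of the design.
resid.df <- function(design) nrow(design) - qr(design)$rank

se.b <- contrast.se.factor(design.b, c(-1, 0, 1))     # T24 - T0
se.c <- contrast.se.factor(design.c, c(-1, 0, 1, 0))  # T24 - T0, XXX ignored
df.b <- resid.df(design.b)
df.c <- resid.df(design.c)

print(c(se.b = se.b, se.c = se.c, df.b = df.b, df.c = df.c))
```

If each XXX chip were instead a distinct sample with its own coefficient, the extra chips would add no residual degrees of freedom in this toy setup either.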

Does this make sense?
Obviously, in this analysis the numbers of differentially expressed
genes are quite similar in the three cases, and a difference of about
5% in the number of significant spots probably won't matter. But it
would be good to know which approach is most valid for future analyses
as well.

Thanks and regards,

yannick

--------------------------------------------
          yannick . wurm @ unil . ch
Ant Genomics, Ecology & Evolution @ Lausanne
   http://www.unil.ch/dee/page28685_fr.html