[BioC] limma design

Mon Jun 18 15:34:46 CEST 2007

"The latter seems a bit strange to me, because the number of genes 
classified as differentially expressed in one comparison (contrast) 
should not depend on the genes differentially expressed in some other 
comparison (contrast)."

Welcome to statistics!  This is a feature of all statistical 
analyses.  e.g. You do one test only and use alpha=0.05 to test, 
rejecting with p=0.049.  But if you do 10,000 tests you will use a 
multiple comparisons procedure, and your original test will no longer 
be rejecting.  Bayesian analysis also depends on the entire set of 
tests, because the posterior depends on all the data.

--Naomi

At 04:48 AM 6/18/2007, Lev Soinov wrote:
>   Dear Gordon and List,
>
>   I would very much appreciate your comment on the experiment 
> design in LIMMA. It is about processing of experiments with 
> multiple treatments.
>
>   Let's say we have a simple Affy experiment with 16 samples 
> collected from a cell line (treated/untreated) in two time points:
>   - 4 treated, 4 untreated - time point 1
>   - 4 treated, 4 untreated - time point 2
>   We are interested in differential expression between treated and 
> untreated cells, in point1 and point2 separately.
>   When we process all samples together (normalise them together and 
> fit linear fit models using the whole dataset) in LIMMA we will get 
> results different from when we process data for points 1 and 2 
> separately (normalise them together but fit liner models separately).
>
>   I do understand that it should be like this (more information for 
> priors), but I do not know whether there is some kind of a 
> criterion helping decide whether to process them separately or in 
> one go. It seems that adding more treatments into the mix increases 
> statistical power and thus, increases the number of genes 
> classified as differentially expressed.
>   The latter seems a bit strange to me, because the number of genes 
> classified as differentially expressed in one comparison (contrast) 
> should not depend on the genes differentially expressed in some 
> other comparison (contrast).
>
>   I have tried this on real data, processing data for different 
> time points separately and together. I found that t and p values 
> and also ordering of genes were different.
>
>   To illustrte this further I created a simple example in R to 
> demonstrate the effect of adding more treatments:
>   I have generated a random matrix with 12 columns (4 treatments, 3 
> replicates each), first 3 genes are upregulated in the first treatment.
>   Then I ran linear fit + eBayes (i) on the first 6 columns, (ii) 
> on the first 9 columns and (iii) on all 12 columns, while only 
> testing the contrast between the first two treatments. My code and 
> results are below.
>   topTable results, below, show that by adding treatments we change 
> estimates not only for moderated t statistics but also for ordinary ones.
>   Could you, please, help me clarify this? I'd like to understand 
> why information about
>   treatments that may have no influence on the contrast of interest 
> (in my example it may be the same cell line: untreated plus treated 
> with 3 different chemical compounds, and we test ONLY the 
> difference between untreated cells and cells treated with compound 
> 1, leaving treatments with the other two compaunds out) affects 
> lmFit and eBayes results for this contrast so much.
>   Thank you very much for your help.
>   With kind regards,
>   Lev Soinov.
>
>   Code in R:
>   set.seed(2004); invisible(runif(100))
>   M1 <- matrix(rnorm(100*12,sd=0.3),100,12)
>   M<-M1[,1:6]
>   M[1:3,1:3] <- M[1:3,1:3] + 2
>   design<-model.matrix(~0 +factor(c(1,1,1,2,2,2)))
>   colnames(design) <- c("group1", "group2")
>   contrast.matrix <- makeContrasts(group2-group1, levels=design)
>   fit <- lmFit(M, design)
>   fit2 <- contrasts.fit(fit, contrast.matrix)
>   ordinary.t1 <- fit2$coef / fit2$stdev.unscaled / fit2$sigma
>   fit2 <- eBayes(fit2)
>   topTable(fit2, coef=1, adjust="BH")
>
>   M<-M1[,1:9]
>   M[1:3,1:3] <- M[1:3,1:3] + 2
>   design<-model.matrix(~0 +factor(c(1,1,1,2,2,2,3,3,3)))
>   colnames(design) <- c("group1", "group2", "group3")
>   contrast.matrix <- makeContrasts(group2-group1, levels=design)
>   fit <- lmFit(M, design)
>   fit2 <- contrasts.fit(fit, contrast.matrix)
>   ordinary.t2 <- fit2$coef / fit2$stdev.unscaled / fit2$sigma
>   fit2 <- eBayes(fit2)
>   topTable(fit2, coef=1, adjust="BH")
>
>   M<-M1[,1:12]
>   M[1:3,1:3] <- M[1:3,1:3] + 2
>   design<-model.matrix(~0 +factor(c(1,1,1,2,2,2,3,3,3,4,4,4)))
>   colnames(design) <- c("group1", "group2", "group3", "group4")
>   contrast.matrix <- makeContrasts(group2-group1, levels=design)
>   fit <- lmFit(M, design)
>   fit2 <- contrasts.fit(fit, contrast.matrix)
>   ordinary.t3 <- fit2$coef / fit2$stdev.unscaled / fit2$sigma
>   fit2 <- eBayes(fit2)
>   topTable(fit2, coef=1, adjust="BH")
>
>   Ordinary t statistics appeared to be very different for different 
> runs, and only the first ordinary.t1 actually corresponds to the 
> ordinary t statictics that can be calculated in Excel:
>   > ordinary.t1[1:3,]
>   [1] -6.260828 -16.707178 -16.503064
>   > ordinary.t2[1:3,]
>   [1] -7.023479 -19.172993 -14.772837
>   > ordinary.t3[1:3,]
>   [1] -7.114086 -10.479982 -15.143921
>
>   eBayes statistics are even more different:
>   1. first 6 columns:
>   logFC t P.Value adj.P.Val B
>   3 -2.2243983 -9.922826 6.716099e-09 5.502173e-07 10.597497
>   2 -2.1486277 -9.618087 1.100435e-08 5.502173e-07 10.096915
>   1 -1.9143686 -7.434466 5.280488e-07 1.760163e-05 6.162366
>
>   2. first 9 columns:
>   logFC t P.Value adj.P.Val B
>   3 -2.2243983 -10.415452 1.066367e-09 5.478927e-08 12.339299
>   2 -2.1486277 -10.399332 1.095785e-09 5.478927e-08 12.311757
>   1 -1.9143686 -7.781305 1.382557e-07 4.608522e-06 7.403517
>
>   3. first 12 columns:
>   logFC t P.Value adj.P.Val B
>   3 -2.2243983 -9.423325 4.011632e-14 4.011632e-12 21.717523
>   2 -2.1486277 -8.919133 3.398312e-13 1.699156e-11 19.599063
>   1 -1.9143686 -7.721552 5.585622e-11 1.861874e-09 14.547154
>
>
>---------------------------------
>
>         [[alternative HTML version deleted]]
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111