[BioC] top table or anova?

Wed Jul 16 21:23:33 CEST 2014

Hi Konika Chawla,

Gordon Smyth has published multiple papers that outline why limma is 
superior to individual tests for low-replication microarray studies. You 
can find those references in the limma User's Guide (you can type 
limmaUsersGuide() at an R prompt after loading the limma package).

And as you note, this isn't a fair comparison. You don't say how many 
measurements you are making, but assuming somewhere around 25K genes, 
you expect 1250 significant results (unadjusted p-value < 0.05) under 
the null hypothesis. So having that many 'significant' genes isn't 
saying much. You could use p.adjust() on the vector of p-values to get 
BH adjusted p-values if you want to make the comparisons more reasonable.

Best,

Jim

On 7/16/2014 1:30 PM, Konika Chawla wrote:
> Hi
> I have a microarray data comprised of 3 treatment conditions (N,M,I) and
> 2 time points (4dpi and 5dpi).
> I used limma - ebayes fit and toptable with coefficient for a specific
> contrast to get the differentially expressed genes and adjusted P value.
> I also did a ANOVA test, and I want to know which one is better to
> identify differentially expressed genes.
>
> CODE: for eBAYES and Toptable
>
> design <- model.matrix(~ 0+factor(c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)))
> colnames(design) <- c("N_4dpi", "M_4dpi", "I_4dpi","N_5dpi", "M_5dpi",
> "I_5dpi")
> fit <- lmFit(eset, design)
> contrast.matrix <-
> makeContrasts("I_4dpi-M_4dpi","I_5dpi-M_5dpi","I_4dpi-N_4dpi","I_5dpi-N_5dpi","M_4dpi-N_4dpi","M_5dpi-N_5dpi",
> levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> 2) #If we do one comparison at a time I/M 4dpi
> tab_all<-topTable(fit2, coef=1,adjust="BH",number=100,p.value=1)
> head(tab_all)
> write.table(tab_all,"I-M-4dpi.txt",sep="\t",quote=FALSE)
>
> This gives zero significant genes.
> ############################################################
>
> CODE for ANOVA
>
>
> targets <- readTargets("target_data.txt")
> data <- ReadAffy(filenames=targets$filename)
> experiment <- targets$experiment
> time <- targets$time
> treatment <- targets$treatment
>
>
> ##normalization
>
> eset <- rma(data)
> exprs.eset <- exprs(eset)
> exprs.eset.df <- data.frame(exprs.eset)
> aof <- function(x) {
>     m<-data.frame(time, treatment, x);
>     anova(aov(x ~ time + treatment + time * treatment, m))
> }
>
> anovaresults <- apply(exprs.eset, 1, aof)
>
> This gives  over a thousand genes with treatment_Pr.F <0.05 , Does this
> need to be corrected for multiple testing, why is the huge difference?
> Appreciate your help.
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099