[BioC] Tukey's HSD after ANOVA

Sandy [guest] guest at bioconductor.org
Mon May 6 08:54:54 CEST 2013


The dataset contains 1000 genes measured in 24 samples from two mouse strains (129 and B6) and six brain regions, with two replicates for each strain/region combination.
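As a quick sanity check of this layout, the sketch below rebuilds the two factors with the same gl() calls used in the ANOVA code and cross-tabulates them. Every strain/region cell should hold two samples, for 2 x 6 x 2 = 24 samples in total.

```r
# Rebuild the sample annotation with the same gl() calls as the ANOVA code
strain <- gl(2, 12, 24, labels = c("129", "bl6"))
region <- gl(6, 2, 24, labels = c("ag", "cb", "cx", "ec", "hp", "mb"))

# Cross-tabulate: 2 strains x 6 regions, counting samples per cell
design <- table(strain, region)
print(design)
```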
The ANOVA was performed as follows:

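    # Read the expression data (rows = genes, columns = the 24 samples)
    sdata <- read.table("http://www.chibi.ubc.ca/wp-content/uploads/2013/02/sandberg-sampledata.txt",
                        header = TRUE, row.names = 1)

    # Sample annotation: samples 1-12 are 129, 13-24 are bl6;
    # each region occupies two consecutive samples within each strain
    strain <- gl(2, 12, 24, labels = c("129", "bl6"))
    region <- gl(6, 2, 24, labels = c("ag", "cb", "cx", "ec", "hp", "mb"))

    # Define the ANOVA function: strain, region and their interaction
    # (strain * region expands to strain + region + strain:region)
    aof <- function(x) {
        m <- data.frame(strain, region, x)
        anova(aov(x ~ strain * region, data = m))
    }

    # Apply the analysis to every gene (row); the result is a list of ANOVA tables
    anovaresults <- apply(sdata, 1, aof)

    # Collect the p-values: rows = strain, region, strain:region; one column per gene
    # (sapply keeps the gene names unchanged, unlike data.frame())
    pvalues <- sapply(anovaresults, function(x) x[["Pr(>F)"]][1:3])

    # Get the genes with good region effect p-values (p < 0.0001)
    # and no evidence of a strain:region interaction (p > 0.1)
    keep <- which(pvalues[2, ] < 0.0001 & pvalues[3, ] > 0.1)
    reg.hi.p <- pvalues[2, keep]
    reg.hi.pdata <- sdata[colnames(pvalues)[keep], ]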

A significant p-value for a factor in the ANOVA indicates that the gene's mean expression differs between at least two of that factor's groups. With more than two groups, however, the ANOVA does not tell us which pairs of groups differ. I know that post hoc tests can be applied in this situation to determine which pairs of regions differ (irrespective of strain).

I would like to know how to apply Tukey's HSD in R to find out, for each of these genes (the ones with good region effect p-values), which brain regions ("ag", "cb", "cx", and so on) it is differentially expressed in.
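One possible approach, sketched here under stated assumptions, is to refit the same strain * region model for each gene and call TukeyHSD() from the stats package on the region term only. Because the original data URL may no longer be reachable, the sketch uses a small simulated matrix laid out like sdata (hypothetical genes g1 to g3, with g1 raised in "cb"); with the real data you would loop over the rows of reg.hi.pdata instead. The helper name tukey_region() is illustrative, not part of any package.

```r
# Minimal sketch: Tukey's HSD on the region factor, gene by gene.
# Simulated (hypothetical) data stand in for sdata, using the same 24-sample design.
set.seed(1)
strain <- gl(2, 12, 24, labels = c("129", "bl6"))
region <- gl(6, 2, 24, labels = c("ag", "cb", "cx", "ec", "hp", "mb"))

# Three hypothetical genes; "g1" is up-regulated by 3 units in region "cb"
sim <- matrix(rnorm(3 * 24, mean = 8, sd = 0.2), nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), NULL))
sim["g1", region == "cb"] <- sim["g1", region == "cb"] + 3

# Hypothetical helper: fit the same two-way model as aof(), then run
# Tukey's HSD on region only (region means averaged over strain)
tukey_region <- function(x) {
  fit <- aov(x ~ strain * region, data = data.frame(x, strain, region))
  TukeyHSD(fit, which = "region")$region
}

# One Tukey table per gene: columns diff, lwr, upr, "p adj"; rows such as "cb-ag"
tukey_results <- lapply(seq_len(nrow(sim)), function(i) tukey_region(sim[i, ]))
names(tukey_results) <- rownames(sim)

# Matrix of adjusted p-values: one row per gene, one column per region pair (15 pairs)
padj <- t(sapply(tukey_results, function(tt) tt[, "p adj"]))

# For each gene, the region pairs that differ at the 0.05 level
sig_pairs <- lapply(tukey_results, function(tt) rownames(tt)[tt[, "p adj"] < 0.05])
print(sig_pairs$g1)
```

Each row of padj holds the Tukey-adjusted p-values of the 15 region pairs for one gene, and the sign of the "diff" column shows which region of a pair is higher. Note that the adjustment covers the 15 comparisons within a gene, not the number of genes tested. Because the region comparisons average over strain, they fit best for genes like those selected above, whose strain:region interaction was not significant.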

 -- output of sessionInfo(): 

R version 2.15.2 (2012-10-26)
Platform: i686-redhat-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GOstats_2.24.0       RSQLite_0.10.0       DBI_0.2-5            graph_1.36.2         Category_2.22.0      AnnotationDbi_1.20.5 affy_1.36.1         
 [8] Biobase_2.16.0       BiocGenerics_0.4.0   R.utils_1.23.2       R.oo_1.13.0          R.methodsS3_1.4.2   

loaded via a namespace (and not attached):
 [1] affyio_1.22.0         annotate_1.36.0       AnnotationForge_1.0.3 BiocInstaller_1.8.3   genefilter_1.40.0     GO.db_2.8.0          
 [7] GSEABase_1.18.0       IRanges_1.16.6        parallel_2.15.2       preprocessCore_1.18.0 RBGL_1.34.0           splines_2.15.2       
[13] stats4_2.15.2         survival_2.36-14      tools_2.15.2          XML_3.9-4             xtable_1.6-0          zlibbioc_1.4.0      

--
Sent via the guest posting facility at bioconductor.org.


