[BioC] reasonable Illumina hyperG test

James W. MacDonald jmacdon at med.umich.edu
Fri Sep 5 14:48:06 CEST 2008


Hi Sebastien,

Sebastien Gerega wrote:
> Hi,
> I have been looking around at examples of the hyperGTest (in the 
> GOstats, lumi, and other documentation) and feel like I have seen many 
> slight variations on the methodology.
> These variations are usually found in the way the non-specific filtering 
> is performed. I haven't come across many examples of a hyperGTest for 
> KEGG pathways and would like to ask whether my approach seems reasonable 
> or whether I should make any changes.
> Here is my code ("sig" is a vector of EntrezID):
> 
> uni = exprs(lumi.N.P)
> 
> #Remove those without PATH annotation
> havePATH = sapply(mget(allFeatures, lumiHumanAllPATH),
> function(x){
>    if (length(x) == 1 && is.na(x))
>    FALSE
>    else TRUE
> })
> uni <- uni[names(which(havePATH == TRUE)),]
> 
> #Remove those with little variation across samples
> iqrCutoff = 0.5
> uni.IQR = apply(uni, 1, IQR)
> uni = uni[which((uni.IQR > iqrCutoff) == TRUE),]
> 
> #Keep probes w/largest IQR
> uni = uni[findLargest(rownames(uni), uni.IQR[rownames(uni)], 
> "lumiHumanAll"),]
> uni = mget(rownames(uni), lumiHumanAllENTREZID)

This may, by chance, have removed all duplicate Entrez Gene IDs, but it 
may not have. You should also ensure that your Entrez Gene IDs are 
unique, as duplicates will bias your results (although I believe 
duplicates will be stripped anyway).
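
As a rough sketch of that check, using base R only: the probe names and 
IDs below are made up, and probe2entrez merely stands in for the list 
that mget(rownames(uni), lumiHumanAllENTREZID) would return. It also 
flattens the list to a plain vector, on the assumption that 
universeGeneIds wants a vector of IDs rather than a list.

```r
# Hypothetical stand-in for the list returned by
# mget(rownames(uni), lumiHumanAllENTREZID):
# one entry per probe, holding that probe's Entrez Gene ID.
probe2entrez <- list(
  probe_1 = "1017",
  probe_2 = "7157",
  probe_3 = "1017",        # second probe for the same gene
  probe_4 = NA_character_  # probe with no Entrez Gene ID
)

# Flatten the list to a character vector, drop NAs, keep each ID once.
ids <- unlist(probe2entrez, use.names = FALSE)
universe <- unique(ids[!is.na(ids)])

# Apply the same clean-up to the vector of selected genes ("sig").
sig <- c("7157", "7157", "1017")  # made-up IDs containing one duplicate
sig <- unique(sig[!is.na(sig)])

# Sanity checks before building the params object.
stopifnot(anyDuplicated(universe) == 0, anyDuplicated(sig) == 0)
```

The cleaned universe and sig vectors would then be what gets passed as 
universeGeneIds and geneIds.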

> 
> params = new("KEGGHyperGParams", geneIds=sig, universeGeneIds = uni, 
> annotation="lumiHumanAll", pvalueCutoff=0.05, testDirection="over")
> 
> hgOver = hyperGTest(params)
> 
> 
> Does this code/approach seem reasonable? Should I correct for multiple 
> testing after the hyperGTest?

It is not really clear how to correct for multiple testing with such 
highly dependent data, and a correction is probably not necessary, 
especially with KEGG data. You will likely have only a few significant 
terms, and it is even less likely that all of them will be interesting 
to you or your collaborators.
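
If you do want to look at adjusted p-values anyway, base R's p.adjust() 
is one option. A minimal sketch, using made-up p-values and pathway IDs 
in place of what I believe pvalues(hgOver) would give you; keep in mind 
that the BH method assumes independent or positively dependent tests, 
which is exactly the doubt raised above.

```r
# Made-up p-values keyed by hypothetical KEGG pathway IDs; in practice
# these would come from the hyperGTest result rather than being typed in.
pvals <- c("04110" = 0.001, "04115" = 0.012, "03030" = 0.04, "00010" = 0.2)

# Benjamini-Hochberg adjustment (controls the false discovery rate).
bh <- p.adjust(pvals, method = "BH")

# Side-by-side view of raw and adjusted values, smallest first.
res <- data.frame(pathway = names(pvals), p = pvals, p.BH = bh,
                  row.names = NULL)
res <- res[order(res$p), ]
print(res)
```

Adjusted values are never smaller than the raw ones, so at best this 
shortens an already short list of pathways.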

> Would it be fair to perform a test on gene ontologies in the same way 
> (obviously after having changed the param type and specifying an 
> ontology branch)?

Yes, with the addition of removing duplicate Entrez Gene IDs.

Best,

Jim


> 
> thanks,
> Sebastien
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662


