[BioC] Hypergeometric test with Disease Ontology

Ted Morrow ted.morrow at ebc.uu.se
Fri Jan 28 15:57:55 CET 2011


Hi Gilbert,

Great thanks for the clarification....OK building myGA works fine and I 
have setAnnLib to org.Hs.eg.db with:
myGA2 <- setAnnLib(myGA, 'org.Hs.eg.db')
getAnnLib(myGA2)
[1] "org.Hs.eg.db"

.....but at the geneAnswerReadable step:
myGAreadable <- geneAnswersReadable(myGA2, catTerm=FALSE)

I get the following error:
[1] "Mapping geneInput ..."
[1] "Warning: some genes do not have valid symbols!"
[1] "Mapping genesInCategory ..."

  traceback()
2: stop(paste("The input geneAnswers categoryType is not DOLite but ",
        x at categoryType, ". stop function!"))
1: topDOLiteGenes(myGA, orderby = "pvalue", topGenes = "ALL", 
genesOrderBy = "pValue",
        file = TRUE)

Cheers
Ted

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
States.1252    LC_MONETARY=English_United States.1252 
LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
  [1] org.Hs.eg.db_2.4.6   GeneAnswers_1.6.0    RColorBrewer_1.0-2   
Heatplus_1.20.0      rgl_0.92.798         MASS_7.3-9           
RSQLite_0.9-4        DBI_0.2-5            XML_3.2-0.2          
annotate_1.28.0      AnnotationDbi_1.12.0
[12] Biobase_2.10.0       RCurl_1.5-0.1        bitops_1.0-4.1       
igraph_0.5.5-1

loaded via a namespace (and not attached):
[1] graph_1.28.0     grid_2.12.1      RBGL_1.26.0      Rgraphviz_1.27.0 
tools_2.12.1     xtable_1.5-6
 >



On 1/28/2011 3:40 PM, Gilbert Feng wrote:
> Hi, Ted
>
> Yes, your codes about mygenepool and mygenes are correct. I didn't 
> clearly state them in my previous email. Sorry about that. myDOLite 
> should be built by DOLite and mygenepool, not mygenes. So currently, 
> myDOLite is a customized annotation library and when you build a 
> geneAnswers instance, leave categoryType to NULL(default value), and 
> in your case, I think the total gene number should be 1310, not 1202, 
> since the former is the gene number of mygenepool.
>
> after buiding myGA, run setAnnLib to set it to org.Hs.eg.db I guess 
> the error is that you don't set it.
>
> For the further mapping function, like geneAnswerReadable or other 
> visualization, always set catTerm=FALSE.
>
> Let me know if it doesn't work.
>
> Best
>
> Gilbert
>
>
>> I have two gene pools - one is a universe of genes (called say
>> "mygenepool") which is only a subset on all human genes, the other is a
>> subset of mygenepool (called say "mygenes"). I am interested in knowing
>> which DO terms show over-representation in mygenes given the DO terms
>> present in mygenepool. So, in your example I am a bit confused as to
>> which pool "mygenes" refers to. Is it "mygenepool" or "mygenes"?
>>
>> So I have......
>> str(mygenepool) # I used as.matrix then as.vector to change the
>> dataframe into a vector...is that OK?
>> int [1:1310] 8813 3075 2729 8379 204 170302 10165 6521 799 3052 ...
>>
>> and
>> str(mygenes)
>> int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ...
>>
>>
>> and mygenes is a subset of mygenepool:
>> overlap <- intersect(mygenepool, mygenes)
>> str(overlap)
>> int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ...
>>
>>
>> Running the first bit of your example code seems to work fine:
>> myDOLite <- lapply(DOLite, intersect, mygenepool)
>> names(myDOLite) <- DOLiteTerm[names(myDOLite)]
>>
>> and so does creating a GeneAnswers instance (myGA) whether I specify the
>> totalGeneNumber or not:
>> myGA <- geneAnswersBuilder(mygenes, myDOLite, categoryType='DOLITE',
>> testType='hyperG',
>> #known=FALSE, totalGeneNumber=1202,
>> pvalueT=0.1, geneExpressionProfile=NULL, verbose=TRUE)
>>
>> But then I get the following error:
>> myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE)
>> [1] "Mapping geneInput ..."
>> Error in switch(sub("org.*[:.:]", "", libname), eg = "EG", tair = 
>> "TAIR", :
>> EXPR must be a length 1 vector
>>
>> traceback()
>> 2: getSymbols(as.vector(x at geneInput[, 1]), x at annLib, strict = strict,
>> missing = missing)
>> 1: geneAnswersReadable(myGA, verbose = TRUE)
>>
>> If I instead create the GeneAnswers instance with 'org.Hs.eg.db' and not
>> myDOLite the step.....
>> myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE)
>>
>> .....works fine as do later steps (for example geneAnsersChartPlots and
>> topDOLiteGenes)
>>
>> But then presumably I am not working with the restricted pool of genes
>> that I want to (i.e. mygenepool)
>>
>> Any ideas?
>> Cheers
>> Ted
>>
>>
>>
>>
>>
>>
>> On 1/27/2011 5:19 PM, Gilbert Feng wrote:
>>> Hi, Ted
>>>
>>> If I correctly understand your question, you have your own gene pool,
>>> don't you? In that case, GeneAnswers still can handle that. What you
>>> need is to build a customized DOLite list to run analysis. For example,
>>>
>>> if your genes are mygenes(an unique Entrez GENE ID vector)
>>>
>>> myDOLite <- lapply(DOLite, intersect, mygenes)
>>> names(myDOLite) <- DOLiteTerm[names(myDOLite)]
>>> #Please note that the parameter 'known' is TRUE in geneAnswersBuilder,
>>> which means the test will only use shared genes in mygenes and DOLite
>>> genes as the gene pool to run hypergeometric test. You can change the
>>> number of gene pool by set parameter 'totalGeneNumber' and set 'known'
>>> to FALSE at the same time .
>>> myGA <- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...)
>>>
>>> since DO is designed for human, you can use setAnnLib to set the
>>> species as 'org.Hs.eg.db' for further visualization or analysis.
>>>
>>> Let me know if you have any question.
>>>
>>> Gilbert
>>>
>>> BTW, the current one has a new function to automatically generate a
>>> report for a or several groups of genes. Please check "groupReport"
>>> page, but before you run it, make sure you have java support and flash
>>> installation.
>>>
>>>
>>> On 1/27/11 9:45 AM, Ted Morrow wrote:
>>>> Thanks for your replies Gilbert and Peter.....I forgot to say in my
>>>> original question that I did look into using both the GeneAnswers
>>>> package and the FunDO web-based system but the issue I have not been
>>>> able to solve with them is that I cannot specify "my gene Universe"
>>>> within which my selected genes are tested for enrichment. I want to 
>>>> see
>>>> which diseases are enriched out of the possible diseases in "my gene
>>>> universe" which is a subset of all possible human genes.
>>>>
>>>> Is that possible in GeneAnswers (or even FunDO)?
>>>>
>>>> Cheers
>>>> Ted
>>>>
>>>> On 1/27/2011 4:39 PM, Peter Robinson wrote:
>>>>> On 01/27/2011 04:17 PM, Gilbert Feng wrote:
>>>>>> Hi, Ted
>>>>>>
>>>>>> DO.db is a standard sqlite file and you can use standard RSQLite
>>>>>> procedure to retrieve the information. Actually we do have a lite
>>>>>> version of Disease Ontology, DOLite that removes some redundant 
>>>>>> nodes,
>>>>>> integrated in GeneAnswers package, which also uses hypergeometric 
>>>>>> test
>>>>>> to run enrichment analysis as well as automatically generated
>>>>>> interactive(cytoscape web support) html summary for one or more 
>>>>>> groups
>>>>>> of genes.
>>>>>>
>>>>>> Best
>>>>>>
>>>>>> Gilbert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> You might also want to take a look at this website:
>>>>> http://django.nubic.northwestern.edu/fundo/faq
>>>>> which implements enrichment analysis using genes annotated to DO 
>>>>> terms.
>>>>> -peter
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> On 1/27/11 5:00 AM, Ted Morrow wrote:
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I would like to conduct a hypergeometric test on a list of genes 
>>>>>>> but
>>>>>>> using Disease Ontology instead of GO or KEGG terms. The package
>>>>>>> "DO.db"
>>>>>>> contains this information but I have been unable to find a way of
>>>>>>> using
>>>>>>> this database in conjunction with the "GOstats" package that I have
>>>>>>> been
>>>>>>> using.
>>>>>>>
>>>>>>> Has anyone attempted to run a hypergeometric test for Disease
>>>>>>> Ontology
>>>>>>> terms? Is there another package I could use? Or is there a way of
>>>>>>> modifying the argument (if that's the right word) GOHyperGParams in
>>>>>>> GOstats so that it can make use of the information in DO.db?
>>>>>>>
>>>>>>> Both the GO and KEGG analyses work fine:
>>>>>>> ### GO chunk
>>>>>>> params<- new("GOHyperGParams",
>>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse,
>>>>>>> annotation = "hgu95av2.db", ontology="BP",
>>>>>>> pvalueCutoff=0.01, conditional=FALSE,
>>>>>>> testDirection="over")
>>>>>>> hgOver<- hyperGTest(params)
>>>>>>>
>>>>>>> hgOver
>>>>>>> Gene to GO BP test for over-representation
>>>>>>> 2136 GO BP ids tested (15 have p< 0.01)
>>>>>>> Selected gene set size: 112
>>>>>>> Gene universe size: 951
>>>>>>> Annotation package: hgu95av2
>>>>>>>
>>>>>>> ### KEGG chunk
>>>>>>> paramsKEGG<- new("KEGGHyperGParams",
>>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse,
>>>>>>> annotation = "hgu95av2.db",
>>>>>>> pvalueCutoff=0.01,
>>>>>>> testDirection="over")
>>>>>>>
>>>>>>> hgOverKEGG<- hyperGTest(paramsKEGG)
>>>>>>> hgOverKEGG
>>>>>>> summary(hgOverKEGG)
>>>>>>>
>>>>>>> My data looks like this:
>>>>>>> str(selectedEntrezIds)
>>>>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414"
>>>>>>> "7546"
>>>>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" 
>>>>>>> "113235"
>>>>>>> "28962" "10269" "4069" "30835" "7018" ...
>>>>>>>
>>>>>>> > str(entrezUniverse)
>>>>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165"
>>>>>>> "6521"
>>>>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842"
>>>>>>> "5048"
>>>>>>> "4843" "6329" "5080" "6401" "6853" ...
>>>>>>>
>>>>>>> My naive attempts to use DO have included:
>>>>>>> paramsDO<- new("DOHyperGParams",
>>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse,
>>>>>>> annotation = "DO.db",
>>>>>>> pvalueCutoff=0.01,
>>>>>>> testDirection="over")
>>>>>>>
>>>>>>> Which of course doesn't work and gives the following error:
>>>>>>> Error in getClass(Class, where = topenv(parent.frame())) :
>>>>>>> "DOHyperGParams" is not a defined class
>>>>>>>
>>>>>>> > traceback()
>>>>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain =
>>>>>>> NA)
>>>>>>> 2: getClass(Class, where = topenv(parent.frame()))
>>>>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds,
>>>>>>> universeGeneIds =
>>>>>>> entrezUniverse,
>>>>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over")
>>>>>>>
>>>>>>>
>>>>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the
>>>>>>> following error:..
>>>>>>>
>>>>>>> hgOverDO<- hyperGTest(paramsDO)
>>>>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) :
>>>>>>> 'arg' should be one of “BP”, “CC”, “MF”
>>>>>>>
>>>>>>> traceback()
>>>>>>> 10: stop(gettextf("'arg' should be one of %s", 
>>>>>>> paste(dQuote(choices),
>>>>>>> collapse = ", ")), domain = NA)
>>>>>>> 9: match.arg(ontology, c("BP", "CC", "MF"))
>>>>>>> 8: getUniverseViaGo(p)
>>>>>>> 7: universeBuilder(p)
>>>>>>> 6: universeBuilder(p)
>>>>>>> 5: .hyperGTestInternal(p)
>>>>>>> 4: is(object, Cl)
>>>>>>> 3: is(object, Cl)
>>>>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), 
>>>>>>> "HyperGResultBase",
>>>>>>> "hyperGTest")
>>>>>>> 1: hyperGTest(paramsDO)
>>>>>>>
>>>>>>>
>>>>>>> Any help would be greatly appreciated.
>>>>>>> /Ted
>>>>>>>
>>>>>>> > sessionInfo()
>>>>>>> R version 2.12.1 (2010-12-16)
>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>>
>>>>>>> locale:
>>>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>>>>> LC_TIME=English_United States.1252
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>>
>>>>>>> other attached packages:
>>>>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5
>>>>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5
>>>>>>> graph_1.28.0
>>>>>>> Category_2.16.0 AnnotationDbi_1.12.0
>>>>>>> [12] Biobase_2.10.0
>>>>>>>
>>>>>>> loaded via a namespace (and not attached):
>>>>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0
>>>>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 
>>>>>>> xtable_1.5-6
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

-- 
__________________________________________________________________________________

Edward H. Morrow
Department of Ecology and Evolution
Animal Ecology (zooekologi)
Evolutionary Biology Centre
Uppsala University
Norbyvägen 18-D
SE-752 36 Uppsala
SWEDEN

Email: ted.morrow at ebc.uu.se
Tel: +46 18 471 2676
Fax +46 18 471 6484
Webpage: http://www.iee.uu.se/zooekol/default.php?type=personalpage&id=119&lang=en



More information about the Bioconductor mailing list