[BioC] Rebuild GO.sqlite from GO.db using complete GO database

Eric Fournier Eric.Fournier at fsaa.ulaval.ca
Fri Jul 25 20:53:33 CEST 2014


Hi,

It seems I was wrong about almost everything. For the sake of anyone who stumbles on this thread with a similar problem, here's what I found out:

1) As Marc points out, the Entrez ID -> GO annotations are not part of GO.db, but of the individual org.Xx.eg.db packages.
2) The org.Xx.eg.db mappings DO include IEA annotations.
3) The dearth of annotated genes stems from the lackluster mappings provided by the NCBI Gene database (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/). Only genes with GO annotations linked directly within their Gene entries are reported by org.Xx.eg.db. As such, rebuilding the packages with the makeOrgPackageFromNCBI function will not help, since it draws on the same NCBI data.
4) A better, more complete species-specific mapping can be obtained directly from the Gene Ontology database: http://www.geneontology.org/page/download-annotations . However, it maps mostly to UniProt and Ensembl IDs rather than to Entrez gene IDs.

I am in the process of using those mappings to solve my issue.
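
For anyone wanting to try the same thing, here is a rough sketch of the idea, written in Python rather than R and using made-up IDs. It reads a GO annotation file in GAF 2.x format (tab-separated; column 2 is the database object ID, column 5 the GO ID, column 7 the evidence code) and re-keys the annotations by Entrez gene ID. The uniprot_to_entrez dictionary is an assumption: you would have to build it yourself from whatever ID-mapping table you trust. The function names are mine, not part of any package.

```python
from collections import defaultdict


def read_gaf(lines):
    """Yield (object_id, go_id, evidence) from GAF 2.x lines.

    Header/comment lines start with '!' and are skipped.
    """
    for line in lines:
        if line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete annotation line
        # 0-based positions: 1 = DB object ID, 4 = GO ID, 6 = evidence code
        yield fields[1], fields[4], fields[6]


def to_entrez(annotations, uniprot_to_entrez):
    """Re-key annotations by Entrez gene ID.

    Object IDs missing from the (user-supplied) map are silently dropped.
    """
    by_gene = defaultdict(set)
    for object_id, go_id, evidence in annotations:
        for entrez_id in uniprot_to_entrez.get(object_id, ()):
            by_gene[entrez_id].add((go_id, evidence))
    return dict(by_gene)


# Made-up records, for illustration only (not real accessions or GO terms).
sample_gaf = [
    "!gaf-version: 2.0",
    "UniProtKB\tX00001\tGENE1\t\tGO:0000001\tREF:1\tIEA\t\tP\t\t\tprotein\ttaxon:9913\t20140101\tSRC",
    "UniProtKB\tX00001\tGENE1\t\tGO:0000002\tREF:2\tIDA\t\tF\t\t\tprotein\ttaxon:9913\t20140101\tSRC",
    "UniProtKB\tX00002\tGENE2\t\tGO:0000001\tREF:3\tIEA\t\tP\t\t\tprotein\ttaxon:9913\t20140101\tSRC",
]

annotations = list(read_gaf(sample_gaf))
# Hypothetical mapping; X00002 has no Entrez ID here, so it is dropped.
genes = to_entrez(annotations, {"X00001": ["1001"]})
```

Genes whose protein IDs are missing from the mapping simply drop out, so it is worth counting how many annotations survive the conversion before running an enrichment analysis on the result.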

Cheers,
-Eric

-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Marc Carlson
Sent: Tuesday, July 22, 2014 1:35 PM
To: bioconductor at r-project.org
Subject: Re: [BioC] Rebuild GO.sqlite from GO.db using complete GO database

Hi Eric,

At first your questions confused me because the GO.sqlite database does not actually record the information that you appear to think it does.  That is, the GO.sqlite db is really only for storing information about the GO hierarchy itself.  It does not actually know anything about which terms are associated with which genes (or whether the association was IEA or something else).  That kind of information (gene to GO term associations) is actually stored in the database from your org.Bt.eg.db package.

And you have several options for rebuilding that DB if you need a different one.  The most 'hands on' way to rebuild it is to use the makeOrgPackage() function from the AnnotationForge package. That function will allow you to make an organism package (with populated database etc.) from a set of data.frame objects.  Using that, you could easily supply your own preferred GO information for bovine and be as liberal as you feel is appropriate.
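
As far as I recall from the AnnotationForge documentation (worth double-checking against your installed version), makeOrgPackage() wants the first column of every data.frame to be named GID, and the GO data.frame to have the columns GID, GO and EVIDENCE with no duplicated rows. A minimal Python sketch of preparing such a table as a tab-separated file, which could then be read into R as a data.frame, might look like this. The IDs are made up and write_go_table is a hypothetical helper, not part of any package.

```python
import csv
import io


def write_go_table(rows, handle):
    """Write unique (gene_id, go_id, evidence) rows as TSV.

    The header GID/GO/EVIDENCE follows the column names assumed above.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["GID", "GO", "EVIDENCE"])
    seen = set()
    for row in rows:
        key = tuple(row)
        if key not in seen:  # drop exact duplicates, keep first-seen order
            seen.add(key)
            writer.writerow(key)


# Made-up annotations, for illustration only.
rows = [
    ("1001", "GO:0000001", "IEA"),
    ("1001", "GO:0000001", "IEA"),  # duplicate, written only once
    ("1002", "GO:0000002", "IDA"),
]

buffer = io.StringIO()  # stands in for an open file
write_go_table(rows, buffer)
table_text = buffer.getvalue()
```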

Hope that clarifies things, please let me know if you have more questions!


  Marc



On 07/22/2014 08:31 AM, Eric Fournier wrote:
> Hello,
>
> I am performing GO term enrichment analysis in my organism of interest (Bos taurus) using the org.Bt.eg.db and GO.db packages. However, since the GO.db package uses the "lite" version of the Gene Ontology database, all IEA (Inferred from Electronic Annotation) terms are absent. In cattle, this makes the annotation pretty barren (over 60% of my genes have no GO annotation at all). Therefore, I am looking for ways to rebuild the GO.sqlite file used by GO.db using the full GO database. However, I cannot find any indication of how to do so, either in the package source (where the file is already packaged) or in its manual. Could anyone point me in the right direction?
>
> Thank you,
> ________________________________________________________
> Eric Fournier, B. Sc.
> Research Assistant in Bioinformatics
> Université Laval, Qc, Canada
> eric.fournier.4 at ulaval.ca
> 418-656-2131 x 11465
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor

