[BioC] Bug in makeOrgPackageFromNCBI from AnnotationForge?

Blanchette, Marco MAB at stowers.org
Sat Aug 24 04:24:30 CEST 2013

I am working on a project that uses Schizosaccharomyces pombe for genomic analysis, and I would love to use the HTML-producing wrappers in ReportingTools. However, I am struggling because there is no AnnotationDbi package available for this organism. I decided to finally take the plunge and see whether I could build one myself using AnnotationForge, and was quite excited to find that I could perhaps make one simply by calling makeOrgPackageFromNCBI(). Most likely something went wrong, and I suspect a bug somewhere in the pipeline. I have not dug deeper than trying to build the package and use it, hoping that someone closer to the code could shed some light. Here are the steps I took:

> library(AnnotationForge)
> makeOrgPackageFromNCBI(version = "0.1",
                       author = "Marco Blanchette <mab at stowers.org>",
                       maintainer = "Marco Blanchette <mab at stowers.org>",
                       outputDir = ".",
                       tax_id = "4896",
                       genus = "Schizosaccharomyces",
                       species = "pombe")

This step succeeded with only a warning:

Warning message:
In .makeSimpleTable(ug, table = "unigene", con) :
  no values found for table unigene in this data chunk.

I didn't think this was serious enough to raise a red flag, so I proceeded with the installation, which went smoothly:

> library(devtools)
> install('org.Spombe.eg.db')
> library('org.Spombe.eg.db')

I then tried to use it with the ReportingTools publish() function, but it failed with an error related to the Entrez IDs, for which I had a conversion table from biomaRt. I dug a bit deeper and found that none of the genes I was querying were in the database, and finally realized that there were only 38 entries in the org.Spombe.eg.db database I had just created and installed. Check this out:

> keytypes(org.Spombe.eg.db)
 [1] "ENTREZID" "ACCNUM"   "ALIAS"    "CHR"      "PMID"     "REFSEQ"  

Looking good! However:

> length(keys(org.Spombe.eg.db,'ENTREZID'))
[1] 38

Can someone close enough to the code shed some light as to whether there is a bug in AnnotationForge, or whether it is the NCBI database that does not conform to what is expected? For instance, biomaRt reports 5117 Entrez IDs:

> library(biomaRt)
> mart <- useMart("fungi_mart_18","spombe_eg_gene")
> ensembl2entrez <- getBM(c('ensembl_gene_id','entrezgene'),mart=mart)
> sum(!is.na(ensembl2entrez$entrezgene))
[1] 5117
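
One caveat on that count: sum(!is.na(...)) counts rows of the mapping table, not distinct Entrez IDs, so several Ensembl genes mapping to the same Entrez ID would inflate it. A minimal base-R sketch of the difference follows. The data frame toy_map and all of its values are made up for illustration and are not real S. pombe output.

```r
# Hypothetical stand-in for the getBM() result; every value here is made up.
toy_map <- data.frame(
  ensembl_gene_id = c("geneA", "geneB", "geneC", "geneD"),
  entrezgene      = c(101L, 102L, 102L, NA),
  stringsAsFactors = FALSE
)

# Rows with a non-missing Entrez ID (what sum(!is.na(...)) reports).
n_rows_with_id <- sum(!is.na(toy_map$entrezgene))

# Distinct non-missing Entrez IDs.
n_distinct_ids <- length(unique(na.omit(toy_map$entrezgene)))
```

If the two numbers differ for the real ensembl2entrez table, the distinct count is probably the fairer one to compare against the number of keys in the org package.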

The IDs I tested on the NCBI website return the correct genes. However, only 10 of the AnnotationForge Entrez IDs (out of a meager 38) are found in biomaRt:

> sum(keys(org.Spombe.eg.db,'ENTREZID') %in% ensembl2entrez$entrezgene)
[1] 10
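
To see which IDs fall on each side, rather than only the size of the overlap, a small helper along these lines might be useful. This is only a sketch: compare_ids is a hypothetical name (not part of AnnotationForge, AnnotationDbi, or biomaRt), and the toy input is made up. Both inputs are coerced to character because keys() returns character strings while the biomaRt column may be numeric.

```r
# Hypothetical helper: split two ID sets into shared and exclusive parts.
compare_ids <- function(a, b) {
  # Drop missing values, normalise to character, and de-duplicate.
  a <- unique(as.character(a[!is.na(a)]))
  b <- unique(as.character(b[!is.na(b)]))
  list(
    shared = intersect(a, b),  # present in both sets
    only_a = setdiff(a, b),    # present only in the first set
    only_b = setdiff(b, a)     # present only in the second set
  )
}

# Toy input, made up for illustration.
res <- compare_ids(c("1", "2", "3"), c(2L, 3L, 4L, NA))
lengths(res)
```

With the objects from this post, the call would be compare_ids(keys(org.Spombe.eg.db, 'ENTREZID'), ensembl2entrez$entrezgene), and the only_a element would list the IDs that AnnotationForge picked up but biomaRt does not know about.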

Again, I would appreciate any comments or suggestions as to whether this is a bug, something I did wrong, or a misalignment between the NCBI S. pombe annotation and what AnnotationForge expects.

Marco Blanchette, Ph.D.
Assistant Investigator 
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071 
Cell: 816-726-8419 
Fax: 816-926-2018 
