[BioC] problem with makeOrgPackageFromNCBI (for Chinese hamster)

Marc Carlson mcarlson at fhcrc.org
Fri Aug 23 02:32:06 CEST 2013


Hi Guido,

I have (so far) been unable to reproduce your initial issue here; I can 
generate this package without problems on both release and devel.  Even 
though I can't inspect your package directly, I am almost certain that 
it is actually just fine, and that the only reason it says FALSE is the 
2nd warning (file.remove() returns FALSE when it can't actually remove 
something).  The 1st warning just means that you don't have any unigene 
data (which is expected in this case, since there are no unigenes for 
this critter).  The 2nd warning means that R was not allowed to remove 
the generated .sqlite file after copying it into the new package 
directory.  I don't know why that is happening on Windows and I plan to 
investigate it, but the crucial thing is that this happens AFTER the 
package has already been generated.
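
As a rough illustration of the point about FALSE, the sketch below shows 
file.remove() returning FALSE (with a warning) when it cannot delete a 
file, and then checks for the generated package directory.  It uses a 
non-existent temporary path as a stand-in for the locked .sqlite file, 
so the reason in the warning will differ from 'Permission denied'.  The 
inst/extdata location is an assumption based on the template layout.

```r
# A path that does not exist, standing in for a file R cannot delete.
dbfile <- tempfile(fileext = ".sqlite")

# file.remove() signals a warning and returns FALSE on failure.
res <- withCallingHandlers(
    file.remove(dbfile),
    warning = function(w) {
        message("warning caught: ", conditionMessage(w))
        invokeRestart("muffleWarning")
    }
)
print(res)

# The package directory is written before the cleanup step, so it can be
# checked independently of that FALSE (paths assume outputDir = ".").
pkgdir <- file.path(".", "org.Cgriseus.eg.db")
print(file.exists(file.path(pkgdir, "DESCRIPTION")))
print(list.files(file.path(pkgdir, "inst", "extdata")))
```

If the DESCRIPTION file and the .sqlite file are both present, the 
leftover copy in the working directory can simply be deleted by hand.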

Looking down a bit farther, you did find a problem with the 
org.Cgriseus.eg() function.  I think that is a real bug (not a serious 
one, but one I intend to look into shortly).  Basically your package 
does not have (and should not have) an org.Cgriseus.egREFSEQ2EG mapping, 
and yet this silly function is trying to ask about it.  That is not a 
problem within your package, though, since the offending code actually 
lives in AnnotationDbi.
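
Until that is fixed, one way to see which mapping objects a package 
really provides is to look them up in its namespace rather than rely on 
the QC function.  This is only a sketch: has_mapping() is a hypothetical 
helper (not part of AnnotationDbi), and it assumes the mapping objects 
are assigned into the package namespace when it is loaded.

```r
# Hypothetical helper: TRUE if `mapname` is defined in the namespace of `pkg`.
has_mapping <- function(pkg, mapname) {
    exists(mapname, envir = asNamespace(pkg), inherits = FALSE)
}

# Stand-in check against a base package, so this runs anywhere.
print(has_mapping("stats", "median"))

# The same check against the generated package, if it is installed.
if (requireNamespace("org.Cgriseus.eg.db", quietly = TRUE)) {
    print(has_mapping("org.Cgriseus.eg.db", "org.Cgriseus.egREFSEQ"))
    print(has_mapping("org.Cgriseus.eg.db", "org.Cgriseus.egREFSEQ2EG"))
}
```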

Now you're correct that your package does have the data that could be 
used for an org.Cgriseus.egREFSEQ2EG mapping, and that this data is 
exposed via the select() method.  It is also available via the 
org.Cgriseus.egREFSEQ mapping.  But the package is still not supposed to 
have that specific reverse mapping (and it does not need it either, 
since you can call revmap() on the forward mapping).  In fact, none of 
the old mappings are really needed for anything.  We just generated a 
few of them for the purposes of maintaining some backwards 
compatibility.

And to answer your other question: the package is actually "made" by 
just putting the database into the inst/extdata directory of a very 
minimalist package template found in AnnotationForge (you can look at it 
in inst/AnnDbPkg-templates/ORGANISM.DB/ if you want to see it).  The 
template is altered slightly based on some inputs that are generated 
from your initial arguments so that the manual pages etc. are all 
matched to the source material.  So really, the most complicated thing 
that happens (after the database is made) is actually just generating 
all the manual pages.
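
For illustration, here is a sketch of getting from RefSeq accessions 
back to Entrez Gene IDs without a REFSEQ2EG object, and of locating the 
database file inside the installed package.  It assumes the generated 
package is installed and that the database file keeps the name 
org.Cgriseus.eg.sqlite under extdata.  The accession is taken from the 
select() output quoted below.  Note that current AnnotationDbi calls the 
select() argument `columns`; the version in this thread used `cols`.

```r
have_pkg <- requireNamespace("org.Cgriseus.eg.db", quietly = TRUE)

if (have_pkg) {
    library(AnnotationDbi)
    library(org.Cgriseus.eg.db)

    # Reverse the existing Entrez Gene -> RefSeq map on the fly.
    refseq2eg <- revmap(org.Cgriseus.egREFSEQ)
    print(mget("NM_001243976", refseq2eg, ifnotfound = NA))

    # The same lookup through select(), keyed on RefSeq accessions.
    print(select(org.Cgriseus.eg.db,
                 keys = "NM_001243976",
                 keytype = "REFSEQ",
                 columns = "ENTREZID"))

    # Where the template placed the database in the installed package.
    print(system.file("extdata", "org.Cgriseus.eg.sqlite",
                      package = "org.Cgriseus.eg.db"))
} else {
    message("org.Cgriseus.eg.db is not installed; nothing to look up")
}
```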

If you could send me a tarball for the package that you generated, I 
would like to look at it and verify that there are not any peculiarities 
with it compared to the one that I made here.


   Marc



On 08/22/2013 12:33 PM, Hooiveld, Guido wrote:
> Hi Marc and others,
>
> I am using makeOrgPackageFromNCBI() to create an annotation package for Chinese hamster (Cricetulus griseus), but I am experiencing some problems during this process. Please see the code below for details. It could very well be that I am missing something obvious, so any suggestion as to what may cause this would be appreciated!
>
> Thanks,
> Guido
>
>
> 1) I am using R on Win7, have admin rights, and also start R through 'Run as administrator'. Why can the file 'org.Cgriseus.eg.sqlite' then not be removed? (Reason 'Permission denied'). Note: I understand this is just a warning but it may be relevant.
>
> 2a) Although no *.db package was produced, I still tried to install the database from the directory in which the files were generated (i.e. D:\\org.Cgriseus.eg.db). This *seemed* to go OK, but when I checked the number of mapped egids, it failed at the org.Cgriseus.egREFSEQ mapping...
> 2b) Interestingly, when I manually load the sqlite database (that could not be removed) these org.Cgriseus.egREFSEQ mappings are present! See code at bottom.
> 2c) --> How to make a *.db from an *.sqlite?
>
>
> # Create db0 for Chinese hamster using makeOrgPackageFromNCBI()
>> library(AnnotationForge)
>> makeOrgPackageFromNCBI(
> +           version="0.1",
> +           maintainer="Guido Hooiveld <guido.hooiveld at wur.nl>",
> +           author="Guido Hooiveld <guido.hooiveld at wur.nl>",
> +           outputDir=".",
> +           tax_id=10029,
> +           genus="Cricetulus",
> +           species="griseus")
> Loading required package: GO.db
>
> Getting data for gene2pubmed.gz
> Loading required package: RCurl
> Loading required package: bitops
> discarding data from other organisms
> Populating gene2pubmed table:
> table gene2pubmed filled
> Getting data for gene2accession.gz
> discarding data from other organisms
> Populating gene2accession table:
> table gene2accession filled
> Getting data for gene2refseq.gz
> discarding data from other organisms
> Populating gene2refseq table:
> table gene2refseq filled
> Getting data for gene2unigene
> discarding data from other organisms
> Populating gene2unigene table:
> table gene2unigene filled
> Getting data for gene_info.gz
> discarding data from other organisms
> Populating gene_info table:
> table gene_info filled
> Getting data for gene2go.gz
> discarding data from other organisms
> Populating gene2go table:
> Getting blast2GO data as a substitute for gene2go
> table metadata filled
> table map_metadata filled
> table gene2go filled
> table metadata filled
> table map_metadata filled
> Populating genes table:
> genes table filled
> Populating gene_info_temp table:
> gene_info_temp table filled
> Populating alias table:
> alias table filled
> Populating chromosomes table:
> chromosomes table filled
> Populating pubmed table:
> pubmed table filled
> Populating refseq table:
> refseq table filled
> Populating accessions table:
> accessions table filled
> Populating unigene table:
> Dropping GO IDs that are too new for the current GO.db
> Dropping GO IDs that are too new for the current GO.db
> Dropping GO IDs that are too new for the current GO.db
> Populating go_bp table:
> go_bp table filled
> Populating go_mf table:
> go_mf table filled
> Populating go_cc table:
> go_cc table filled
> Populating go_bp_all table:
> go_bp_all table filled
> Populating go_mf_all table:
> go_mf_all table filled
> Populating go_cc_all table:
> go_cc_all table filled
> dropping table gene2pubmeddropping table gene2accessiondropping table gene2refseqdropping table gene2unigenedropping table gene_infodropping table gene2go
> Making GO views
>
>
> SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.gene_name NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
> SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g WHERE t._id=g._id AND t.chromosome NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
> SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
> SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
> SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE t._id=g._id AND t.alias_symbol NOT NULL
> table map_counts filled
> Creating package in ./org.Cgriseus.eg.db
> [1] FALSE
> Warning messages:
> 1: In .makeSimpleTable(ug, table = "unigene", con) :
>    no values found for table unigene in this data chunk.
> 2: In file.remove(dbfile) :
>    cannot remove file 'org.Cgriseus.eg.sqlite', reason 'Permission denied'
>> # Now manually install files from DIR that has been generated.
>>
>> install.packages(repos=NULL, pkgs="D:\\org.Cgriseus.eg.db", type="source")
> * installing *source* package 'org.Cgriseus.eg.db' ...
> ** R
> ** inst
> ** preparing package for lazy loading
> ** help
> *** installing help indices
> ** building package indices
> ** testing if installed package can be loaded
> *** arch - i386
> *** arch - x64
> * DONE (org.Cgriseus.eg.db)
>> library(org.Cgriseus.eg.db)
>> org.Cgriseus.eg()
> Quality control information for org.Cgriseus.eg:
>
>
> This package has the following mappings:
>
> org.Cgriseus.egALIAS2EG has 25227 mapped keys (of 25227 keys)
> org.Cgriseus.egCHR has 25227 mapped keys (of 25227 keys)
> org.Cgriseus.egGENENAME has 25227 mapped keys (of 25227 keys)
> org.Cgriseus.egGO has 25227 mapped keys (of 25227 keys)
> org.Cgriseus.egGO2ALLEGS has 25227 mapped keys (of 16020 keys)
> org.Cgriseus.egGO2EG has 25227 mapped keys (of 12124 keys)
> org.Cgriseus.egREFSEQ has 25227 mapped keys (of 25227 keys)
> Error in get(mapname) : object 'org.Cgriseus.egREFSEQ2EG' not found
>>
>
>
>> #load sqlite to check that REFSEQ mappings are included
>> CHO.db <- loadDb("org.Cgriseus.eg.sqlite")
>> CHO.db
> OrgDb object:
> | BL2GOSOURCEDATE: Thu Aug 22 18:47:20 2013
> | BL2GOSOURCENAME: blast2GO
> | BL2GOSOURCEURL: http://www.blast2go.de/
> | DBSCHEMAVERSION: 2.1
> | DBSCHEMA: ORGANISM_DB
> | ORGANISM: Cricetulus griseus
> | SPECIES: Cricetulus griseus
> | CENTRALID: EG
> | TAXID: 10029
> | EGSOURCEDATE: Thu Aug 22 18:47:24 2013
> | EGSOURCENAME: Entrez Gene
> | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | GOSOURCEDATE: 20130302
> | GOSOURCENAME: Gene Ontology
> | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godata
> | GOEGSOURCEDATE: Thu Aug 22 18:47:24 2013
> | GOEGSOURCENAME: Entrez Gene
> | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | Db type: OrgDb
> | Supporting package: AnnotationDbi
>
>> cols(CHO.db)
> [1] "ENTREZID" "ACCNUM"   "ALIAS"    "CHR"      "PMID"     "REFSEQ"
>   [7] "SYMBOL"   "UNIGENE"  "GENENAME" "GO"       "EVIDENCE" "ONTOLOGY"
>> keys <- head( keys(CHO.db))
>> keys
> [1] "100682525" "100682526" "100682527" "100682528" "100682529" "100682530"
>> select(CHO.db, keys=keys, cols = c("SYMBOL","REFSEQ","UNIGENE"))
>      ENTREZID SYMBOL       REFSEQ UNIGENE
> 1  100682525    P53 NM_001243976    <NA>
> 2  100682525    P53 NP_001230905    <NA>
> 3  100682526 Tuba1c NM_001243977    <NA>
> 4  100682526 Tuba1c NP_001230906    <NA>
> 5  100682527 Tuba1a NM_001243978    <NA>
> 6  100682527 Tuba1a NP_001230907    <NA>
> 7  100682528 Tuba1b NM_001243979    <NA>
> 8  100682528 Tuba1b NP_001230908    <NA>
> 9  100682529  Mgat1 NM_001243980    <NA>
> 10 100682529  Mgat1 NP_001230909    <NA>
> 11 100682530   Plec XM_003507629    <NA>
> 12 100682530   Plec XP_003507677    <NA>
> Warning message:
> In .generateExtraRows(tab, keys, jointype) :
>    'select' resulted in 1:many mapping between keys and return rows
>> sessionInfo()
> R version 3.0.1 Patched (2013-06-05 r62877)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] org.Cgriseus.eg.db_0.1 RCurl_1.95-4.1         bitops_1.0-6           GO.db_2.9.0
>   [5] AnnotationForge_1.2.2  org.Hs.eg.db_2.9.0     RSQLite_0.11.4         DBI_0.2-7
>   [9] AnnotationDbi_1.22.6   Biobase_2.20.1         BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.18.3 stats4_3.0.1   tools_3.0.1
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


