[BioC] Making use of the Apis mellifera assembly 2 data in goSeq

Tue Feb 28 21:58:29 CET 2012

Vanessa,

On 02/28/2012 11:17 AM, Corby, Vanessa wrote:
> Hi again Hervé,
>
> I hit "Reply All" this time.  Sorry about that.
>
> Here is my session info:

Thanks! More below...

>
>> sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] goseq_1.6.0            geneLenDataBase_0.99.8 BiasedUrn_1.04
>   [4] RCurl_1.91-1           bitops_1.0-4.1         GO.db_2.6.1
>   [7] RSQLite_0.11.1         DBI_0.2-5              AnnotationDbi_1.16.17
> [10] Biobase_2.14.0
>
> loaded via a namespace (and not attached):
>   [1] biomaRt_2.10.0        Biostrings_2.22.0     BSgenome_1.22.0
>   [4] GenomicFeatures_1.6.8 GenomicRanges_1.6.7   grid_2.14.0
>   [7] IRanges_1.12.6        lattice_0.20-0        Matrix_1.0-1
> [10] mgcv_1.7-9            nlme_3.1-102          rtracklayer_1.14.4
> [13] tools_2.14.0          XML_3.9-4             zlibbioc_1.0.0
>
> I should have looked through these guidelines in more detail.  I could have given the output of the traceback().
>
> V
>
> -----Original Message-----
> From: Hervé Pagès [mailto:hpages at fhcrc.org]
> Sent: Tuesday, February 28, 2012 12:04 PM
> To: Corby, Vanessa; bioconductor at r-project.org
> Subject: Re: [BioC] Making use of the Apis mellifera assembly 2 data in goSeq
>
> Hi Vanessa,
>
> Please let's keep this discussion on the list (I should have said use the "Reply All" button instead of "Reply" in my previous email).
>
> On 02/28/2012 07:57 AM, Corby, Vanessa wrote:
>> Hello Hervé,
>>
>> So it looks like it made a .sqlite file called "org.Amellifera.eg.sqlite" in the specified directory, but I don't see an associated package.  I see in the AnnotationDbi information there is a wrapBaseDBPackages function.  Would this be the next step?
>
> More importantly, it seems that you got an error (see below), which would explain why the package is half-backed only.
>
>>
>> Here is my input into R and what I got out:
>>
>>> makeOrgPackageFromNCBI(version = "0.1", author = "Vanessa
>>> Corby-Harris<vanessa.corby at ars.usda.gov>", maintainer = "Vanessa
>>> Corby-Harris<vanessa.corby at ars.usda.gov>", outputDir =
>>> "/homeB/home4/u8/vlcorbyharris", tax_id = "7469", genus = "Apis",
>>> species = "mellifera")
>> Getting data for gene2pubmed.gz
>> Populating gene2pubmed table:
>> table gene2pubmed filled
>> Getting data for gene2accession.gz
>> Populating gene2accession table:
>> table gene2accession filled
>> Getting data for gene2refseq.gz
>> Populating gene2refseq table:
>> table gene2refseq filled
>> Getting data for gene2unigene
>> Populating gene2unigene table:
>> table gene2unigene filled
>> Getting data for gene_info.gz
>> Populating gene_info table:
>> table gene_info filled
>> Getting data for gene2go.gz
>> Populating gene2go table:
>> Getting blast2GO data as a substitute for gene2go Error in file(file,
>> "rt") : invalid 'description' argument In addition: Warning message:
>> In unzip(tmp) : error 1 in extracting from zip file

OK I can reproduce now. So it seems that when no GO data is available
for a tax id (which is the case for tax id 7469), then
makeOrgPackageFromNCBI() tries to use blast2GO data as a substitute.
Problem is blast2GO data is not available either for tax id 7469:

   > 
url.exists("http://bioinfo.cipf.es/b2gfar/_media/species:data:7469.annot.zip")
[1] FALSE

Hopefully a more competent person will be able to suggest a workaround.

In the mean time I've a small patch for makeOrgPackageFromNCBI() that
I'm about to apply and that makes it display a more useful error
message when the blast2GO data is not available.

Cheers,
H.

>
> I'm trying to reproduce this but makeOrgPackageFromNCBI() seems to take a long time (started this 10+ minutes ago). In the mean time could you please provide your sessionInfo()? This is an absolute requisite for us to troubleshoot. Also make sure you read our posting guide:
>
>     http://bioconductor.org/help/mailing-list/
>
> Thanks,
> H.
>
>>
>> Vanessa
>> -----Original Message-----
>> From: Hervé Pagès [mailto:hpages at fhcrc.org]
>> Sent: Monday, February 27, 2012 3:14 PM
>> To: Corby, Vanessa; bioconductor at r-project.org
>> Subject: Re: [BioC] Making use of the Apis mellifera assembly 2 data
>> in goSeq
>>
>> On 02/27/2012 02:02 PM, Corby, Vanessa wrote:
>>> Argh.  This is what I was afraid of...
>>
>> Why not try to make yours? See the makeOrgPackageFromNCBI() function
>> in
>> AnnotationDbi:
>>
>>      library(AnnotationDbi)
>>      ?makeOrgPackageFromNCBI
>>
>> (Would be nice if the goseq documentation could mention this.)
>>
>> If you run into problems while trying to use makeOrgPackageFromNCBI() to roll up your own org.Am.eg.db package, please come back with a precise description of the problem. Thanks!
>>
>> Cheers,
>> H.
>>
>>>
>>> Vanessa
>>>
>>> -----Original Message-----
>>> From: Hervé Pagès [mailto:hpages at fhcrc.org]
>>> Sent: Monday, February 27, 2012 2:51 PM
>>> To: Corby, Vanessa
>>> Cc: bioconductor at r-project.org
>>> Subject: Re: [BioC] Making use of the Apis mellifera assembly 2 data
>>> in goSeq
>>>
>>> Vanessa,
>>>
>>> On 02/27/2012 12:04 PM, Corby, Vanessa wrote:
>>>> Page 3 of the goseq documentation states
>>>>
>>>> "In order to link GO categories to genes, goseq uses the organism packages from Bioconductor.
>>>> These packages are named org.<Genome>.<ID>.db, where<Genome>     is a
>>>> short string identifying the genome and<ID>     is a short string
>>>> identifying the gene identi_er. Currently, goseq will automatically
>>>> retrieve the mapping between GO categories and genes from the relevant package (as long as it is installed) for commonly used genome/ID combinations."
>>>>
>>>> Let's say I want to use the organism package for the Apis mellifera version 2 genome, a supported genome according to the goseq supportedGenomes() command.  I see that there are several packages that follow the naming convention org.<Genome>.<ID>.db, but none of these is for the Apis version 2 genome.  What organism package do I install for Apis mellifera 2?  Can I use BSgenome.Amellifera.UCSC.apiMel2 even though it does not follow the proper naming scheme?
>>>
>>> The org.<Genome>.<ID>.db packages contain annotations (more precisely mappings between gene ids and other kinds of ids). The BSgenome.* packages (like Amellifera.UCSC.apiMel2) contain no annotations, only genomes (i.e. DNA sequences). Of course there is no way you can use one to replace an org.<Genome>.<ID>.db package.
>>>
>>> Cheers,
>>> H.
>>>
>>> PS: Please use the "Reply" button so emails are grouped by thread. This discussion is already splitted in 3 different threads on the list which is not going to help. Thanks!
>>>
>>>>
>>>> Thanks.
>>>>
>>>> Vanessa
>>>>
>>>> Research Molecular Biologist
>>>> USDA-ARS
>>>> Carl Hayden Bee Research Center
>>>> 2000 E. Allen Rd., Tucson, AZ  85719
>>>>
>>>> (520) 647-9269
>>>>
>>>> This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
>>>>
>>>> 	[[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fhcrc.org
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319
>>>
>>>
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319