[BioC] Ensembl mouse proteins

Vincent Carey stvjc at channing.harvard.edu
Mon May 23 12:19:41 CEST 2011


Please keep dialogue on the list so others may learn.  See below.

On Sun, May 22, 2011 at 8:58 PM, Stefanie Gerstberger
<stefanie.gerstberger at ymail.com> wrote:
> Hi Vincent,
> thanks for your reply. I had problems with biomaRt :
>> sessionInfo()
> R version 2.12.1 (2010-12-16)

R 2.12.1 is out of date.  External services such as BioMart can't be
used reliably with old versions of R and biomaRt.  More below.
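
A quick pre-flight check of the session might look like the sketch below.
The minimum versions are assumptions taken from the sessionInfo() of the
session that worked in this message (R 2.13.0, biomaRt 2.8.0); they are not
documented requirements of biomaRt or of the Ensembl service.

```r
# Minimum versions are assumptions, taken from the session that worked
# in this thread; they are not documented requirements of biomaRt.
min_r       <- "2.13.0"
min_biomart <- "2.8.0"

r_ok <- getRversion() >= min_r

# packageVersion() errors if the package is missing, so check for it first.
biomart_installed <- "biomaRt" %in% rownames(installed.packages())
biomart_ok <- biomart_installed &&
  utils::packageVersion("biomaRt") >= min_biomart

if (!r_ok || !biomart_ok) {
  message("R or biomaRt looks out of date; update before querying Ensembl.")
}
```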

> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
> [1] biomaRt_2.6.0     Biostrings_2.18.2 IRanges_1.8.8
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 RCurl_1.4-3    tools_2.12.1   XML_3.2-0
>>
>> library(biomaRt)
>> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>> protein = getSequence(id = "ENSG00000089280", type = "ensembl_gene_id",
>> seqType = "peptide", mart = ensembl)
> Error in getBM(c(seqType, type), filters = type, values = id, mart = mart,
>  :
>   Query ERROR: caught BioMart::Exception::Database: Could not connect to
> mysql database ensembl_mart_62: DBI
> connect('database=ensembl_mart_62;host=dcc-qa-db.oicr.on.ca;port=3306','bm_web',...)
> failed: Can't connect to MySQL server on 'dcc-qa-db.oicr.on.ca' (113) at
> /srv/biomart_server/biomart.org/biomart-perl/lib/BioMart/Configuration/DBLocation.pm
> line 98

I was unable to reproduce this error with a properly updated version of
R/biomaRt.  See further below.

>> protein = getSequence(id = c(100, 5728), type = "entrezgene", seqType =
>> "peptide", mart = ensembl)
> Error in getBM(c(seqType, type), filters = type, values = id, mart = mart,
>  :
>   Query ERROR: caught BioMart::Exception::Database: Could not connect to
> mysql database ensembl_mart_62: DBI
> connect('database=ensembl_mart_62;host=dcc-qa-db.oicr.on.ca;port=3306','bm_web',...)
> failed: Can't connect to MySQL server on 'dcc-qa-db.oicr.on.ca' (113) at
> /srv/biomart_server/biomart.org/biomart-perl/lib/BioMart/Configuration/DBLocation.pm
> line 98
>>
> That is, I guess, an internal Ensembl problem.
> However, I tried to circumvent this problem by manually downloading the
> mouse sequences from the Ensembl BioMart server. I found that the files
> contained only 4600 cDNA sequences or, when downloading the peptide sequences, I only

I don't know what to say about this.  However, the following queries
against the mouse dataset ran without a connection error for me:

> mens = useMart("ensembl", dataset = "mmusculus_gene_ensembl")

> p2 = getSequence(id = c(100, 5728), type = "entrezgene", seqType = "peptide", mart = mens)
> dim(p2)
[1] 0 2
> protein = getSequence(id = "ENSMUSG00000057573", type = "ensembl_gene_id", seqType = "peptide", mart = mens)
> dim(protein)
[1] 1 2
> protein = getSequence(id = "ENSMUSG00000066372", type = "ensembl_gene_id", seqType = "peptide", mart = mens)
> dim(protein)
[1] 1 2
> sessionInfo()
R version 2.13.0 Patched (2011-04-14 r55443)
Platform: x86_64-apple-darwin10.6.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
[1] biomaRt_2.8.0   weaver_1.17.0   codetools_0.2-8 digest_0.4.2

loaded via a namespace (and not attached):
[1] RCurl_1.5-0 XML_3.2-0
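
To count or retrieve every mouse peptide rather than a single gene, one
option is to list the protein IDs first and then request the sequences in
batches.  The following is only a sketch under stated assumptions:
batch_ids() and fetch_all_peptides() are hypothetical helpers, the batch
size of 500 is arbitrary, and the network-dependent function is defined but
not called here.  (The empty result for entrezgene IDs 100 and 5728 above is
presumably because those are human identifiers queried against the mouse
dataset.)

```r
# Split a vector of IDs into chunks of at most `size` elements.
# `batch_ids` is a hypothetical helper, not part of biomaRt.
batch_ids <- function(ids, size = 500) {
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Fetch peptide sequences for every Ensembl protein ID in a dataset.
# Requires network access and the biomaRt package; it is only defined here.
fetch_all_peptides <- function(dataset = "mmusculus_gene_ensembl", size = 500) {
  library(biomaRt)
  mart <- useMart("ensembl", dataset = dataset)
  # With no filters, getBM() returns the attribute for the whole dataset.
  ids <- getBM(attributes = "ensembl_peptide_id", mart = mart)[[1]]
  ids <- unique(ids[!is.na(ids) & ids != ""])
  chunks <- lapply(batch_ids(ids, size), function(chunk) {
    getSequence(id = chunk, type = "ensembl_peptide_id",
                seqType = "peptide", mart = mart)
  })
  do.call(rbind, chunks)
}
```

Calling nrow(fetch_all_peptides()) would then give a count to compare with
the number of sequences obtained from a manual download.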



> received 2300 sequences. I translated the 4600 sequences using Biostrings,
> but quite a few of them contain undefined nucleotides, lack an ATG start
> codon, or end in a frameshift. I'm very confused about receiving
> only 4600 cDNA sequences.
> I know this part is not really for the Bioconductor list, but I was hoping
> that someone with experience with the Ensembl mouse genome knows why I'm
> encountering this, and whether there is a way in Bioconductor to download
> the sequences without using BioMart. I have now found a way around it by
> simply ignoring Ensembl and using RefSeq proteins downloaded from UCSC.
> Using biomaRt in R seemed to me the simplest way to obtain the
> sequences; I don't currently know of any other option.
> Thanks,
> Stefanie
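
For a downloaded FASTA file, a first sanity check is to count the records
and see how many peptides do not begin with methionine.  A minimal base-R
sketch, assuming a plain multi-line FASTA file; summarize_fasta() is a
hypothetical helper written for illustration, not a Bioconductor function:

```r
# Count FASTA records and flag peptides that do not start with "M".
# `summarize_fasta` is a hypothetical helper written for illustration.
summarize_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is_header <- grepl("^>", lines)
  # Assign each sequence line to the most recent header above it.
  record <- cumsum(is_header)
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  list(
    n_records  = sum(is_header),
    n_no_start = sum(substr(seqs, 1, 1) != "M")
  )
}
```

If n_records is far below the number of annotated proteins, the download
itself is probably incomplete rather than the annotation.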
>
> ________________________________
> From: Vincent Carey <stvjc at channing.harvard.edu>
> To: Stefanie Carola Gerstberger <scg74 at cornell.edu>
> CC: "Bioconductor at r-project.org" <Bioconductor at r-project.org>
> Sent: Sunday, May 22, 2011, 19:29:08
> Subject: Re: [BioC] Ensembl mouse proteins
>
> What is the relationship of your question to Bioconductor?  Are you
> using R to perform the download?  What functions in what packages,
> with what version?  Read the posting guide, please, and provide the
> result of sessionInfo().
>
> On Sun, May 22, 2011 at 6:12 PM, Stefanie Carola Gerstberger
> <scg74 at cornell.edu> wrote:
>> Hi,
>> I have tried to download the mouse protein sequences from Ensembl BioMart.
>> I received only 2203 protein sequences for mouse, including isoforms. I get
>> the same result when downloading the Ensembl protein sequences through the
>> UCSC Genome Browser. I also encounter the problem for Xenopus tropicalis:
>> only 4700 protein sequences. As a reference point, S. cerevisiae has 6700
>> sequences in Ensembl BioMart, human 87,000, Drosophila 22,000. Does anyone
>> know why this is and how I can circumvent this problem to get a complete
>> list of protein sequences for mouse and Xenopus?
>> Thanks,
>> Stefanie
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>


