[BioC] Problems in retrieving 3'UTR sequences of ALL human genes using biomaRt

Hervé Pagès hpages at fhcrc.org
Mon Jul 23 20:13:12 CEST 2012


Hi Karthik,

Alternatively:

   library(GenomicFeatures)
   txdb <- makeTranscriptDbFromBiomart("ensembl", "hsapiens_gene_ensembl")
   three_utrs <- threeUTRsByTranscript(txdb, use.names=TRUE)

   library(BSgenome.Hsapiens.UCSC.hg19)
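   ## Some gymnastics to deal with the different chromosome naming
   ## conventions used by Ensembl and UCSC. We only keep the 25 main
   ## chromosomes (1-22, X, Y, M). Note that the renaming below is
   ## positional: it assumes the first 25 seqlevels come in the same
   ## order in both objects.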
   seqlevels(three_utrs, force=TRUE) <- seqlevels(three_utrs)[1:25]
   seqlevels(three_utrs) <- seqlevels(Hsapiens)[1:25]

   three_utr_seqs <- extractTranscriptsFromGenome(Hsapiens, three_utrs,
                                                  use.names=TRUE)
   > head(three_utr_seqs)
     A DNAStringSet instance of length 6
       width seq                                               names
   [1]    37 ATGATATAATAAGCCCTTCTCATTAAACATGATATGG             ENST00000426406
   [2]   601 TGTAGTCTGATGTAGTCTCATGT...TATTGCTTTGGATAGTATGGATG ENST00000358533
   [3]   422 GGTTGCCGGGGGTAGGGGTGGGG...GAAAAATAAATAATAAAGCCTGT ENST00000342066
   [4]   422 GGTTGCCGGGGGTAGGGGTGGGG...GAAAAATAAATAATAAAGCCTGT ENST00000341065
   [5]   106 GGTTGCCGGGGGTAGGGGTGGGG...TCTTTCGGTTTCGGATGCAAAAC ENST00000455979
   [6]   523 CCCACCTACCACCAGAGGCCTGC...TTAATAAACACATTTCTGGGGTT ENST00000455747
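
Since the original question also asked about saving the sequences in FASTA format, here is a minimal sketch of that step. It assumes a Biostrings version that provides writeXStringSet() and readDNAStringSet(). The toy object utr_seqs stands in for three_utr_seqs, its second sequence and name are made up, and the file name is only a placeholder:

```r
library(Biostrings)

## Toy stand-in for three_utr_seqs: the first sequence is copied from the
## output above, the second one (and its name) is made up for illustration.
utr_seqs <- DNAStringSet(c(
    ENST00000426406 = "ATGATATAATAAGCCCTTCTCATTAAACATGATATGG",
    made_up_tx      = "GGTTGCCGGGGGTAGGGGTGGGG"
))

## Drop empty sequences, if any, before writing.
utr_seqs <- utr_seqs[width(utr_seqs) > 0]

## "three_utrs.fasta" is a placeholder file name, written to a temp dir here.
fasta_file <- file.path(tempdir(), "three_utrs.fasta")
writeXStringSet(utr_seqs, fasta_file)

## Read the file back to check the round trip.
roundtrip <- readDNAStringSet(fasta_file)
```

With the real data, replacing utr_seqs by three_utr_seqs should be all that is needed. The transcript names end up as the FASTA header lines.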

HTH,
H.


On 07/23/2012 06:56 AM, Karthik K N wrote:
> Hello Tim,
>
> I tried your first suggestion (for chromosome 1). I got an error message as
> shown below:
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   line 2134 did not have 2 elements
>
> Did you see something like this at your end?
>
> Thanks,
>
> Karthik
>
>
> On Mon, Jul 23, 2012 at 5:24 PM, Tim Smith <tim_smith_666 at yahoo.com> wrote:
>
>> Try:
>>
>> chrom <- c(1:22,'X','Y')
>>
>>    ------------------------------
>> *From:* Karthik K N <karthikuttan at gmail.com>
>> *To:* Tim Smith <tim_smith_666 at yahoo.com>
>> *Cc:* "bioconductor at r-project.org" <bioconductor at r-project.org>
>> *Sent:* Monday, July 23, 2012 7:46 AM
>> *Subject:* Re: [BioC] Problems in retrieving 3'UTR sequences of ALL human
>> genes using biomaRt
>>
>>   Hello Tim,
>>
>> Thanks for the reply. I think this will give the 3'UTRs of all the genes
>> on chromosome 1. How can I get this for ALL the genes on ALL the chromosomes
>> instead of repeating the step with a different chromosome number each time?
>>
>> Thanks a lot once again.
>>
>> On Mon, Jul 23, 2012 at 5:04 PM, Tim Smith <tim_smith_666 at yahoo.com> wrote:
>>
>> Seems to work with:
>>
>> chrom <- '1'
>> xx <- getBM(attributes=c("hgnc_symbol", "3utr"), filters="chromosome_name",
>>             values = chrom, mart = ensembl)
>>
>>    ------------------------------
>> *From:* Karthik K N <karthikuttan at gmail.com>
>> *To:* bioconductor at r-project.org
>> *Sent:* Monday, July 23, 2012 5:02 AM
>> *Subject:* [BioC] Problems in retrieving 3'UTR sequences of ALL human
>> genes using biomaRt
>>
>> Dear Members,
>>
>> I am trying to download the 3'UTR sequences of all human genes from Ensembl
>> Biomart using the package biomaRt. Ideally, after retrieving I want to save
>> these in FASTA format. When I am using the code given below to get 3'UTRs
>> of genes in chromosome 1, 2 and 3 (not sure if this is the best way to
>> achieve what I want), I am getting an error:
>>
>> "Error in getBM(attributes = c("hgnc_symbol", "3utr"), filters =
>> "chromosome_name",  :
>>    Query ERROR: caught BioMart::Exception::Database: Could not connect to
>> mysql database ensembl_mart_67a: DBI
>> connect('database=ensembl_mart_67a;host=bmdccdb.oicr.on.ca;port=3306','bm_web',...)
>> failed: Too many connections at
>> /srv/biomart_server/biomart.org/biomart-perl/lib/BioMart/Configuration/DBLocation.pm line 98"
>>
>> Code is given below:
>>
>>> library(biomaRt)
>>> ensembl=useMart("ensembl")
>>> ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>>> chrom=c(1,2,3)
>>> getBM(attributes=c("hgnc_symbol", "3utr"),filters="chromosome_name",
>> values = chrom, mart = ensembl)
>> Error in getBM(attributes = c("hgnc_symbol", "3utr"), filters =
>> "chromosome_name",  :
>>    Query ERROR: caught BioMart::Exception::Database: Could not connect to
>> mysql database ensembl_mart_67a: DBI
>> connect('database=ensembl_mart_67a;host=bmdccdb.oicr.on.ca;port=3306','bm_web',...)
>> failed: Too many connections at
>> /srv/biomart_server/biomart.org/biomart-perl/lib/BioMart/Configuration/DBLocation.pm line 98
>>
>> SessionInfo is given below:
>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252
>> LC_MONETARY=English_India.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_India.1252
>>
>> attached base packages:
>> [1] stats    graphics  grDevices utils    datasets  methods  base
>>
>> other attached packages:
>> [1] biomaRt_2.12.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.91-1.1 XML_3.9-1.1
>>
>>
>> Can somebody please tell me where I am going wrong?
>>
>> Thanks a lot,
>>
>> Regards,
>>
>> Kart


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


