[BioC] GenomicFeatures Transcripts Retrieval Fails

James W. MacDonald jmacdon at uw.edu
Tue Jun 17 18:39:23 CEST 2014


Hi Sharvari,

On 6/17/2014 10:56 AM, Sharvari.Gujja at sanofi.com wrote:
> Hi Steve,
>
>
> I get the same error trying to run txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='knownGene')
>
> Error in function (type, msg, asError = TRUE)  : couldn't connect to host

This error means you are not able to connect to UCSC. This may be due to 
an intermittent outage on their end, or possibly because you are behind 
a firewall.
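
If you want to check whether R can reach UCSC at all, independently of GenomicFeatures, something like the following should work. It is only a sketch. can_reach() is a hypothetical helper made up for illustration and is not part of any package. It only tests whether a connection to the UCSC URL can be opened, so a TRUE result does not guarantee that makeTranscriptDbFromUCSC() will succeed, because that function may use a different download mechanism.

```r
## Hypothetical helper (not part of any package): returns TRUE if a
## connection to `address` can be opened, FALSE otherwise.
can_reach <- function(address, timeout = 10) {
  old <- options(timeout = timeout)   # temporarily shorten the timeout
  on.exit(options(old))               # restore the previous setting on exit
  tryCatch({
    con <- url(address, open = "r")   # warns and errors if the host is unreachable
    close(con)
    TRUE
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}

can_reach("http://genome.ucsc.edu/")
```

A FALSE result points to a network, firewall or proxy problem on your side, or an outage at UCSC, rather than a problem with GenomicFeatures.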

But note that if you want the knownGene transcript package, you can get 
that from Bioconductor without having to build it yourself:

library(BiocInstaller)
biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")

If you want the ensGene table, you will have to build that one yourself. 
I just tried that using your code, and it works for me:

 > txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
Download the ensGene table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
Warning message:
In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
   UCSC data anomaly in 19284 transcript(s): the cds cumulative length is
   not a multiple of 3 for transcripts ‘ENST00000513161’
   ‘ENST00000417833’ ‘ENST00000450884’ ‘ENST00000431193’
   ‘ENST00000367667’ ‘ENST00000498306’ ‘ENST00000434641’
   ‘ENST00000462097’ ‘ENST00000475119’ ‘ENST00000480643’
   ‘ENST00000525843’ ‘ENST00000498419’ ‘ENST00000532678’
   ‘ENST00000460428’ ‘ENST00000478853’ ‘ENST00000372925’
   ‘ENST00000437607’ ‘ENST00000416121’ ‘ENST00000582567’
   ‘ENST00000413489’ ‘ENST00000425265’ ‘ENST00000534717’
   ‘ENST00000436685’ ‘ENST00000606954’ ‘ENST00000484054’
   ‘ENST00000414971’ ‘ENST00000443667’ ‘ENST00000417191’
   ‘ENST00000559578’ ‘ENST00000482110’ ‘ENST00000524607’
   ‘ENST00000419169’ ‘ENST00000295713’ ‘ENST00000609181’
   ‘ENST00000327794’ ‘ENST00000450490’ ‘ENST00000602582’
   ‘ENST00000453676’ ‘ENST00000513088’ ‘ENST [... truncated]
 > txdb
TranscriptDb object:
| Db type: TranscriptDb
| Supporting package: GenomicFeatures
| Data source: UCSC
| Genome: hg19
| Organism: Homo sapiens
| UCSC Table: ensGene
| Resource URL: http://genome.ucsc.edu/
| Type of Gene ID: Ensembl gene ID
| Full dataset: yes
| miRBase build ID: NA
| transcript_nrow: 204940
| exon_nrow: 584914
| cds_nrow: 280379
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2014-06-17 09:34:13 -0700 (Tue, 17 Jun 2014)
| GenomicFeatures version at creation time: 1.16.2
| RSQLite version at creation time: 0.11.4
| DBSCHEMAVERSION: 1.0

So you might try again. If you are on Windows, you may have a proxy 
issue, in which case you can call the setInternet2() function before 
running makeTranscriptDbFromUCSC().
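
On Windows, that would look roughly like the following. This is a sketch that assumes an R version older than 3.3.0. In those versions, setInternet2(TRUE) switches R to the Windows Internet functions, which pick up the system proxy settings. From R 3.3.0 on, the function no longer has any effect. use_windows_internet() is an illustrative wrapper and is not part of any package.

```r
## Illustrative wrapper (not part of any package). setInternet2() only
## exists on Windows and only had an effect in R versions before 3.3.0.
use_windows_internet <- function() {
  if (.Platform$OS.type == "windows" && getRversion() < "3.3.0") {
    utils::setInternet2(use = TRUE)  # use wininet, which honours system proxy settings
    return(TRUE)
  }
  FALSE  # nothing to do on other platforms or newer R versions
}

use_windows_internet()

## Then retry the download as before:
## library(GenomicFeatures)
## txdb <- makeTranscriptDbFromUCSC(genome = 'hg19', tablename = 'ensGene')
```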

Best,

Jim



>
>
>
> txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
>
> Error in function (type, msg, asError = TRUE)  : couldn't connect to host
>
> I did install the required packages, so I'm not sure what I am missing here.
>
> source("http://bioconductor.org/biocLite.R")
> biocLite()
> biocLite(c("GenomicFeatures", "AnnotationDbi"))
> library("GenomicFeatures")
>
> Could you please help me with this error.
>
> Many Thanks
> Sharvari Gujja
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


