[BioC] GenomicFeatures Transcripts Retrieval Fails
James W. MacDonald
jmacdon at uw.edu
Tue Jun 17 19:50:59 CEST 2014
Hi Sharvari,
On 6/17/2014 1:04 PM, sharvari gujja wrote:
> Hi Jim,
>
> Thanks for the reply. Yes, I am running this on Windows. I followed your
> suggestion to use setInternet2() function first, but I still get an error:
>
> > setInternet2()
> > txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
> Error in function (type, msg, asError = TRUE) : couldn't connect to host
You might consult one of your IT people about that, but otherwise I have
nothing more for you.
Well, except you could use the makeTranscriptDbFromGFF() function if you
are really a fan of ensGene. You can go to UCSC using a browser, then
click on the 'Tables' menu item. There you could choose
group: Genes and Gene Predictions
track: Ensembl Genes
table: ensGene
output format: GTF - gene transfer format
Set the output file to be something reasonable, then click 'get output'.
You can then use that to create a TxDb.
But unless you are dead set on using Ensembl genes, it's probably not
worth the bother.
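If you do go the GTF route, it could look roughly like the sketch below. The file name hg19_ensGene.gtf and the helper looks_like_gtf() are made up for illustration; only makeTranscriptDbFromGFF() comes from GenomicFeatures (later releases renamed it makeTxDbFromGFF(), so adjust to whatever your installed version provides). The helper just checks that the download really has the nine tab-separated GTF fields, since a blocked request can leave you with an HTML error page instead.

```r
# Hypothetical helper: TRUE if the first non-comment lines have the
# nine tab-separated fields that the GTF format requires.
looks_like_gtf <- function(path, n = 5L) {
  lines <- readLines(path, n = n, warn = FALSE)
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]
  length(lines) > 0L &&
    all(vapply(strsplit(lines, "\t", fixed = TRUE), length, integer(1)) == 9L)
}

# Hypothetical file name: wherever you saved the Table Browser output.
gtf_file <- "hg19_ensGene.gtf"

# Only attempt the build if the file is present, looks like GTF,
# and GenomicFeatures is installed.
if (file.exists(gtf_file) && looks_like_gtf(gtf_file) &&
    requireNamespace("GenomicFeatures", quietly = TRUE)) {
  library(GenomicFeatures)
  txdb <- makeTranscriptDbFromGFF(file = gtf_file, format = "gtf")
  print(txdb)
}
```

Nothing here touches the network from R, so it sidesteps the connection problem as long as your browser can reach UCSC.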
>
> I also tried:
>
> > biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
> BioC_mirror: http://bioconductor.org
> Using Bioconductor version 2.14 (BiocInstaller 1.14.2), R version 3.1.0.
> Installing package(s) 'TxDb.Hsapiens.UCSC.hg19.knownGene'
> trying URL
> 'http://bioconductor.org/packages/2.14/data/annotation/bin/windows/contrib/3.1/TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0.zip'
> Content type 'application/zip' length 18546564 bytes (17.7 Mb)
> opened URL
> downloaded 17.7 Mb
>
> How do I read this table "TxDb.Hsapiens.UCSC.hg19.knownGene"? Also, is
> there documentation on the differences between "knownGene" and "ensGene"?
You want to do some reading:
http://bioconductor.org/packages/release/bioc/vignettes/GenomicFeatures/inst/doc/GenomicFeatures.pdf
There are literally a bazillion things you can do with a TxDb object, so
unless you have a use case that you want to talk about, you will have to
do some self-learning (which you should be doing anyway, so there you go).
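To get you started, here is a minimal sketch of loading the package you just installed and pulling a few things out of it. It assumes the install succeeded; the variable names (pkg, have_pkg, tx, ex_by_gene, tx_by_gene) are only illustrative, and the vignette covers the extractors in far more detail.

```r
pkg <- "TxDb.Hsapiens.UCSC.hg19.knownGene"

# Guard so the sketch does nothing if the annotation package is missing.
have_pkg <- requireNamespace(pkg, quietly = TRUE)

if (have_pkg) {
  library(TxDb.Hsapiens.UCSC.hg19.knownGene)

  # The package exports a single object with the same name as the package.
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
  print(txdb)                       # metadata: genome, table, counts

  # All transcripts as one GRanges object.
  tx <- transcripts(txdb)
  print(head(tx))

  # Exons and transcripts grouped by gene, each as a GRangesList.
  ex_by_gene <- exonsBy(txdb, by = "gene")
  tx_by_gene <- transcriptsBy(txdb, by = "gene")
  print(length(ex_by_gene))
}
```

There is no table to "read" as such: the package is a prebuilt database, and you query it through extractor functions such as transcripts(), exonsBy() and transcriptsBy().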
As far as documentation, you can start with UCSC's table page. If you do
as I describe above, and then click on the 'describe table schema'
button you get a page that says that the genes and gene predictions come
from Ensembl, and they have a link to ensembl.org, where you can do more
reading.
For the knownGene table, if you change to
track: UCSC Genes
table: knownGene
and then click 'describe table schema' there is a whole webpage
describing how they generate those data.
Best,
Jim
>
> Thanks for helping.
> Sharvari
>
>
>
>
> On Tue, Jun 17, 2014 at 12:39 PM, James W. MacDonald <jmacdon at uw.edu>
> wrote:
>
> Hi Sharvari,
>
>
> On 6/17/2014 10:56 AM, Sharvari.Gujja at sanofi.com wrote:
>
> Hi Steve,
>
>
> I get the same error trying to run txdb <-
> makeTranscriptDbFromUCSC(genome='hg19',tablename='knownGene')
>
> Error in function (type, msg, asError = TRUE) : couldn't
> connect to host
>
>
> This error means you are not able to connect to UCSC. This may be
> due to an intermittent outage on their end, or possibly because you
> are behind a firewall.
>
> But note that if you want the knownGene transcript package, you can
> get that from Bioconductor without having to build it yourself:
>
> library(BiocInstaller)
> biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
>
> If you want the ensGene table you will have to build that one
> yourself. I just tried that using your code, and it works for me:
>
>
> > txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
> Download the ensGene table ... OK
> Extract the 'transcripts' data frame ... OK
> Extract the 'splicings' data frame ... OK
> Download and preprocess the 'chrominfo' data frame ... OK
> Prepare the 'metadata' data frame ... OK
> Make the TranscriptDb object ... OK
> Warning message:
> In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
> UCSC data anomaly in 19284 transcript(s): the cds cumulative length is
> not a multiple of 3 for transcripts ‘ENST00000513161’
> ‘ENST00000417833’ ‘ENST00000450884’ ‘ENST00000431193’
> ‘ENST00000367667’ ‘ENST00000498306’ ‘ENST00000434641’
> ‘ENST00000462097’ ‘ENST00000475119’ ‘ENST00000480643’
> ‘ENST00000525843’ ‘ENST00000498419’ ‘ENST00000532678’
> ‘ENST00000460428’ ‘ENST00000478853’ ‘ENST00000372925’
> ‘ENST00000437607’ ‘ENST00000416121’ ‘ENST00000582567’
> ‘ENST00000413489’ ‘ENST00000425265’ ‘ENST00000534717’
> ‘ENST00000436685’ ‘ENST00000606954’ ‘ENST00000484054’
> ‘ENST00000414971’ ‘ENST00000443667’ ‘ENST00000417191’
> ‘ENST00000559578’ ‘ENST00000482110’ ‘ENST00000524607’
> ‘ENST00000419169’ ‘ENST00000295713’ ‘ENST00000609181’
> ‘ENST00000327794’ ‘ENST00000450490’ ‘ENST00000602582’
> ‘ENST00000453676’ ‘ENST00000513088’ ‘ENST [... truncated]
> > txdb
> TranscriptDb object:
> | Db type: TranscriptDb
> | Supporting package: GenomicFeatures
> | Data source: UCSC
> | Genome: hg19
> | Organism: Homo sapiens
> | UCSC Table: ensGene
> | Resource URL: http://genome.ucsc.edu/
> | Type of Gene ID: Ensembl gene ID
> | Full dataset: yes
> | miRBase build ID: NA
> | transcript_nrow: 204940
> | exon_nrow: 584914
> | cds_nrow: 280379
> | Db created by: GenomicFeatures package from Bioconductor
> | Creation time: 2014-06-17 09:34:13 -0700 (Tue, 17 Jun 2014)
> | GenomicFeatures version at creation time: 1.16.2
> | RSQLite version at creation time: 0.11.4
> | DBSCHEMAVERSION: 1.0
>
> So you might try again. If you are on Windows, you might be having a
> proxy issue, in which case you might use the setInternet2() function
> prior to running makeTranscriptDbFromUCSC().
>
> Best,
>
> Jim
>
>
>
>
>
>
>
> txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
>
> Error in function (type, msg, asError = TRUE) : couldn't
> connect to host
>
> I did install the required packages, so I am not sure what I am missing here.
>
> source("http://bioconductor.org/biocLite.R")
> biocLite()
> biocLite(c("GenomicFeatures", "AnnotationDbi"))
> library("GenomicFeatures")
>
> Could you please help me with this error.
>
> Many Thanks
> Sharvari Gujja
>
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099