[BioC] makeTranscriptDbFromBiomart error

Marc Carlson mcarlson at fhcrc.org
Thu Jun 7 19:40:32 CEST 2012


Hi Stefanie,

This is related to a bug with the 5' and 3' starts/ends in the latest 
version of biomaRt.  We reported it to them a couple of weeks ago 
because it immediately started to break some of our quality control 
tests for GenomicFeatures.  At that time, they told us that it had been 
fixed, but that it would still take a couple of weeks for their 
correction to propagate out.  In the meantime, using either 
makeTranscriptDbFromUCSC() or the stock annotation packages for human 
might be a good workaround for you.

The warning that you saw for makeTranscriptDbFromUCSC() comes from 
another quality control check.  When an annotation resource gives us the 
ranges for a CDS, we expect the cumulative length of those ranges to be 
a multiple of three.  When it is not, we issue the warning you were 
seeing for makeTranscriptDbFromUCSC().
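
To make the idea concrete, here is a minimal sketch of that kind of check 
in base R.  It is not the actual GenomicFeatures code; the data frame, 
its column names, and the transcript names are made up for illustration, 
and the coordinates are assumed to be 1-based and inclusive.

```r
# Hypothetical toy data (not from any real annotation source):
# one row per CDS segment, grouped by transcript name.
cds <- data.frame(
  tx_name = c("txA", "txA", "txB"),
  start   = c(101, 301, 501),
  end     = c(160, 330, 550)
)

# Width of each CDS segment (inclusive coordinates assumed).
cds$width <- cds$end - cds$start + 1

# Cumulative CDS length per transcript.
cds_len <- tapply(cds$width, cds$tx_name, sum)

# Transcripts whose cumulative CDS length is not a multiple of 3.
anomalous <- names(cds_len)[cds_len %% 3 != 0]
anomalous
```

In this toy example txA has a cumulative CDS length of 90 and passes, 
while txB has a length of 50 and would be flagged.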

Hope that this clarifies things,


   Marc



On 06/07/2012 08:50 AM, Stefanie Tauber wrote:
> Hi,
>
> here is my sessionInfo:
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicFeatures_1.8.0 AnnotationDbi_1.18.0  Biobase_2.16.0
> [4] GenomicRanges_1.8.1   IRanges_1.14.2        BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
>   [1] biomaRt_2.12.0     Biostrings_2.24.0  bitops_1.0-4.1     BSgenome_1.24.0
>   [5] DBI_0.2-5          RCurl_1.91-1       Rsamtools_1.8.0    RSQLite_0.11.1
>   [9] rtracklayer_1.16.0 stats4_2.15.0      tools_2.15.0       XML_3.9-4
> [13] zlibbioc_1.2.0
>
> I updated GenomicFeatures to 1.8.1, but unfortunately that did not help.
>
>
> BUT:  makeTranscriptDbFromUCSC did work :)
>
>> txdb<- makeTranscriptDbFromUCSC(genome="hg19", tablename="ensGene")
> Download the ensGene table ... OK
> Extract the 'transcripts' data frame ... OK
> Extract the 'splicings' data frame ... OK
> Download and preprocess the 'chrominfo' data frame ... OK
> Prepare the 'metadata' data frame ... metadata: OK
> Make the TranscriptDb object ... OK
> There were 50 or more warnings (use warnings() to see the first 50)
>
>> txdb
> TranscriptDb object:
> | Db type: TranscriptDb
> | Supporting package: GenomicFeatures
> | Data source: UCSC
> | Genome: hg19
> | Genus and Species: Homo sapiens
> | UCSC Table: ensGene
> | Resource URL: http://genome.ucsc.edu/
> | Type of Gene ID: Ensembl gene ID
> | Full dataset: yes
> | miRBase build ID: NA
> | transcript_nrow: 181648
> | exon_nrow: 541825
> | cds_nrow: 278798
> | Db created by: GenomicFeatures package from Bioconductor
> | Creation time: 2012-06-07 17:48:45 +0200 (Thu, 07 Jun 2012)
> | GenomicFeatures version at creation time: 1.8.1
> | RSQLite version at creation time: 0.11.1
> | DBSCHEMAVERSION: 1.0
>
>> warnings()
> Warning messages:
> 1: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i], exon_locs$start[[i]],  ... :
>    UCSC data anomaly in transcript ENST00000513161: the cds cumulative length is not a multiple of 3
> 2: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i], exon_locs$start[[i]],  ... :
>    UCSC data anomaly in transcript ENST00000417833: the cds cumulative length is not a multiple of 3
> 3: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i], exon_locs$start[[i]],  ... :
>    UCSC data anomaly in transcript ENST00000450884: the cds cumulative length is not a multiple of 3
>
>
> Best,
> Stefanie
>
> On 07.06.2012 at 16:25, Steve Lianoglou wrote:
>
>> Hi Stefanie,
>>
>> On Thu, Jun 7, 2012 at 5:16 AM, Stefanie Tauber
>> <stefanie.tauber at univie.ac.at>  wrote:
>>> Hi
>>>
>>> I just tried it with R 2.15, I get the same error.
>>>
>>> If I follow your suggestion:
>>>
>>> txdb<- makeTranscriptDbFromUCSC(genome="hg19", tablename="ensGene")
>>>
>>>
>>> I get:
>>>
>>> Download the ensGene table ... OK
>>> Extract the 'transcripts' data frame ... OK
>>> Extract the 'splicings' data frame ... OK
>>> Download and preprocess the 'chrominfo' data frame ... Error in
>>> download.file(url, destfile, quiet = TRUE) :
>>>    cannot open URL
>>> 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz'
>>> In addition: There were 50 or more warnings (use warnings() to see the first
>>> 50)
>> [snip]
>>
>> Strange ... I also get the same warnings you get (the "cds cumulative
>> length is not a multiple of 3") for some transcripts, but I think this
>> is something beyond our control. I don't get any error(s) when
>> downloading and building the TxDB, so it completes fine for me.
>>
>> I'm actually running the *-devel versions of the bioc packages w/
>> R-2.15.x so it's not very easy for me to check the current released
>> GenomicFeatures package, but I'd be a bit surprised if the error is
>> there.
>>
>> Could you paste the output of `sessionInfo()` after you call
>> `library(GenomicFeatures)` when running your new R-2.15.x install?
>>
>> -steve
>>
>>
>> -- 
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>   | Memorial Sloan-Kettering Cancer Center
>>   | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
> DI Stefanie Tauber
>
> Center for Integrative Bioinformatics Vienna (CIBIV)
> (CIBIV is a joint institute of Vienna University, Medical University, and University of Veterinary Medicine, Vienna, Austria)
> Max F. Perutz Laboratories (MFPL)
> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2
> Dr. Bohr Gasse 9
> A-1030 Wien, Austria
> Phone: ++43 +1 / 42772-4030
> Fax:     ++43 +1 / 42772-4098
> email:   stefanie.tauber at univie.ac.at
> www.cibiv.at


