[BioC] makeTranscriptDbFromBiomart error

Marc Carlson mcarlson at fhcrc.org
Thu Jun 7 21:32:12 CEST 2012


One more thing:

The uswest Ensembl BioMart mirror has apparently been updated with the 
fix (for reasons that are not known to me, the default has still not 
been updated).  So if you look at the manual page for

  ?makeTranscriptDbFromBiomart

you can see an example of how to use the uswest.ensembl.org host by 
specifying the biomart and host arguments.
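
As a rough, untested sketch of what that looks like: the wrapper name 
make_txdb_from_mirror is made up for illustration, and the mart name in the 
usage comment is an assumption, since a mirror may name its marts 
differently from the default server.  Check listMarts() against the mirror 
first.

```r
# Sketch only: the call is wrapped in a function so that nothing touches
# the network until you invoke it.  make_txdb_from_mirror is a hypothetical
# helper; biomart, dataset and host are the arguments of
# makeTranscriptDbFromBiomart() as described on its manual page.
make_txdb_from_mirror <- function(mart_name,
                                  host = "uswest.ensembl.org",
                                  dataset = "hsapiens_gene_ensembl") {
  GenomicFeatures::makeTranscriptDbFromBiomart(biomart = mart_name,
                                               dataset = dataset,
                                               host = host)
}

# Usage (needs network access and the GenomicFeatures/biomaRt packages):
#   library(biomaRt)
#   listMarts(host = "uswest.ensembl.org")  # see what the mirror calls its marts
#   txdb <- make_txdb_from_mirror("ENSEMBL_MART_ENSEMBL")  # assumed mart name
```

If the mart name does not match what listMarts() reports for that host, 
the call will most likely fail, so it is worth looking it up rather than 
copying the name above.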


   Marc



On 06/07/2012 10:40 AM, Marc Carlson wrote:
> Hi Stefanie,
>
> This is related to a bug with the 5' and 3' starts/ends that was in 
> the latest version of biomaRt.  We reported it to them a couple of weeks 
> ago because it immediately started to break some of our quality 
> control tests for GenomicFeatures.  At that time, they told us that it 
> had been fixed, but that it would still take a couple of weeks for their 
> correction to propagate out.  In the meantime, using either 
> makeTranscriptDbFromUCSC() or the stock annotation packages for human 
> might be a good work-around for you.
>
> The warning that you saw for makeTranscriptDbFromUCSC() comes from 
> another quality control check.  When an annotation resource gives us 
> the CDS for a transcript, we expect its cumulative length (summed over 
> the exons) to be a multiple of three.  When it isn't, we issue the 
> warning you were seeing for makeTranscriptDbFromUCSC().
>
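
To make the check concrete, here is a minimal sketch in base R.  It is not 
the GenomicFeatures internal code: the helper name cds_cumulative_length, 
the made-up coordinates, and the 1-based inclusive coordinate convention 
are all assumptions for illustration.

```r
# Hypothetical helper (not the GenomicFeatures implementation): total CDS
# length summed over exons, assuming 1-based inclusive coordinates.
cds_cumulative_length <- function(exon_start, exon_end, cds_start, cds_end) {
  # Clip each exon to the CDS interval.
  s <- pmax(exon_start, cds_start)
  e <- pmin(exon_end, cds_end)
  # Exons lying wholly outside the CDS give e < s and contribute 0.
  sum(pmax(e - s + 1L, 0L))
}

# Made-up transcript with three exons.
exon_start <- c(100L, 200L, 300L)
exon_end   <- c(150L, 250L, 350L)

# CDS from 120 to 319: 31 + 51 + 20 = 102 bases, a multiple of 3.
len_ok  <- cds_cumulative_length(exon_start, exon_end, 120L, 319L)
# CDS from 120 to 320: 31 + 51 + 21 = 103 bases, not a multiple of 3.
len_bad <- cds_cumulative_length(exon_start, exon_end, 120L, 320L)

len_ok  %% 3L == 0L   # TRUE:  no warning expected for this transcript
len_bad %% 3L == 0L   # FALSE: this is the case the warning flags
```
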
> Hope that this clarifies things,
>
>
>   Marc
>
>
>
> On 06/07/2012 08:50 AM, Stefanie Tauber wrote:
>> Hi,
>>
>> here is my sessionInfo:
>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GenomicFeatures_1.8.0 AnnotationDbi_1.18.0  Biobase_2.16.0
>> [4] GenomicRanges_1.8.1   IRanges_1.14.2        BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>>   [1] biomaRt_2.12.0     Biostrings_2.24.0  bitops_1.0-4.1     BSgenome_1.24.0
>>   [5] DBI_0.2-5          RCurl_1.91-1       Rsamtools_1.8.0    RSQLite_0.11.1
>>   [9] rtracklayer_1.16.0 stats4_2.15.0      tools_2.15.0       XML_3.9-4
>> [13] zlibbioc_1.2.0
>>
>> I updated GenomicFeatures to 1.8.1, but unfortunately that did not help.
>>
>>
>> BUT:  makeTranscriptDbFromUCSC did work :)
>>
>>> txdb<- makeTranscriptDbFromUCSC(genome="hg19", tablename="ensGene")
>> Download the ensGene table ... OK
>> Extract the 'transcripts' data frame ... OK
>> Extract the 'splicings' data frame ... OK
>> Download and preprocess the 'chrominfo' data frame ... OK
>> Prepare the 'metadata' data frame ... metadata: OK
>> Make the TranscriptDb object ... OK
>> There were 50 or more warnings (use warnings() to see the first 50)
>>
>>> txdb
>> TranscriptDb object:
>> | Db type: TranscriptDb
>> | Supporting package: GenomicFeatures
>> | Data source: UCSC
>> | Genome: hg19
>> | Genus and Species: Homo sapiens
>> | UCSC Table: ensGene
>> | Resource URL: http://genome.ucsc.edu/
>> | Type of Gene ID: Ensembl gene ID
>> | Full dataset: yes
>> | miRBase build ID: NA
>> | transcript_nrow: 181648
>> | exon_nrow: 541825
>> | cds_nrow: 278798
>> | Db created by: GenomicFeatures package from Bioconductor
>> | Creation time: 2012-06-07 17:48:45 +0200 (Thu, 07 Jun 2012)
>> | GenomicFeatures version at creation time: 1.8.1
>> | RSQLite version at creation time: 0.11.1
>> | DBSCHEMAVERSION: 1.0
>>
>>> warnings()
>> Warning messages:
>> 1: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i], 
>> exon_locs$start[[i]],  ... :
>>    UCSC data anomaly in transcript ENST00000513161: the cds 
>> cumulative length is not a multiple of 3
>> 2: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i], 
>> exon_locs$start[[i]],  ... :
>>    UCSC data anomaly in transcript ENST00000417833: the cds 
>> cumulative length is not a multiple of 3
>> 3: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i], 
>> exon_locs$start[[i]],  ... :
>>    UCSC data anomaly in transcript ENST00000450884: the cds 
>> cumulative length is not a multiple of 3
>>
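
If you want a list of the affected transcripts, the IDs can be pulled out 
of the warning text.  A small base R sketch, using the three messages shown 
above with the line wrapping removed (in a live session the message text 
should be available via names(warnings()), though only the first 50 
warnings are kept):

```r
# Warning messages copied from the output above, unwrapped onto one line each.
msgs <- c(
  "UCSC data anomaly in transcript ENST00000513161: the cds cumulative length is not a multiple of 3",
  "UCSC data anomaly in transcript ENST00000417833: the cds cumulative length is not a multiple of 3",
  "UCSC data anomaly in transcript ENST00000450884: the cds cumulative length is not a multiple of 3"
)

# Extract the first Ensembl transcript ID found in each message.
flagged_ids <- regmatches(msgs, regexpr("ENST[0-9]+", msgs))
flagged_ids
```
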
>>
>> Best,
>> Stefanie
>>
>> On 07.06.2012 at 16:25, Steve Lianoglou wrote:
>>
>>> Hi Stefanie,
>>>
>>> On Thu, Jun 7, 2012 at 5:16 AM, Stefanie Tauber
>>> <stefanie.tauber at univie.ac.at>  wrote:
>>>> Hi
>>>>
>>>> I just tried it with R 2.15, I get the same error.
>>>>
>>>> If I follow your suggestion:
>>>>
>>>> txdb<- makeTranscriptDbFromUCSC(genome="hg19", tablename="ensGene")
>>>>
>>>>
>>>> I get:
>>>>
>>>> Download the ensGene table ... OK
>>>> Extract the 'transcripts' data frame ... OK
>>>> Extract the 'splicings' data frame ... OK
>>>> Download and preprocess the 'chrominfo' data frame ... Error in
>>>> download.file(url, destfile, quiet = TRUE) :
>>>>    cannot open URL
>>>> 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz' 
>>>>
>>>> In addition: There were 50 or more warnings (use warnings() to see 
>>>> the first
>>>> 50)
>>> [snip]
>>>
>>> Strange ... I also get the same warnings you get (the "cds cumulative
>>> length is not a multiple of 3") for some transcripts, but I think this
>>> is something beyond our control. I don't get any error(s) when
>>> downloading and building the TxDB, so it completes fine for me.
>>>
>>> I'm actually running the *-devel versions of the bioc packages w/
>>> R-2.15.x so it's not very easy for me to check the current released
>>> GenomicFeatures package, but I'd be a bit surprised if the error is
>>> there.
>>>
>>> Could you paste the output of `sessionInfo()` after you call
>>> `library(GenomicFeatures)` when running your new R-2.15.x install?
>>>
>>> -steve
>>>
>>>
>>> -- 
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>   | Memorial Sloan-Kettering Cancer Center
>>>   | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>> DI Stefanie Tauber
>>
>> Center for Integrative Bioinformatics Vienna (CIBIV)
>> (CIBIV is a joint institute of Vienna University, Medical University, 
>> and University of Veterinary Medicine, Vienna, Austria)
>> Max F. Perutz Laboratories (MFPL)
>> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2
>> Dr. Bohr Gasse 9
>> A-1030 Wien, Austria
>> Phone: ++43 +1 / 42772-4030
>> Fax:     ++43 +1 / 42772-4098
>> email:   stefanie.tauber at univie.ac.at
>> www.cibiv.at
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>


