[BioC] GenomicFeatures makeTranscriptDbFromBiomart failure

Cory Barr barr.cory at gene.com
Thu Jan 5 21:06:34 CET 2012


Tim, the latest version of makeTranscriptDbFromBiomart should let
you specify the host argument.

-Cory

2012/1/4 Hervé Pagès <hpages at fhcrc.org>:
> Hi Tim,
>
>
> On 11/09/2011 10:27 AM, Hervé Pagès wrote:
>>
>> Hi,
>>
>> On 11-11-09 03:33 AM, Tim Rayner wrote:
>>>
>>> Hi Marc,
>>>
>>> Thanks very much for looking into this, and also to Michael for
>>> providing the patch. I've upgraded my GenomicRanges package and the
>>> code now runs, with a few warnings:
>>>
>>>> txdb.Hs2 <- makeTranscriptDbFromBiomart(biomart='ensembl',
>>>>     dataset='hsapiens_gene_ensembl')
>>>
>>> Download and preprocess the 'transcripts' data frame ... OK
>>> Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
>>> Download and preprocess the 'splicings' data frame ... OK
>>> Download and preprocess the 'genes' data frame ... OK
>>> Prepare the 'metadata' data frame ... OK
>>> Make the TranscriptDb object ... OK
>>> Warning messages:
>>> 1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste(labels,  :
>>>   duplicated levels will not be allowed in factors anymore
>>> 2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste(labels,  :
>>>   duplicated levels will not be allowed in factors anymore
>>> 3: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
>>>   chromosome lengths and circularity flags are not available for this TranscriptDb object
>>
>>
>> The first two warnings, plus the fact that downloading the chrominfo
>> failed, don't look good. It didn't use to be like that. We'll
>> investigate on our side and report back.
>
>
> The problem that was preventing makeTranscriptDbFromBiomart() from
> fetching the 'chrominfo' data frame (i.e. chromosome lengths) from
> Ensembl has been fixed. Make sure you update to the latest version
> of GenomicFeatures (v 1.6.5 in BioC release, v 1.7.8 in BioC
> devel), available via biocLite().
>
> The warnings about duplicated levels still need to be investigated.
>
> Cheers,
>
> H.
>
>>
>> Cheers,
>> H.
>>
>>>
>>> So I think the problem is basically fixed. I wonder if perhaps the
>>> issue was caused by truncated data transfers; I observed several
>>> similar failures earlier yesterday afternoon, but in each case the
>>> problem seemed to occur at a different point in the process.
>>>
>>> Thanks again,
>>>
>>> Tim
>>>
>>> On 8 November 2011 20:16, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>>>
>>>> Hi Tim,
>>>>
>>>> There was a small bug last week in this method, caused by a decision at
>>>> Ensembl to start supporting pseudoautosomal regions, but it was fixed
>>>> last week and should already be fixed in the version of GenomicFeatures
>>>> you reported here.
>>>> I just ran your code locally 4 minutes ago and it still works here. The
>>>> only difference I can see is that my GenomicRanges package is one
>>>> version higher than yours (GenomicRanges_1.6.2 here vs. 1.6.1 in your
>>>> sessionInfo). Please update that package, run your code again, and see
>>>> if you have better luck with Ensembl.
>>>>
>>>> The patch that Michael mentioned actually arrived at the exact moment
>>>> that I was testing the bug fix above, which means that it has some
>>>> conflicts I will have to resolve, but it should be added to devel very
>>>> soon.
>>>>
>>>>
>>>> Marc
>>>>
>>>>
>>>>
>>>> On 11/08/2011 03:55 AM, Michael Lawrence wrote:
>>>>>
>>>>>
>>>>> On Tue, Nov 8, 2011 at 3:19 AM, Tim Rayner <tfrayner at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to make a TranscriptDb from the Ensembl human Biomart, but
>>>>>> I've run into a problem. As shown below, the equivalent operation for
>>>>>> the mouse Biomart works fine:
>>>>>>
>>>>>>> # Mouse TranscriptDb created without a hitch:
>>>>>>> txdb.Mm <- makeTranscriptDbFromBiomart(biomart='ensembl',
>>>>>>     dataset='mmusculus_gene_ensembl')
>>>>>> Download and preprocess the 'transcripts' data frame ... OK
>>>>>> Download and preprocess the 'chrominfo' data frame ... OK
>>>>>> Download and preprocess the 'splicings' data frame ... OK
>>>>>> Download and preprocess the 'genes' data frame ... OK
>>>>>> Prepare the 'metadata' data frame ... OK
>>>>>> Make the TranscriptDb object ... OK
>>>>>>
>>>>>>> # Here's the problem:
>>>>>>> txdb.Hs <- makeTranscriptDbFromBiomart(biomart='ensembl',
>>>>>>     dataset='hsapiens_gene_ensembl')
>>>>>> Download and preprocess the 'transcripts' data frame ... OK
>>>>>> Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
>>>>>> Download and preprocess the 'splicings' data frame ... Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>>>>   line 800380 did not have 11 elements
>>>>>>
>>>>>>> sessionInfo()
>>>>>> R version 2.14.0 (2011-10-31)
>>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] GenomicFeatures_1.6.1 AnnotationDbi_1.16.0 Biobase_2.14.0
>>>>>> [4] GenomicRanges_1.6.1 IRanges_1.12.1
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] BSgenome_1.22.0 Biostrings_2.22.0 DBI_0.2-5 RCurl_1.6-10
>>>>>> [5] RSQLite_0.10.0 XML_3.4-3 biomaRt_2.10.0 rtracklayer_1.14.1
>>>>>> [9] tools_2.14.0 zlibbioc_1.0.0
>>>>>>
>>>>>> I don't know if this is an issue with the Biomart instance or the
>>>>>> GenomicFeatures package. I was wondering if anyone had any suggestions
>>>>>> as to how I might work around this?
>>>>>>
>>>>>> On a related note, would it be possible to add the ability to point
>>>>>> makeTranscriptDbFromBiomart() at alternate Biomart hosts (as one
>>>>>> would, for example, by calling
>>>>>> biomaRt::useMart(host='www.ensembl.org', ...))?
>>>>>
>>>>>
>>>>> We've submitted a patch that does just this, as well as supporting an
>>>>> attribute prefix string for selecting alternative gene models.
>>>>>
>>>>>
>>>>>> It would probably be
>>>>>> good to be able to pass through the 'archive' argument to useMart as
>>>>>> well.
>>>>>>
>>>>>> Many thanks,
>>>>>>
>>>>>> Tim Rayner
>>>>>>
>>>>>> --
>>>>>> Bioinformatician
>>>>>> Smith Lab, CIMR
>>>>>> University of Cambridge
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
