[BioC] GenomicFeatures::makeTranscriptDbFromBiomart - BioMart data anomaly: for some transcripts, the cds cumulative length inferred from the exon and UTR info doesn't match the "cds_length" attribute from BioMart

Hervé Pagès hpages at fhcrc.org
Mon Feb 6 23:18:45 CET 2012


Hi Rhoda and others,

I still need to check that this error, issued by the internal helper
.extractCdsRangesFromBiomartTable() about "the cds cumulative
length inferred from the exon and UTR info not matching the cds_length
attribute from BioMart", is not a false positive.

I'm planning to patch the code in charge of this sanity check
so that it issues a warning instead of an error and displays
something more useful than just "for some transcripts etc...".
It would be nice to know at least which transcripts are affected.
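For illustration only, here is a minimal R sketch of such a check, written as a warning that names the offending transcripts. It is not the GenomicFeatures code: the helper checkCdsLengths() and the toy table are hypothetical. It assumes one row per exon, with columns named after the BioMart attributes ensembl_transcript_id, exon_chrom_start, exon_chrom_end, 5_utr_start, 5_utr_end, 3_utr_start, 3_utr_end and cds_length. It also assumes the UTR columns are NA when an exon has no UTR part, and that cds_length is repeated on every exon row and is NA for non-coding transcripts.

```r
## Hypothetical helper, NOT the GenomicFeatures implementation.
## Assumes 'bm_table' has one row per exon, 1-based closed coordinates,
## NA in the UTR columns when an exon has no 5' or 3' UTR part, and the
## transcript-level 'cds_length' repeated on every exon row.
checkCdsLengths <- function(bm_table) {
    ## Width of a closed interval [start, end]; 0 when there is no UTR part.
    width0 <- function(start, end) {
        w <- end - start + 1L
        w[is.na(w)] <- 0L
        w
    }
    exon_width <- width0(bm_table$exon_chrom_start, bm_table$exon_chrom_end)
    utr_width <- width0(bm_table[["5_utr_start"]], bm_table[["5_utr_end"]]) +
                 width0(bm_table[["3_utr_start"]], bm_table[["3_utr_end"]])
    ## CDS part of each exon = exon width minus its UTR parts.
    cds_per_exon <- exon_width - utr_width
    tx <- bm_table$ensembl_transcript_id
    inferred <- tapply(cds_per_exon, tx, sum)
    reported <- tapply(bm_table$cds_length, tx, `[`, 1L)
    ## Skip non-coding transcripts (no cds_length reported).
    coding <- !is.na(reported)
    bad <- names(inferred)[coding & inferred != reported]
    if (length(bad) > 0L)
        warning("BioMart data anomaly: for the following transcripts, ",
                "the cds cumulative length inferred from the exon and UTR ",
                "info doesn't match the \"cds_length\" attribute: ",
                paste(bad, collapse = ", "))
    invisible(bad)
}

## Toy table (made-up values): tx1 is consistent, tx2 is not.
bm <- data.frame(
    ensembl_transcript_id = c("tx1", "tx1", "tx2"),
    exon_chrom_start = c(100, 300, 1000),
    exon_chrom_end   = c(199, 399, 1099),
    `5_utr_start` = c(100, NA, NA),
    `5_utr_end`   = c(119, NA, NA),
    `3_utr_start` = c(NA, 380, NA),
    `3_utr_end`   = c(NA, 399, NA),
    cds_length = c(160, 160, 90),
    check.names = FALSE
)
bad <- checkCdsLengths(bm)  # warns about tx2; bad is "tx2"
```

Returning the IDs invisibly, rather than only printing them, means the same list could be used to put together the detailed example an Ensembl helpdesk ticket needs.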

I'll keep you informed, thanks!
H.


On 02/06/2012 12:53 AM, Rhoda Kinsella wrote:
> Hi Malcolm and Marc,
> Please submit an Ensembl helpdesk ticket about this issue, along with a
> detailed example, to helpdesk at ensembl.org and we will look into it.
> Kind regards
> Rhoda
>
>
> On 3 Feb 2012, at 20:32, Cook, Malcolm wrote:
>
>> Hi Marc, and other `library(GenomicFeatures)` users working in fly,
>>
>> I just changed the Subject line to keep alive one of the issues I
>> still have, namely:
>>
>> I get the following error:
>>
>>> library(GenomicFeatures)
>>> txdb <- makeTranscriptDbFromBiomart(biomart="ensembl",
>>>     dataset="dmelanogaster_gene_ensembl", circ_seqs=NULL)
>> Download and preprocess the 'transcripts' data frame ... OK
>> Download and preprocess the 'chrominfo' data frame ... OK
>> Download and preprocess the 'splicings' data frame ... Error in .extractCdsRangesFromBiomartTable(bm_table) :
>>   BioMart data anomaly: for some transcripts, the cds cumulative
>>   length inferred from the exon and UTR info doesn't match the
>>   "cds_length" attribute from BioMart
>>
>>
>> Marc, you already observed that:
>>
>>>>> the data for cds ranges and total cds length (both from biomaRt) no
>>>>> longer agree with each other. In other words, the current
>>>>> drosophila data in biomaRt seems to disagree with itself, and so
>>>>> the code refuses to make a package out of it. Getting the 2nd
>>>>> issue fixed probably involves talking to Ensembl about their CDS
>>>>> data for fly to see if we can resolve the discrepancy.
>>>> I would be happy to take this to them.
>>
>> I still wonder:
>>
>>> Can you recommend the best way to get a more diagnostic trace from
>>> the attempt at txdb creation, so we can correctly report the errant
>>> transcript(s) to the Ensembl team?
>>
>> I would be happy to take this up with the Ensembl team, but I need
>> details that I don't know how to produce.
>>
>>
>> Finally, on the side, here is a tiny suggestion:
>>
>>   * change the default for circ_seqs in makeTranscriptDbFromBiomart
>>     to NULL instead of an organism-specific (human) value.
>>
>> Regards,
>>
>> --Malcolm
>>
>>
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GenomicFeatures_1.6.7 AnnotationDbi_1.16.11 Biobase_2.14.0
>> [4] GenomicRanges_1.6.6   IRanges_1.12.5
>>
>> loaded via a namespace (and not attached):
>> [1] BSgenome_1.22.0    Biostrings_2.22.0  DBI_0.2-5          RCurl_1.9-5
>> [5] RSQLite_0.11.1     XML_3.9-4          biomaRt_2.10.0     rtracklayer_1.14.4
>> [9] tools_2.14.0       zlibbioc_1.0.0
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Rhoda Kinsella Ph.D.
> Ensembl Production Project Leader,
> European Bioinformatics Institute (EMBL-EBI),
> Wellcome Trust Genome Campus,
> Hinxton
> Cambridge CB10 1SD,
> UK.
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


