[BioC] GenomicFeatures Reading GFF Efficiency

Marc Carlson mcarlson at fhcrc.org
Tue Nov 20 01:34:51 CET 2012


Hi Dario,

I have found and killed a couple of bugs in this parser, and the fix
should show up in the next couple of days.

I will work on better performance as well, but that is not in the latest
update because I had to fix the bug first.  But please be aware that much
of the slow performance comes from the fact that GTF files are not
required to encode exon ranking information.  In the 800+ megabyte file
you were parsing, the only way to get exon rank information was to
deduce it from the provided coordinate positions.  The fact that
this file does not provide that information should probably concern
you.  Even though the parser can do the inference, it takes time,
and more importantly, it makes assumptions about your data.  So it
really should be avoided if you can.  This is why the function
throws a warning saying that it is inferring the exon rankings.
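To make "deducing it from the coordinate positions" concrete, here is a minimal Python sketch of the general idea. It is not the actual code behind .deduceExonRankings; the function name and the (transcript_id, strand, start, end) input shape are hypothetical. It assumes every exon of a transcript is on one strand and that exons do not overlap, which are exactly the kinds of assumptions about your data mentioned above.

```python
from collections import defaultdict


def deduce_exon_ranks(exons):
    """Hypothetical helper: assign 1-based exon ranks from coordinates only.

    exons: iterable of (transcript_id, strand, start, end) tuples.
    Returns a dict mapping (transcript_id, start, end) -> rank.
    Assumes all exons of a transcript share one strand and do not overlap.
    """
    by_tx = defaultdict(list)
    strands = {}
    for tx, strand, start, end in exons:
        # Refuse to guess when a transcript's exons disagree on strand.
        if tx in strands and strands[tx] != strand:
            raise ValueError(f"mixed strands in transcript {tx}")
        strands[tx] = strand
        by_tx[tx].append((start, end))

    ranks = {}
    for tx, intervals in by_tx.items():
        # Sort along the chromosome; on the minus strand the first exon
        # transcribed is the one with the highest coordinates.
        intervals.sort(reverse=(strands[tx] == "-"))
        for rank, (start, end) in enumerate(intervals, start=1):
            ranks[(tx, start, end)] = rank
    return ranks


exons = [
    ("tx1", "+", 300, 400), ("tx1", "+", 100, 200),
    ("tx2", "-", 100, 200), ("tx2", "-", 300, 400),
]
ranks = deduce_exon_ranks(exons)
```

If exons overlap, or a transcript's exons are incomplete in the file, this ordering silently produces ranks that may not match the real transcript structure, which is why an explicit rank attribute is preferable.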

So if you can get the data in another format, or at least as a GTF
file that does provide the exon ranking information, I would strongly
recommend doing so.
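One quick way to check whether a GTF file carries exon ranks is to scan its exon lines for a rank attribute. The Python sketch below assumes the attribute is called exon_number (a name commonly used in Ensembl/GENCODE GTFs); whatever name your file actually uses is what you would pass as exonRankAttributeName. The helper names are hypothetical, and the attribute parser is deliberately simple (it splits on ";", so quoted values containing semicolons are not handled).

```python
def parse_gtf_attributes(field):
    """Hypothetical helper: parse a GTF column-9 string into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')  # values may or may not be quoted
    return attrs


def exon_ranks_from_gtf(lines, rank_attr="exon_number"):
    """Collect explicit exon ranks; also count exon lines lacking rank_attr.

    rank_attr is an assumption; use the attribute name your file provides.
    """
    ranks, missing = {}, 0
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = parse_gtf_attributes(cols[8])
        if rank_attr not in attrs:
            missing += 1
            continue
        key = (attrs.get("transcript_id"), int(cols[3]), int(cols[4]))
        ranks[key] = int(attrs[rank_attr])
    return ranks, missing


gtf_lines = [
    'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number 1;',
    'chr1\tHAVANA\texon\t12613\t12721\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
ranks, missing = exon_ranks_from_gtf(gtf_lines)
```

A non-zero missing count means at least some exons would need their rank inferred.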


   Marc



On 11/15/2012 06:00 PM, Dario Strbenac wrote:
> After nearly two days, it gave an error:
>
> Processing splicing information for gtf file.
> Error in `colnames<-`(`*tmp*`, value = c("exon_chrom", "exon_start", "exon_end",  :
>    'names' attribute [9] must be the same length as the vector [6]
> In addition: Warning message:
> In .deduceExonRankings(exs) :
>    Infering Exon Rankings.  If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
>
> This is the 1.10.0 version of GenomicFeatures in R 2.15.1.
>
> Meanwhile, GENCODE version 14 has been released, so in the end you wouldn't have wanted my object built from the version 13 annotations.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


