[BioC] makeTranscriptDbFromGFF Error for UCSC GTF File

Simon Anders anders at embl.de
Wed Jul 2 20:34:04 CEST 2014


Hi

On 02/07/14 20:17, Michael Lawrence wrote:
>> In contrast, using GTF or GFF files for making TranscriptDb objects is
>> always a little risky because many of these files will not have been
>> created with the intention of holding a transcriptome as data (which is the
>> specific thing that a TranscriptDb object is meant to hold).  This is
>> because the GTF and GFF file formats were not initially intended for the
>> specific purpose of holding a transcriptome but were instead intended to be
>> something more general.
>>
>>
> Actually GTF (Gene Transfer Format) files are designed specifically for
> representing gene models, and we have no excuse for not parsing them
> correctly. There have been some tweaks to attribute parsing (I thought
> limited to GFF3) in devel, so there may be a difference between Herve's
> devel result and Dario's release result.  I'll try to find some time to
> look into this.

The problem with GTF files produced by the UCSC Table Browser is that
they contain incorrect gene IDs: the gene_id attribute is always set to
the same value as the transcript_id. Hence, these files cannot be used
to define gene models without manual correction (which would be to
remove the transcript number suffix from the gene IDs).
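
To check whether a given GTF file is affected, one can test whether
gene_id equals transcript_id on every record. The following is only a
minimal sketch in Python, not part of any Bioconductor package; the
function names are made up for illustration, and it assumes the usual
GTF layout of nine tab-separated columns with attributes written as
key "value"; pairs in the last column.

```python
import re

# Matches GTF attribute pairs such as: gene_id "abc";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return the GTF attribute column (column 9) as a dict."""
    return dict(_ATTR.findall(field))


def gene_ids_mirror_transcript_ids(gtf_lines):
    """Return True if every record carrying both attributes has
    gene_id identical to transcript_id (hypothetical helper)."""
    seen = False
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        attrs = parse_attributes(cols[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            seen = True
            if attrs["gene_id"] != attrs["transcript_id"]:
                return False
    # An input without any usable records is not reported as affected.
    return seen
```

Calling gene_ids_mirror_transcript_ids(open(path)) on a file, with path
being whatever file you downloaded, returns True when the gene IDs
merely duplicate the transcript IDs. How to repair them depends on the
ID scheme of the track in question, so the sketch only detects the
problem.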

Long ago, I asked the UCSC Genome Browser help desk why this is and
was told that it is a bug in the Table Browser which they cannot fix,
for some reason.

Hence, I usually advise against using these files.

   Simon


