[BioC] makeTranscriptDbFromGFF for Unstranded Transcripts

Dario Strbenac dstr7320 at uni.sydney.edu.au
Thu May 9 03:00:16 CEST 2013


> The same could probably be said of GTF and GFF files, and I wonder if
> storing a set of unstranded mRNAs, exons, CDSs in those files is
> considered valid.

From the specification, it is.

strand - Valid entries include '+', '-', or '.' (for don't know/don't care).
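To see which records in a file fall into the "don't know/don't care" case, it is enough to read the 7th tab-separated column. Below is a minimal Python sketch, assuming 9-column GTF/GFF feature lines. `strand_of` and `unstranded_lines` are hypothetical helpers for illustration and are not part of any Bioconductor package.

```python
# Minimal sketch: read the strand (7th tab-separated column) of GTF/GFF
# feature lines and flag the unstranded ones ('.').
VALID_STRANDS = {"+", "-", "."}

def strand_of(line):
    """Return the strand field of one GTF/GFF feature line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    strand = fields[6]
    if strand not in VALID_STRANDS:
        raise ValueError("invalid strand %r" % strand)
    return strand

def unstranded_lines(lines):
    """Indices of non-comment feature lines whose strand is '.'."""
    return [i for i, line in enumerate(lines)
            if line.strip() and not line.startswith("#")
            and strand_of(line) == "."]

# Hypothetical two-line GTF excerpt: one stranded exon, one unstranded.
gtf = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tCufflinks\texon\t500\t900\t.\t.\t.\tgene_id "g2"; transcript_id "t2";',
]
print(unstranded_lines(gtf))  # [1]
```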

> Anyway, if we wanted makeTranscriptDbFromGFF() to support such GTF and
> GFF files, we would need to automatically replace all missing strands
> by a + or a -.

It is better for it to keep raising an error, so that there is no ambiguity. Adding a sentence about this to the help page would be useful, since users will expect it to read in any valid GTF or GFF file.
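In the same spirit, a pre-import check can refuse unstranded transcripts instead of silently assigning '+' or '-'. Below is a minimal Python sketch, assuming GTF-style `transcript_id "..."` attributes. `unstranded_transcripts` and `require_stranded` are hypothetical helpers, not part of GenomicFeatures.

```python
import re

# Assumes GTF attribute syntax: transcript_id "t1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def unstranded_transcripts(lines):
    """Return sorted transcript_ids that have any exon with strand '.'."""
    bad = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match and fields[6] == ".":
            bad.add(match.group(1))
    return sorted(bad)

def require_stranded(lines):
    """Raise instead of guessing a strand, so there is no ambiguity."""
    bad = unstranded_transcripts(lines)
    if bad:
        raise ValueError("unstranded transcripts: " + ", ".join(bad))

stranded = ['chr1\tCufflinks\texon\t100\t200\t.\t+\t.\ttranscript_id "t1";']
mixed = stranded + ['chr1\tCufflinks\texon\t500\t900\t.\t.\t.\ttranscript_id "t2";']
require_stranded(stranded)             # passes silently
print(unstranded_transcripts(mixed))   # ['t2']
```

Failing loudly keeps the decision about a strand with the user rather than the importer.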

> makeTranscriptDbFromGFF("transcripts.gtf", format = "gtf", exonRankAttributeName = "exon_number")

> Ok, so you've managed to store the exon rank in your file. But that
> means that you must have *implicitly* chosen a strand for your exons
> right?

Cufflinks can infer the strand of a multi-exon transcript by looking for the canonical GT-AG splice-site motif in reads mapping across an intron, but it cannot do this for single-exon genes. So it outputs a strand for some genes and not for others.
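The reasoning behind this can be sketched as motif-based strand inference. This is a simplified Python illustration, not Cufflinks' actual implementation: it assumes an in-memory chromosome string, 1-based inclusive coordinates, and considers only canonical GT-AG introns (non-canonical classes such as GC-AG or AT-AC are ignored). `strand_from_intron` and `transcript_strand` are hypothetical names.

```python
def strand_from_intron(genome, start, end):
    """Infer strand from the dinucleotides bounding one intron.

    genome: chromosome sequence (plus strand); start/end: 1-based,
    inclusive intron coordinates.
    """
    intron = genome[start - 1:end].upper()
    if intron.startswith("GT") and intron.endswith("AG"):
        return "+"
    # GT...AG on the minus strand reads as CT...AC on the plus strand.
    if intron.startswith("CT") and intron.endswith("AC"):
        return "-"
    return "."

def transcript_strand(genome, exons):
    """exons: (start, end) pairs sorted by position.

    Introns lie between consecutive exons. A single-exon transcript has
    no intron to inspect, so it stays '.', as do conflicting calls.
    """
    strands = {strand_from_intron(genome, a_end + 1, b_start - 1)
               for (_, a_end), (b_start, _) in zip(exons, exons[1:])}
    strands.discard(".")
    return strands.pop() if len(strands) == 1 else "."

plus_genome = "AAAAA" + "GTAAAAAG" + "CCCCC"   # intron at 6..13
minus_genome = "AAAAA" + "CTTTTTAC" + "CCCCC"
print(transcript_strand(plus_genome, [(1, 5), (14, 18)]))   # +
print(transcript_strand(minus_genome, [(1, 5), (14, 18)]))  # -
print(transcript_strand(plus_genome, [(1, 18)]))            # .
```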

