[BioC] gff files: how to tell if right-open interval convention used?

Julien Gagneur julien.gagneur at embl.de
Fri Apr 1 12:37:27 CEST 2011


Hi Simon, Karl and Herve,

As Simon points out, the GFF3 format is quite ill-defined. When implementing readGff3() in genomeIntervals, we had to interpret the specifications.

GFF3 allows zero-length features but there is no column to flag intervals as zero-length features. Have we missed something?

From the two sentences:
"Start is always less than or equal to end."
and
"For zero-length features, such as insertion sites, start equals end ..."

we understood that the only way to distinguish zero-length features from other features was to adopt a right-open convention. We thus interpreted this as an equivalence: "a feature is zero-length if and only if start equals end". This is not exactly what the specification says, but what else could we do?
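
To make the reasoning concrete, here is a minimal sketch in Python (not the genomeIntervals code, which is written in R). The function name feature_length and its right_open argument are hypothetical, chosen only for illustration. It shows that a record with start equal to end has length zero under the right-open reading, but length one under the right-closed reading, so only the former can represent a zero-length feature.

```python
def feature_length(start, end, right_open):
    """Number of positions covered by a feature.

    right_open=True  reads the record as [start, end)
    right_open=False reads the record as [start, end]
    (hypothetical helper, for illustration only)
    """
    if start > end:
        # The GFF3 text quoted above: "Start is always less than or equal to end."
        raise ValueError("expected start <= end")
    if right_open:
        return end - start
    return end - start + 1


# The same record (start == end) under both readings.
zero_len = feature_length(10, 10, right_open=True)     # insertion site
one_base = feature_length(10, 10, right_open=False)    # single position
```

Under the right-closed reading no pair with start <= end can give a length of zero, which is why we saw no other way to encode such features.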

Of course, we have noticed that most files actually use the right-closed convention and rarely have zero-length features. We therefore added the parameter isRightOpen.
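
As a rough illustration of what such a flag has to decide, the Python sketch below normalises a (start, end) pair to one internal right-open form. This is an assumption about the general technique, not a description of how readGff3() handles isRightOpen internally; the name to_right_open and its is_right_open argument are hypothetical.

```python
def to_right_open(start, end, is_right_open):
    """Return (start, end) as a right-open interval [start, end).

    is_right_open says how the input file should be read:
    True  -> the pair already means [start, end)
    False -> the pair means [start, end], so the end is shifted by one
    (hypothetical helper, for illustration only)
    """
    if start > end:
        raise ValueError("expected start <= end")
    if is_right_open:
        return (start, end)
    return (start, end + 1)


# A right-closed record covering positions 100..200 inclusive.
closed_record = to_right_open(100, 200, is_right_open=False)

# A right-open record with start == end stays zero-length.
insertion_site = to_right_open(100, 100, is_right_open=True)
```

Note that a file read as right-closed can never yield a zero-length interval after this conversion, which matches the observation that such files rarely contain zero-length features.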

Although they are rarely provided, I believe zero-length features can be useful. genomeIntervals supports them consistently (including in interval_overlap, etc.).

Hope this clarifies the question.

Julien


