[BioC] rtracklayer and UCSC
keith at wehi.EDU.AU
Fri May 15 02:23:42 CEST 2009
My understanding of UCSC co-ordinates is, as Sean says, that they are both zero
based and one based. However, I have stopped using the words "start" and "end"
with UCSC co-ordinates; I believe it is better to use "left" and "right".
The UCSC data definitions of their annotation files use txStart/txEnd,
cdsStart/cdsEnd and exonStarts/exonEnds. However, these are only start and end
co-ordinates for positive-strand genes. For negative-strand genes they are end
and start co-ordinates, assuming that "start" means the 5' end of a gene.
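A minimal Python sketch of the left/right reading, assuming refGene-style txStart/txEnd values and a strand column of "+" or "-". The function name `five_and_three_prime` and the example numbers are hypothetical and not part of any UCSC or Bioconductor API:

```python
def five_and_three_prime(tx_start, tx_end, strand):
    """Return (5' end, 3' end) of a transcript in 1-based co-ordinates.

    tx_start is the LEFT end (zero based in UCSC files) and tx_end is the
    RIGHT end (already one based).
    """
    left_1based = tx_start + 1   # shift the zero-based left end
    right_1based = tx_end        # right end needs no change
    if strand == "+":
        return left_1based, right_1based
    if strand == "-":
        # Negative-strand genes are read right to left, so the right end
        # (txEnd) is the 5' end and the left end (txStart) is the 3' end.
        return right_1based, left_1based
    raise ValueError(f"unknown strand: {strand!r}")


print(five_and_three_prime(1000, 2000, "+"))  # (1001, 2000)
print(five_and_three_prime(1000, 2000, "-"))  # (2000, 1001)
```

The same pair of columns yields opposite 5'/3' assignments depending only on strand, which is why "left" and "right" are the less ambiguous names.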
I think it is more accurate to say that LEFT end UCSC co-ordinates are zero
based and RIGHT end UCSC co-ordinates are one based.
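This convention is often described as "zero-based, half-open". A minimal Python sketch of converting between it and 1-based closed intervals (as used in R) follows; the helper names are hypothetical:

```python
def ucsc_to_one_based(left, right):
    """UCSC (zero-based left, one-based right) -> 1-based closed interval."""
    return left + 1, right


def one_based_to_ucsc(start, end):
    """1-based closed interval -> UCSC (zero-based left, one-based right)."""
    return start - 1, end


left, right = 0, 100                   # first 100 bases of a chromosome, UCSC style
start, end = ucsc_to_one_based(left, right)
print(start, end)                      # 1 100
# Length is right - left in UCSC terms and end - start + 1 in 1-based terms.
print(right - left, end - start + 1)   # 100 100
print(one_based_to_ucsc(start, end))   # (0, 100)
```

Only the left end changes; the right end is identical in both systems.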
Note, however, that whenever UCSC displays co-ordinates to users of the GUI,
they convert left-end co-ordinates back to one based. If I remember correctly,
the co-ordinates are all one based when you use the DNA option in the UCSC
browser to retrieve DNA bases, but, as stated, if you download annotation files
such as refGene.txt, the left co-ordinates are zero based.
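As an illustration of the two conventions side by side: Python string slicing happens to be zero based on the left and exclusive on the right, so values from a downloaded UCSC file can be used as slice bounds directly, while one-based positions as shown in the browser need the left end shifted by one. The toy sequence and positions below are made up:

```python
# Toy "chromosome"; 1-based position 1 is the first character, "A".
chrom = "ACGTACGTTT"

# Hypothetical values as they would appear in a UCSC file for the bases GTAC:
# left end zero based, right end one based.
left, right = 2, 6
print(chrom[left:right])        # GTAC

# The same bases as displayed to browser users (both ends one based).
start, end = left + 1, right    # 3, 6
print(chrom[start - 1:end])     # GTAC
```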
I don't know how rtracklayer handles this issue.
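Independently of rtracklayer, the exonStarts/exonEnds columns are stored as comma-separated lists (in refGene.txt each list typically ends with a trailing comma), so they must be split before any shift is applied. A minimal Python sketch, with a hypothetical function name and made-up values:

```python
def exons_one_based(exon_starts, exon_ends):
    """Split UCSC exonStarts/exonEnds strings into 1-based closed exons.

    Empty pieces (for example from a trailing comma) are ignored.
    """
    lefts = [int(x) for x in exon_starts.split(",") if x]
    rights = [int(x) for x in exon_ends.split(",") if x]
    if len(lefts) != len(rights):
        raise ValueError("exonStarts and exonEnds have different lengths")
    # Only the LEFT (zero-based) ends get +1; RIGHT ends are already one based.
    return [(left + 1, right) for left, right in zip(lefts, rights)]


print(exons_one_based("100,300,", "200,450,"))  # [(101, 200), (301, 450)]
```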
Sean Davis wrote:
> On Thu, May 14, 2009 at 7:29 PM, Kasper Daniel Hansen <khansen at stat.berkeley.edu> wrote:
>> As far as I know, UCSC uses zero-based indexing of their genomes, while R
>> uses 1-based indexing. What kind of conversion does rtracklayer use? I
>> suspect none at all. It might be worthwhile to add a discussion about this
>> somewhere in the vignette.
> It is even slightly more complicated than that. They use zero-based starts
> and 1-based ends, except for graphical display.
>> More specifically, I have downloaded a couple of tables from UCSC using
>> rtracklayer and I wanted to know if I need to add 1 to the column named
>> exonStart (after a suitable splitting - it is a comma separated character
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor