[BioC] Mapping genomic coordinates to transcript coordinates? (revived)

Chris Fields cjfields at illinois.edu
Thu Mar 3 15:45:30 CET 2011


On Mar 3, 2011, at 1:58 AM, Pages, Herve wrote:

> Hi Chris, Malcolm,
> 
> There is the transcriptLocs2refLocs() function in Biostrings that
> does the reverse mapping i.e. it maps transcript coordinates to
> genomic coordinates. There is no doubt that the GenomicFeatures
> package would be a better place for this function so we should move
> it there.

... <apologies, excised the very useful code for easier reading> ...

> It's vectorized and fast (implemented in C).

Nice!

> Unfortunately we don't have a refLocs2transcriptLocs() function at
> the moment for going the other way around but, yes, that's something
> we should definitely have. When called on the previous result and with
> the same 'exonStarts', 'exonEnds' and 'strand' values, it should return
> the original 'tlocs'.
> 
> There would be 2 complications for such a refLocs2transcriptLocs though:
> 
>  1. If the genomic location doesn't hit the transcript. Not a big deal,
>     NA could be used for this.

Agreed.

>  2. Sometimes (very rarely) the genomic location hits an ambiguous
>     location on the transcript (e.g. for a small number of transcripts
>     in UCSC knownGene track, some exons overlap). What to do then?

I suppose we would need examples of this, at least for documenting it in the future.  As for what to do, I'm not sure myself beyond issuing a warning about the ambiguity and returning the first or last value (or having an argument that indicates what to do under such circumstances, such as allowing a user-defined function to pick the value).
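
To make the two cases concrete, here is a rough sketch of the logic in plain Python. It is not R and not the Biostrings/GenomicFeatures API, just an illustration. All names (ref_loc_to_transcript_loc, the 'resolve' argument, etc.) are hypothetical. I am assuming 1-based inclusive coordinates and exons listed in transcript (rank) order, so on the minus strand they come in decreasing genomic order. A miss returns None (standing in for NA), and an ambiguous hit warns and returns the first value unless the caller supplies a function to choose.

```python
import warnings
from typing import Callable, List, Optional, Sequence


def transcript_loc_to_ref_loc(tloc: int, exon_starts: Sequence[int],
                              exon_ends: Sequence[int], strand: str) -> int:
    """Forward mapping: transcript position -> genomic position."""
    if tloc < 1:
        raise ValueError("tloc must be >= 1")
    offset = 0  # transcript bases consumed by earlier exons
    for start, end in zip(exon_starts, exon_ends):
        width = end - start + 1
        if tloc <= offset + width:
            within = tloc - offset  # 1-based position inside this exon
            return start + within - 1 if strand == "+" else end - within + 1
        offset += width
    raise ValueError("tloc is beyond the end of the transcript")


def ref_loc_to_transcript_locs(ref_loc: int, exon_starts: Sequence[int],
                               exon_ends: Sequence[int],
                               strand: str) -> List[int]:
    """Reverse mapping: every transcript position that ref_loc falls on."""
    hits = []
    offset = 0
    for start, end in zip(exon_starts, exon_ends):
        if start <= ref_loc <= end:
            within = ref_loc - start + 1 if strand == "+" else end - ref_loc + 1
            hits.append(offset + within)
        offset += end - start + 1
    return hits


def ref_loc_to_transcript_loc(
        ref_loc: int, exon_starts: Sequence[int], exon_ends: Sequence[int],
        strand: str,
        resolve: Optional[Callable[[List[int]], int]] = None) -> Optional[int]:
    """Reverse mapping with a policy for misses and ambiguous hits."""
    hits = ref_loc_to_transcript_locs(ref_loc, exon_starts, exon_ends, strand)
    if not hits:
        return None  # genomic location does not hit the transcript
    if len(hits) > 1:
        if resolve is not None:
            return resolve(hits)  # caller decides, e.g. min or max
        warnings.warn("ambiguous genomic location; returning first hit")
    return hits[0]
```

With exons 101-110 and 201-205 on the plus strand, genomic position 201 maps to transcript position 11, position 150 (intronic) gives None, and feeding the result of the forward mapping back into the reverse one returns the original transcript position. With overlapping exons 101-110 and 108-115, position 109 hits twice (transcript positions 9 and 12), which is where the warning or the 'resolve' function comes in.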

> Also those 2 functions should really be in GenomicFeatures, not
> in Biostrings, and their interface should be modernized to accept
> a GRangesList object instead of exonStarts, exonEnds and strand
> (the transcriptLocs2refLocs() function predates the GenomicRanges
> era).

I agree.  I wouldn't think to find this in Biostrings.

> Here in Seattle we didn't work on this yet because of lack of time
> and also because there was apparently no demand for it so far. For
> now, I'm just going to move transcriptLocs2refLocs() to GenomicFeatures
> so it's more visible and it will also make it easier for someone
> interested to contribute.
> 
> H.

That seems to be the way things get implemented in any open-source project: someone has an itch to scratch.

chris


