[BioC] Some particulars about rtracklayer::liftOver

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Apr 24 08:43:39 CEST 2012


Hi all,

I'm really not too well versed on the ins and outs of using `liftOver`
from via the kent tools, so I'm comparing results from using
rtracklayer::liftOver vs. a default run of liftOver from the command
line.

As a disclaimer, I'm lifting over coordinates between species. I know
this isn't the best idea -- the party line from the UCSC folks seem to
suggest that this is a bad idea because it is a "coarse" mapping, and
you might really want to use pslMap in this case (which I will play
with, too). For those who are interested, here's a few links for you:

https://lists.soe.ucsc.edu/pipermail/genome/2007-November/015074.html
https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015908.html
(summarizes next link)
https://lists.soe.ucsc.edu/pipermail/genome/2007-November/015032.html
http://article.gmane.org/gmane.science.biology.ucscgenome.general/11300

The correct command line switches to invoke pslMap using a liftover chain:
https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015908.html

>From the first link, pslMap is apparently preferred because:

"""
pslMap shows more detail -- it translates a region using each block
of the chain, introducing gaps where the chain has gaps.
"""

>From some results of rtracklayer::liftOver, it seems that this call
implements this type of functionality from pslMap -- is that true?

I say that because some of the ranges I liftOver are split into > 1
range into the target genome. Even though they are two ranges,
sometimes they are contiguous, or only separated by a small gap.

>From the few cases I've checked -- some examples below -- these
multiple ranges returned from rtracklayer::liftOver correspond to the
results from doing a pslMap

For instance:

The following mm9 range:

    chr1 [4836085, 4836142]      +

Gets lifted to hg19 here (notice small gap of width 5) with rtracklayer:

    chr8 [54959674, 54959715]      -
    chr8 [54959654, 54959669]      -

but from the command line, it's lifted to:

    chr8	54959654	54959716 ... -

At first I thought the rtracklayer fenceposts and the
ucsc/cmd-line::liftOver results fence-posted the same interval, but
the command line result is supposed to be ucsc-half-open (so start
from 0), right? The GRanges were exported to a bed via rtracklayer, so
I can confirm that coordinates were xlated to half-open correctly. But
the result of the liftOver from the command line is off-by-one at its
start position when compared to rtracklayer, no?

Further -- these two target ranges are the same that come out of a
pslMap, so -- that's nice.

I started writing this post to inquire rtracklayer's insertion of gaps
and to confirm that my by eye matching of this w/ pslMap output is
intentional (I guess that's weird of me to ask, but just wanted to be
sure). Then this off-by-one thing popped up as I was writing. There
are other regions that are lifted over whose coords from the command
line seem to have "the correct" -1 shift at the start ("correct" in as
far as I'm mentally doing the half-open coord shift), and equal at the
end, eg. the mm9 range:

    chr1 [9617177, 9617221]      +

usring rtracklayer::liftOver goes to hg19:

    chr8 [67430719, 67430763]      +

and from command line:

    chr8	67430718	67430763 ... +

which seems to me like the same interval as rtracklayer.

So, in a nut:

(1) rtracklayer's gapped-liftOver magic seems to return the same info
I would get from a call to pslMap. So, (a) thanks for that, and (b)
I'm correct, right? :-)

(2) This off by one thing -- am I just seeing things, or?

Thanks for making it this far,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list