[BioC] rtracklayer::liftOver ordering

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Thu Aug 25 03:42:41 CEST 2011


How efficient would this be?  I sometimes use liftOver on millions of regions.

Kasper

2011/8/24 Michael Lawrence <lawrence.michael at gene.com>:
> That's a good idea. I can make that change.
>
> Michael
>
> 2011/8/24 Hervé Pagès <hpages at fhcrc.org>
>
>> Hi there,
>>
>>
>> On 11-08-24 10:48 AM, Michael Lawrence wrote:
>>
>>> On Wed, Aug 24, 2011 at 8:28 AM, Andrew Jaffe<ajaffe at jhsph.edu>  wrote:
>>>
>>>> I'm having a problem maintaining the ordering of my GRanges object
>>>> when I lift it over using rtracklayer::liftOver. For example:
>>>>
>>>> > g # my regions
>>>> GRanges with 5 ranges and 0 elementMetadata values
>>>>    seqnames                 ranges strand |
>>>>       <Rle>               <IRanges>   <Rle>  |
>>>> [1]    chr19 [ 13130686,  13133039]      * |
>>>> [2]     chr4 [160026138, 160028079]      * |
>>>> [3]    chr12 [ 65671230,  65672140]      * |
>>>> [4]     chr8 [ 19615409,  19616461]      * |
>>>> [5]    chr14 [ 99706752,  99708661]      * |
>>>>
>>>> > chain = import.chain("hg19ToHg18.over.chain") # from UCSC
>>>> > lifted = liftOver(g, chain) # suppressed unmatched chrs
>>>> > lifted
>>>> GRanges with 5 ranges and 0 elementMetadata values
>>>>    seqnames                 ranges strand |
>>>>       <Rle>               <IRanges>   <Rle>  |
>>>> [1]     chr4 [160245588, 160247529]      * |
>>>> [2]     chr8 [ 19659689,  19660741]      * |
>>>> [3]    chr12 [ 63957497,  63958407]      * |
>>>> [4]    chr14 [ 98776505,  98778414]      * |
>>>> [5]    chr19 [ 12991686,  12994039]      * |
>>>>
>>>> This is just a toy example with 5 regions all on different
>>>> chromosomes, but with real data, where there are multiple regions per
>>>> chromosome, I am unable to determine which lifted range corresponds
>>>> to a particular input region. Is there any way to preserve the
>>>> ordering of my original list in the liftOver output? Presorting by
>>>> chromosome and position might work 99% of the time, but the ordering
>>>> of some regions might shift during the liftOver, and I would not be
>>>> able to tell if this occurred.
>>>>
>>>>
>>> I think Kasper's suggestion of an ID column is a good one. The basic
>>> problem is that there is not necessarily a 1-1 correspondence after
>>> lift-over. A single region in, say, human could be broken up into
>>> multiple regions in mouse.
>>>
>>
>> An alternative would be that liftOver() returns a GRangesList instead
>> of GRanges. People who don't care about the exact mapping between
>> the input and the output could always do 'unlist(liftOver(g, chain))'
>> and get what they are getting right now.
>>
>> H.
>>
>>
>>> Michael
>>>
>>>> Thanks a lot,
>>>>
>>>> Andrew Jaffe
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>
>
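
A note on the GRangesList suggestion quoted above: in current rtracklayer
releases, liftOver() on a GRanges returns a GRangesList with one element per
input range, so the mapping back to the input can be recovered from the
element lengths. Below is a minimal sketch, assuming a recent Bioconductor
installation. The helper name lift_with_index and the column name input_index
are hypothetical and not part of rtracklayer.

```r
library(rtracklayer)  # attaches GenomicRanges, IRanges and S4Vectors as well

# Hypothetical helper: lift 'g' over and record, for every lifted range,
# the position of the input range it came from.
lift_with_index <- function(g, chain) {
  lifted <- liftOver(g, chain)        # GRangesList, parallel to 'g'
  n <- elementNROWS(lifted)           # lifted pieces per input range (0 = unmapped)
  flat <- unlist(lifted, use.names = FALSE)
  # Repeat each input position once per lifted piece; unmapped inputs
  # get zero repeats and therefore do not appear in the result.
  mcols(flat)$input_index <- rep(seq_along(lifted), n)
  flat
}
```

With the objects from the thread this could be called as
lift_with_index(g, import.chain("hg19ToHg18.over.chain")). An input range that
is split by the lift-over contributes several rows sharing one input_index,
and an unmapped range contributes none. The bookkeeping is a single vectorised
rep() call, so it should add little on top of the lift-over itself, although
that has not been timed here on millions of regions.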