[BioC] rtracklayer::liftOver ordering

Hervé Pagès hpages at fhcrc.org
Thu Aug 25 00:35:34 CEST 2011


Hi there,

On 11-08-24 10:48 AM, Michael Lawrence wrote:
> On Wed, Aug 24, 2011 at 8:28 AM, Andrew Jaffe <ajaffe at jhsph.edu> wrote:
>
>> I'm having a problem maintaining the ordering of my GRanges object
>> when I lift it over using rtracklayer::liftOver. For example:
>>
>> > g # my regions
>> GRanges with 5 ranges and 0 elementMetadata values
>>     seqnames                 ranges strand |
>>        <Rle>               <IRanges>   <Rle>  |
>> [1]    chr19 [ 13130686,  13133039]      * |
>> [2]     chr4 [160026138, 160028079]      * |
>> [3]    chr12 [ 65671230,  65672140]      * |
>> [4]     chr8 [ 19615409,  19616461]      * |
>> [5]    chr14 [ 99706752,  99708661]      * |
>>
>> > chain = import.chain("hg19ToHg18.over.chain") # from UCSC
>> > lifted = liftOver(g, chain) # suppressed unmatched chrs
>> > lifted
>> GRanges with 5 ranges and 0 elementMetadata values
>>     seqnames                 ranges strand |
>>        <Rle>               <IRanges>   <Rle>  |
>> [1]     chr4 [160245588, 160247529]      * |
>> [2]     chr8 [ 19659689,  19660741]      * |
>> [3]    chr12 [ 63957497,  63958407]      * |
>> [4]    chr14 [ 98776505,  98778414]      * |
>> [5]    chr19 [ 12991686,  12994039]      * |
>>
>> This is just a toy example with 5 regions all on different
>> chromosomes, but with real data where there are multiple regions per
>> chromosome, I am unable to determine the resulting matched lifted data
>> for a particular region. Is there any way to preserve the ordering of
>> my original list in the liftOver output? Presorting by chromosome and
>> position might work 99% of the time, but the ordering of some regions
>> might shift during the liftOver, and I would not be able to tell if
>> this occurred.
>>
>>
> I think Kasper's suggestion of an ID column is a good one. The basic problem
> is that there is not necessarily a 1-1 correspondence after lift-over. A
> single region in, say, human could be broken up into multiple regions in
> mouse.

An alternative would be for liftOver() to return a GRangesList instead
of a GRanges, with one list element per input range. People who don't
care about the exact mapping between the input and the output could
always do 'unlist(liftOver(g, chain))' and get what they are getting
right now.
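
To make the idea concrete, here is a minimal base R sketch. It does not
call rtracklayer at all: 'lifted_list' is a hypothetical stand-in for
what a list-returning liftOver() might give back (one element per input
region), and the split of region 3 and the lost region 4 are made up
for illustration only.

```r
# Hypothetical stand-in for a list-returning liftOver(): one element per
# input region, holding zero, one, or several lifted ranges.
# Regions 1 and 2 echo the toy example above; the split of region 3 and
# the empty region 4 are invented to show the non 1-1 cases.
lifted_list <- list(
  data.frame(seqnames = "chr19", start = 12991686,  end = 12994039),
  data.frame(seqnames = "chr4",  start = 160245588, end = 160247529),
  data.frame(seqnames = "chr12",
             start = c(63957497, 63958001),
             end   = c(63958000, 63958407)),
  data.frame(seqnames = character(0), start = numeric(0), end = numeric(0))
)

# Number of lifted ranges per input region (0 means nothing was lifted).
n_hits <- vapply(lifted_list, nrow, integer(1))

# Flatten the list, but keep the index of the input region as a column,
# so every output row can still be traced back to its origin.
flat <- do.call(rbind, lifted_list)
flat$input_id <- rep(seq_along(lifted_list), n_hits)
```

Because the list has the same length and order as the input, the mapping
is available before flattening, and 'input_id' carries it along
afterwards. This is the same bookkeeping an ID column would do by hand.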

H.

>
> Michael
>
>> Thanks a lot,
>> Andrew Jaffe
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


