[BioC] GenomicRanges::reduce feature request

Steve Lianoglou lianoglou.steve at gene.com
Wed Aug 7 20:36:54 CEST 2013


Hi,

On Wed, Aug 7, 2013 at 11:29 AM, Zhu, Lihua (Julie)
<Julie.Zhu at umassmed.edu> wrote:
> Hi,
>
> The reduce function is very useful for joining neighboring ranges. However, the score information is lost after applying reduce. Is it possible to retain the score information after applying reduce?
>
> Here is an example.
>
> library(GenomicRanges)
>
> rd <- RangedData(
>       RangesList(
>            chrA=IRanges(start=c(1, 4, 6), width=c(3, 2, 4)),
>            chrB=IRanges(start=c(1, 3, 6), width=c(3, 3, 4))),
>         score=c(2, 7, 3, 1, 1, 1))
> rd
> RangedData with 6 rows and 1 value column across 2 spaces
>      space    ranges |     score
>   <factor> <IRanges> | <numeric>
> 1     chrA    [1, 3] |         2
> 2     chrA    [4, 5] |         7
> 3     chrA    [6, 9] |         3
> 4     chrB    [1, 3] |         1
> 5     chrB    [3, 5] |         1
> 6     chrB    [6, 9] |         1
>
> reduce(rd, min.gap=1)
> RangedData with 2 rows and 0 value columns across 2 spaces
>      space    ranges |
>   <factor> <IRanges> |
> 1     chrA    [1, 9] |
> 2     chrB    [1, 9] |
>
> Please note that the score column is missing after applying reduce. The following is the output I would like, with the score information retained:
>      space    ranges |     score
>   <factor> <IRanges> | <numeric>
> 1     chrA    [1, 9] |        12
> 2     chrB    [1, 9] |         3

I believe similar topics have come up before, and the problem is
that I don't think there's any general rule of thumb that applies to
merging all the `mcols` of ranges that get combined by reduce.

I guess the rule you would like to apply here is to sum the score(s)
from all the combined ranges; but why sum? One might want to average
them, or take a weighted average based on the lengths of the combined
ranges, or the geometric mean, or ...
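To make the point concrete, here is a rough base-R sketch (not part of GenomicRanges/IRanges, and not something proposed in this thread) of a hypothetical helper, `reduce_with_score()`. It merges ranges within each space in a way meant to mimic `reduce(..., min.gap = 1)` and leaves the choice of how to combine the scores to the caller through a `FUN` argument. The plain-vector input layout is an assumption made so the example runs without Bioconductor installed.

```r
# Hypothetical helper (NOT a GenomicRanges/IRanges function): merge ranges
# per space, reduce()-style, and summarise the scores of each merged group.
reduce_with_score <- function(space, start, end, score, min.gap = 1L, FUN = sum) {
  # Sort by space, then by start/end, so overlapping or nearby ranges are consecutive.
  ord <- order(space, start, end)
  space <- space[ord]; start <- start[ord]; end <- end[ord]; score <- score[ord]

  # Assign each range to a merge group. A new group starts when the space
  # changes or when the range begins more than min.gap past the running end
  # (with min.gap = 1, overlapping and directly adjacent ranges are merged).
  group <- integer(length(start))
  g <- 0L
  cur_end <- NA
  for (i in seq_along(start)) {
    if (i == 1L || space[i] != space[i - 1L] || start[i] > cur_end + min.gap) {
      g <- g + 1L
      cur_end <- end[i]
    } else {
      cur_end <- max(cur_end, end[i])
    }
    group[i] <- g
  }

  # One row per merged range; FUN decides how the member scores are combined.
  data.frame(
    space = space[!duplicated(group)],
    start = as.vector(tapply(start, group, min)),
    end   = as.vector(tapply(end, group, max)),
    score = as.vector(tapply(score, group, FUN)),
    stringsAsFactors = FALSE
  )
}

# The ranges from Julie's example, written as plain vectors (end = start + width - 1).
rd <- data.frame(space = rep(c("chrA", "chrB"), each = 3),
                 start = c(1, 4, 6, 1, 3, 6),
                 end   = c(3, 5, 9, 3, 5, 9),
                 score = c(2, 7, 3, 1, 1, 1),
                 stringsAsFactors = FALSE)

# Summing reproduces the requested output: chrA [1, 9] -> 12, chrB [1, 9] -> 3.
res <- with(rd, reduce_with_score(space, start, end, score, FUN = sum))
print(res)
```

Passing `FUN = mean` instead gives a different but equally defensible answer, and for a categorical column something like `FUN = function(x) paste(unique(x), collapse = ",")` is one possible choice. That the caller has to pick `FUN` at all is exactly why there is no single default that `reduce` could apply.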

What would be the right thing to do here if the ranges being merged
had categorical `mcols` data?

-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


