[BioC] How to reduce two GRanges simutaneously

Hervé Pagès hpages at fhcrc.org
Tue Apr 8 21:42:58 CEST 2014


Hi Liang, Michael,

Yeah, an interesting problem indeed.

Here is my take on it:

   # Reduce 'x' and 'y' simultaneously. Returns a list of 2 parallel
   # GRanges, the 1st one is reduced 'x' and the 2nd one reduced 'y'.
   reduceTwoGRanges <- function(x, y)
   {
     if (length(x) != length(y))
       stop("'x' and 'y' must have the same length")
     rx_revmap <- mcols(reduce(x, with.revmap=TRUE))$revmap
     ry_revmap <- mcols(reduce(y, with.revmap=TRUE))$revmap
     rx_map <- integer(length(x))
     rx_map[unlist(rx_revmap, use.names=FALSE)] <- togroup(rx_revmap)
     ry_map <- integer(length(y))
     ry_map[unlist(ry_revmap, use.names=FALSE)] <- togroup(ry_revmap)
     sm <- IRanges:::selfmatchIntegerPairs(rx_map, ry_map)
     ans1 <- unlist(range(split(x, sm)), use.names=FALSE)
     ans2 <- unlist(range(split(y, sm)), use.names=FALSE)
     list(ans1, ans2)
   }

Cheers,
H.

On 04/08/2014 05:37 AM, Michael Lawrence wrote:
> This has got to be one of the most complex problems posed here.
>
> Here is one approach, but it may be completely wrong:
>
> hits_x <- findOverlaps(x)
> hits_y <- findOverlaps(y)
> hits_both <- intersect(hits_x, hits_y)
>
> I think the constraint of mutual overlap in x and y is now satisfied. In
> order to reduce, we have to think of the Hits object as a graph, where
> the connected components are those ranges that need to be reduced.
>
> g <- graph::graphNEL(as.character(seq_len(length(x))),
> split(subjectHits(hits_both), queryHits(hits_both)))
> comp <- unname(RBGL::connectedComp(g)) # unname just for efficiency
>
> That graphNEL constructor might kill your performance? There are other
> ways to build the graph. Now we can form GRangesLists from the connected
> components, and since we know they all overlap, just get the range, and
> unlist them back to GRanges:
>
> ids <- as.integer(unlist(comp))
> reduced_x <- unlist(range(relist(x[ids], comp)))
> reduced_y <- unlist(range(relist(y[ids], comp)))
>
> And combine the result into a DataFrame with the range counts:
>
> answer <- DataFrame(x=reduced_x, y=reduced_y, count=elementLengths(comp))
>
> Hopefully that gets you closer. This hasn't been tested at all.
>
> Michael
>
>
>
> On Mon, Apr 7, 2014 at 11:40 AM, Niu, Liang (NIH/NIEHS) [E]
> <liang.niu at nih.gov <mailto:liang.niu at nih.gov>> wrote:
>
>     Dear Herve,
>
>     I have a question: suppose that I have two GRanges of the same
>     length, say x and y, where x[i] and y[i] are two genomic ranges that
>     are paired. What I want to do is to reduce x and y simultaneously,
>     i.e., combine (x[i],y[i]) and (x[j],y[j]) when x[i] overlaps with
>     x[j] and y[i] overlaps with y[j]. Ideally, I would like an output in
>     the following format:
>
>     range_x            range_y            count
>     chr?:???-???     chr?:???-???      ?
>
>     where the range_x and range_y are the reduced ranges, and count is
>     the number of records associated with the chr?:???-???  and
>       chr?:???-???. How can I do this?
>
>     Thanks!
>
>     Liang
>
>     _______________________________________________
>     Bioconductor mailing list
>     Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>     https://stat.ethz.ch/mailman/listinfo/bioconductor
>     Search the archives:
>     http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list