[BioC] How to reduce two GRanges simutaneously
Hervé Pagès
hpages at fhcrc.org
Tue Apr 8 21:42:58 CEST 2014
Hi Liang, Michael,
Yeah, an interesting problem indeed.
Here is my take on it:
# Reduce 'x' and 'y' simultaneously. Returns a list of 2 parallel
# GRanges, the 1st one is reduced 'x' and the 2nd one reduced 'y'.
reduceTwoGRanges <- function(x, y)
{
if (length(x) != length(y))
stop("'x' and 'y' must have the same length")
rx_revmap <- mcols(reduce(x, with.revmap=TRUE))$revmap
ry_revmap <- mcols(reduce(y, with.revmap=TRUE))$revmap
rx_map <- integer(length(x))
rx_map[unlist(rx_revmap, use.names=FALSE)] <- togroup(rx_revmap)
ry_map <- integer(length(y))
ry_map[unlist(ry_revmap, use.names=FALSE)] <- togroup(ry_revmap)
sm <- IRanges:::selfmatchIntegerPairs(rx_map, ry_map)
ans1 <- unlist(range(split(x, sm)), use.names=FALSE)
ans2 <- unlist(range(split(y, sm)), use.names=FALSE)
list(ans1, ans2)
}
Cheers,
H.
On 04/08/2014 05:37 AM, Michael Lawrence wrote:
> This has got to be one of the most complex problems posed here.
>
> Here is one approach, but it may be completely wrong:
>
> hits_x <- findOverlaps(x)
> hits_y <- findOverlaps(y)
> hits_both <- intersect(hits_x, hits_y)
>
> I think the constraint of mutual overlap in x and y is now satisfied. In
> order to reduce, we have to think of the Hits object as a graph, where
> the connected components are those ranges that need to be reduced.
>
> g <- graph::graphNEL(as.character(seq_len(length(x))),
> split(subjectHits(hits_both), queryHits(hits_both)))
> comp <- unname(RBGL::connectedComp(g)) # unname just for efficiency
>
> That graphNEL constructor might kill your performance? There are other
> ways to build the graph. Now we can form GRangesLists from the connected
> components, and since we know they all overlap, just get the range, and
> unlist them back to GRanges:
>
> ids <- as.integer(unlist(comp))
> reduced_x <- unlist(range(relist(x[ids], comp)))
> reduced_y <- unlist(range(relist(y[ids], comp)))
>
> And combine the result into a DataFrame with the range counts:
>
> answer <- DataFrame(x=reduced_x, y=reduced_y, count=elementLengths(comp))
>
> Hopefully that gets you closer. This hasn't been tested at all.
>
> Michael
>
>
>
> On Mon, Apr 7, 2014 at 11:40 AM, Niu, Liang (NIH/NIEHS) [E]
> <liang.niu at nih.gov <mailto:liang.niu at nih.gov>> wrote:
>
> Dear Herve,
>
> I have a question: suppose that I have two GRanges of the same
> length, say x and y, where x[i] and y[i] are two genomic ranges that
> are paired. What I want to do is to reduce x and y simultaneously,
> i.e., combine (x[i],y[i]) and (x[j],y[j]) when x[i] overlaps with
> x[j] and y[i] overlaps with y[j]. Ideally, I would like an output in
> the following format:
>
> range_x range_y count
> chr?:???-??? chr?:???-??? ?
>
> where the range_x and range_y are the reduced ranges, and count is
> the number of records associated with the chr?:???-??? and
> chr?:???-???. How can I do this?
>
> Thanks!
>
> Liang
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list