[R] Intersecting two matrices

William Dunlap wdunlap at tibco.com
Tue Jul 30 03:24:26 CEST 2013


I haven't looked at the size-time relationship, but im2 (below) is faster than your
function on at least one example:

intersectMat <- function(mat1, mat2)
{
    #mat1 and mat2 are both deduplicated
    nr1 <- nrow(mat1)
    nr2 <- nrow(mat2)
    mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
}

im2 <- function(mat1, mat2)
{
    stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
    toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
    mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
}
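
To make the idea in im2 concrete, here is a small sketch on made-up data (the matrices a and b are hypothetical, not from the original post). Each row is collapsed into one string key, and match() then looks up the keys of one matrix in the other:

```r
# Two small two-column integer matrices (illustrative data only).
a <- cbind(c(1L, 2L, 3L), c(10L, 20L, 30L))
b <- cbind(c(3L, 1L, 4L), c(30L, 99L, 40L))

# Collapse each row into a single string key. The separator "\1" cannot
# occur in the printed form of a number, so distinct rows get distinct keys.
keyA <- paste(a[, 1], a[, 2], sep = "\1")
keyB <- paste(b[, 1], b[, 2], sep = "\1")

# For each row of b, its position in a, or 0 if it is absent.
pos <- match(keyB, keyA, nomatch = 0)

# Indexing with 0 selects nothing, so only the shared rows remain.
common <- a[pos, , drop = FALSE]
print(common)
```

Only the row (3, 30) occurs in both matrices, so common has a single row. Note that the result follows the row order of the second matrix.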

> m1 <- cbind(1:1e7, rep(1:10, len=1e7))
> m2 <- cbind(1:1e7, rep(1:20, len=1e7))
> system.time(r1 <- intersectMat(m1,m2))
   user  system elapsed 
 430.37    1.96  433.98 
> system.time(r2 <- im2(m1,m2))
   user  system elapsed 
  27.89    0.20   28.13 
> identical(r1, r2)
[1] TRUE
> dim(r1)
[1] 5000000       2
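
If building the character keys turns out to be the bottleneck, one possible variant is to encode each row as a single number instead of a string. The function below (im3 is a hypothetical name) is only a sketch: it assumes non-negative integer entries and keys small enough to be represented exactly as doubles, and I have not timed it against im2.

```r
# Hypothetical variant of im2: one numeric key per row instead of a string.
# Assumes non-negative integer entries and keys below 2^53, so that the
# doubles used as keys are exact.
im3 <- function(mat1, mat2)
{
    stopifnot(ncol(mat1) == 2, ncol(mat2) == 2)
    # base is strictly larger than any value in column 2, so that
    # col1 * base + col2 is unique for each (col1, col2) pair.
    base <- as.numeric(max(mat1[, 2], mat2[, 2])) + 1
    stopifnot(min(mat1, mat2) >= 0,
              (max(mat1[, 1], mat2[, 1]) + 1) * base < 2^53)
    toNum <- function(m) as.numeric(m[, 1]) * base + m[, 2]
    mat1[match(toNum(mat2), toNum(mat1), nomatch = 0), , drop = FALSE]
}

# Small check on made-up data: only rows (1,1) and (2,2) are shared.
s1 <- cbind(1:6, rep(1:2, len = 6))
s2 <- cbind(1:6, rep(1:3, len = 6))
r3 <- im3(s1, s2)
print(r3)
```

The explicit range checks are there because the encoding silently gives wrong answers if the keys overflow or if negative entries make two different rows collide.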

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of c char
> Sent: Monday, July 29, 2013 4:04 PM
> To: r-help at r-project.org
> Subject: [R] Intersecting two matrices
> 
> Dear all,
> 
> I am looking for a faster R package that handles the intersection of
> two integer matrices with ncol=2. Currently I am using my
> homemade code, adapted from a previous thread:
> 
> 
> intersectMat <- function(mat1, mat2){
>   #mat1 and mat2 are both deduplicated
>   nr1 <- nrow(mat1)
>   nr2 <- nrow(mat2)
>   mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
> }
> 
> 
> which handles:
> size A= 10578373
> size B= 9519807
> expected intersecting time= 251.2272
> intersecting for crossing MPRs took 409.602 seconds.
> 
> It scales a little worse than linearly, but the basic operation itself is slow.
> I wonder whether a very fast C/C++ extension exists for this task. Your ideas are
> appreciated.
> 
> Thanks!