[R] Intersecting two matrices

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Tue Jul 30 19:36:09 CEST 2013


In that case, you should be looking at a relational inner join, perhaps with SQLite (see package sqldf).
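
As a rough illustration of what an inner join means for this problem, here is a minimal sketch. It uses base R's merge() rather than sqldf so that it runs without extra packages. The toy matrices and the column names "a" and "b" are assumptions made for the example, not taken from the original data.

```r
# Hypothetical small inputs; the column names "a" and "b" are assumptions.
m1 <- cbind(a = c(1L, 2L, 3L, 4L), b = c(1L, 2L, 3L, 4L))
m2 <- cbind(a = c(2L, 4L, 5L), b = c(2L, 9L, 5L))

# merge() performs an inner join by default: only rows whose key columns
# match in both inputs are kept. Here both columns form the key.
common <- merge(as.data.frame(m1), as.data.frame(m2), by = c("a", "b"))
common <- as.matrix(common)
print(common)

# With sqldf, the same join could be written in SQL along these lines
# (assuming data frames named df1 and df2 with columns a and b):
#   SELECT df1.* FROM df1 INNER JOIN df2 ON df1.a = df2.a AND df1.b = df2.b
```

For data with billions of rows, the likely benefit of a database-backed join is that the tables need not be held in R's memory all at once. Whether that is fast enough would have to be measured on the real data.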
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

c char <charlie.hsia.us at gmail.com> wrote:
>Thanks a lot.
>Still looking for a super fast and memory-efficient solution, as the
>matrix I have in the real world has billions of rows.
>
>
>On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap <wdunlap at tibco.com>
>wrote:
>
>> I haven't looked at the size-time relationship, but im2 (below) is faster
>> than your function on at least one example:
>>
>> intersectMat <- function(mat1, mat2)
>> {
>>     #mat1 and mat2 are both deduplicated
>>     nr1 <- nrow(mat1)
>>     nr2 <- nrow(mat2)
>>     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
>> }
>>
>> im2 <- function(mat1, mat2)
>> {
>>     stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
>>     toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
>>     mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
>> }
>>
>> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
>> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
>> > system.time(r1 <- intersectMat(m1,m2))
>>    user  system elapsed
>>  430.37    1.96  433.98
>> > system.time(r2 <- im2(m1,m2))
>>    user  system elapsed
>>   27.89    0.20   28.13
>> > identical(r1, r2)
>> [1] TRUE
>> > dim(r1)
>> [1] 5000000       2
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of c char
>> > Sent: Monday, July 29, 2013 4:04 PM
>> > To: r-help at r-project.org
>> > Subject: [R] Intersecting two matrices
>> >
>> > Dear all,
>> >
>> > I would like to know whether there is a faster R package that handles the
>> > intersection of two integer matrices with ncol=2. Currently I am using my
>> > homemade code adapted from a previous thread:
>> >
>> >
>> > intersectMat <- function(mat1, mat2){
>> >   # mat1 and mat2 are both deduplicated
>> >   nr1 <- nrow(mat1)
>> >   nr2 <- nrow(mat2)
>> >   mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
>> > }
>> >
>> >
>> > which handles:
>> > size A= 10578373
>> > size B= 9519807
>> > expected intersecting time= 251.2272
>> > intersecting for crossing MPRs took 409.602 seconds.
>> >
>> > It scales a little worse than linearly, but the time for a single
>> > operation is not good. I wonder whether a super fast C/C++ extension
>> > exists for this task. Your ideas are appreciated.
>> >
>> > Thanks!
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
