[R] Comparing matrices in R - matrixB %in% matrixA

John Fox jfox at mcmaster.ca
Fri Oct 31 15:40:10 CET 2014


Dear Jeff,

Out of curiosity, I compared your solution with the one I posted earlier this morning. (I was working on a slower computer then, which accounts for the somewhat different timings for my solution.)

------------ snip ----------

> A <- matrix(1:10000, 10000, 10)
> B <- A[1:1000, ]
> 
> system.time({
+    AA <- as.list(as.data.frame(t(A)))
+    BB <- as.list(as.data.frame(t(B)))
+    print(sum(AA %in% BB))
+  })
[1] 1000
   user  system elapsed 
   0.14    0.01    0.16 
> 
> 
> system.time({
+     lresult <- rep( NA, nrow(A) )
+     for ( ia in seq.int( nrow( A ) ) ) {
+         lres <- FALSE
+         ib <- 0
+         while ( ib < nrow( B ) && !lres ) {
+             ib <- ib + 1
+             lres <- all( A[ ia, ] == B[ ib, ] )
+         }
+         lresult[ ia ] <- lres
+     }
+     print(sum( lresult ))
+ })
[1] 1000
   user  system elapsed 
  45.76    0.01   45.77 
> 46/0.16
[1] 287.5

------------ snip ----------

So the solution using nested loops is more than two orders of magnitude slower (roughly 290 times) for this problem. Of course, for a one-off problem, depending on its size, the difference may not matter.
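
For what it is worth, here is a minimal sketch of how the list-based comparison might be wrapped in a small function and checked against the toy example from the original question. The name rows_in() is hypothetical, not something from this thread or from any package. As far as I understand ?match, list elements are compared through their character representation, so the sketch assumes that A and B have the same storage mode (both integer, or both double).

```r
# Hypothetical helper (the name is an assumption, not from the thread):
# returns a logical vector flagging the rows of A that also occur as rows of B.
rows_in <- function(A, B) {
  stopifnot(is.matrix(A), is.matrix(B), ncol(A) == ncol(B))
  # Transposing turns each row into a column; as.data.frame() then makes
  # each of those columns one list element, so %in% compares whole rows.
  AA <- as.list(as.data.frame(t(A)))
  BB <- as.list(as.data.frame(t(B)))
  AA %in% BB
}

# Toy example from the original question
A <- matrix(1:10, nrow = 5)
B <- A[-c(1, 2, 3), , drop = FALSE]  # drop = FALSE keeps B a matrix

which(rows_in(A, B))  # row numbers of A that appear in B: 4 5
```

The drop = FALSE is only a precaution: without it, a B that ends up with a single row would collapse to a plain vector and t(B) would no longer have the intended shape.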

Best,
 John

-----------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/



> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jeff Newmiller
> Sent: Friday, October 31, 2014 10:15 AM
> To: Charles Novaes de Santana; r-help at r-project.org
> Subject: Re: [R] Comparing matrices in R - matrixB %in% matrixA
> 
> Thank you for the reproducible example, but posting in HTML can corrupt
> your example code so please learn to set your email client mail format
> appropriately when posting to this list.
> 
> I think this [1] post, found with a quick Google search for "R match
> matrix", fits your situation perfectly.
> 
> match(data.frame(t(B)), data.frame(t(A)))
> 
> Note that concatenating vectors in loops is bad news... a basic
> optimization for your code would be to preallocate a logical result
> vector and fill in each element with a TRUE/FALSE in the outer loop,
> and use the which() function on that completed vector to identify the
> index numbers (if you really need that). For example:
> 
> lresult <- rep( NA, nrow(A) )
> for ( ia in seq.int( nrow( A ) ) ) {
>   lres <- FALSE
>   ib <- 0
>   while ( ib < nrow( B ) && !lres ) {
>     ib <- ib + 1
>     lres <- all( A[ ia, ] == B[ ib, ] )
>   }
>   lresult[ ia ] <- lres
> }
> result <- which( lresult )
> 
> [1] http://stackoverflow.com/questions/12697122/in-r-match-function-for-rows-or-columns-of-matrix
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
> 
> On October 31, 2014 6:20:38 AM PDT, Charles Novaes de Santana
> <charles.santana at gmail.com> wrote:
> >My apologies: I sent the message before finishing it. I am very sorry
> >about this. Please find my complete message below (I usually write my
> >messages from the end to the beginning... sorry :)).
> >
> >Dear all,
> >
> >I am trying to compare two matrices, in order to find which rows of
> >matrix A contain the same values as the rows of matrix B. I am trying to
> >do this for matrices with around 2500 elements, but please find a toy
> >example below:
> >
> >A = matrix(1:10,nrow=5)
> >B = A[-c(1,2,3),];
> >
> >So
> >> A
> >     [,1] [,2]
> >[1,]    1    6
> >[2,]    2    7
> >[3,]    3    8
> >[4,]    4    9
> >[5,]    5   10
> >
> >and
> >> B
> >     [,1] [,2]
> >[1,]    4    9
> >[2,]    5   10
> >
> >I would like to compare A and B in order to find which rows of A match
> >the rows of B, something similar to %in% for one-dimensional arrays.
> >In the example above, the answer should be 4 and 5.
> >
> >I wrote a function to do it (see below). It gives the correct answer
> >for this toy example, but the excess of for-loops makes it extremely
> >slow for larger matrices. I was wondering if there is a better way to do
> >this kind of comparison. Any ideas? Sorry if it is a stupid question.
> >
> >matbinmata<-function(B,A){
> >    res<-c();
> >    rowsB = length(B[,1]);
> >    rowsA = length(A[,1]);
> >    colsB = length(B[1,]);
> >    colsA = length(A[1,]);
> >    for (i in 1:rowsB){
> >        for (j in 1:colsB){
> >            for (k in 1:rowsA){
> >                for (l in 1:colsA){
> >                    if(A[k,l]==B[i,j]){res<-c(res,k);}
> >                }
> >            }
> >        }
> >    }
> >    return(unique(sort(res)));
> >}
> >
> >
> >Best,
> >
> >Charles
> >
> >--
> >Um axé! :)
> >
> >--
> >Charles Novaes de Santana, PhD
> >http://www.imedea.uib-csic.es/~charles
> >
> >	[[alternative HTML version deleted]]
> >
> >______________________________________________
> >R-help at r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
> 


