[R] counting the occurrences of vectors

Marc Schwartz MSchwartz at MedAnalytics.com
Sat Jul 3 17:50:14 CEST 2004


On Sat, 2004-07-03 at 09:31, Ravi Varadhan wrote:
> Hi:
>  
> I have two matrices, A and B, where A is n x k, and B is m x k, where
> n >> m >> k.  Is there a computationally fast way to count the number
> of times each row (a k-vector) of B occurs in A?  Thanks for any
> suggestions.
>  
> Best,
> Ravi. 

How about something like this, which flags the rows of one matrix that
also occur in another:

row.match <- function(m1, m2)
{
  if (ncol(m1) != ncol(m2))
    stop("Matrices must have the same number of columns")

  # turn each row into a list element so that rows are compared whole
  m1.l <- apply(m1, 1, list)
  m2.l <- apply(m2, 1, list)

  # logical vector: TRUE for each row of m1 that also occurs in m2
  m1.l %in% m2.l
}


Example of use:

m <- matrix(1:20, ncol = 4, byrow = TRUE)
n <- matrix(1:40, ncol = 4, byrow = TRUE)

> m
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

> n
      [,1] [,2] [,3] [,4]
 [1,]    1    2    3    4
 [2,]    5    6    7    8
 [3,]    9   10   11   12
 [4,]   13   14   15   16
 [5,]   17   18   19   20
 [6,]   21   22   23   24
 [7,]   25   26   27   28
 [8,]   29   30   31   32
 [9,]   33   34   35   36
[10,]   37   38   39   40

> row.match(n, m)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

If you want to know which rows from n are matches:

> n[row.match(n, m), ]
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

and if you just want the indices from n:

> which(row.match(n, m))
[1] 1 2 3 4 5
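
row.match() only reports whether a row is present. If the actual number of
occurrences of each row of B in A is needed, one possible sketch is below.
The name row.count() and the paste()-based row keys are assumptions for
illustration, not part of the function above. Pasting rows into strings
should be safe for integer or character data, but for floating-point data
it relies on paste()'s default formatting.

```r
# Hypothetical helper: for each row of B, count how many rows of A equal it.
row.count <- function(A, B)
{
  if (ncol(A) != ncol(B))
    stop("Matrices must have the same number of columns")

  # collapse each row into one character key ("\r" is an assumed separator
  # that does not occur in the data)
  key.A <- apply(A, 1, paste, collapse = "\r")
  key.B <- apply(B, 1, paste, collapse = "\r")

  # for every row of A, the index of the first identical row of B (or NA)
  idx <- match(key.A, key.B)

  # tally the matches per row of B
  counts <- tabulate(idx[!is.na(idx)], nbins = nrow(B))

  # duplicated rows of B share the count stored at their first occurrence
  counts[match(key.B, key.B)]
}

A <- rbind(c(1, 2), c(3, 4), c(1, 2), c(5, 6))
B <- rbind(c(1, 2), c(5, 6), c(7, 8))
row.count(A, B)
# [1] 2 1 0
```

The keys of A are built once and looked up with a single match() call, so
the work should grow roughly with nrow(A) + nrow(B) rather than with their
product.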



For timing, if I create some large matrices:

> m <- matrix(1:20000, ncol = 4, byrow = TRUE)
> nrow(m)
[1] 5000

> n <- matrix(1:40000, ncol = 4, byrow = TRUE)
> nrow(n)
[1] 10000

> system.time(row.match(n, m))
[1] 0.39 0.01 0.41 0.00 0.00

> length(row.match(n, m))
[1] 10000


Does that get you what you want?

HTH,

Marc Schwartz



