# [R] counting the occurrences of vectors

Marc Schwartz MSchwartz at MedAnalytics.com
Sat Jul 3 17:50:14 CEST 2004

```
On Sat, 2004-07-03 at 09:31, Ravi Varadhan wrote:
> Hi:
>
> I have two matrices, A and B, where A is n x k, and B is m x k, where
> n >> m >> k.  Is there a computationally fast way to count the number
> of times each row (a k-vector) of B occurs in A?  Thanks for any
> suggestions.
>
> Best,
> Ravi.

row.match <- function(m1, m2)
{
  if (ncol(m1) != ncol(m2))
    stop("Matrices must have the same number of columns")

  m1.l <- apply(m1, 1, list)
  m2.l <- apply(m2, 1, list)

  # return boolean for m1.l in m2.l
  m1.l %in% m2.l
}

Example of use:

m <- matrix(1:20, ncol = 4, byrow = TRUE)
n <- matrix(1:40, ncol = 4, byrow = TRUE)

> m
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

> n
      [,1] [,2] [,3] [,4]
 [1,]    1    2    3    4
 [2,]    5    6    7    8
 [3,]    9   10   11   12
 [4,]   13   14   15   16
 [5,]   17   18   19   20
 [6,]   21   22   23   24
 [7,]   25   26   27   28
 [8,]   29   30   31   32
 [9,]   33   34   35   36
[10,]   37   38   39   40

> row.match(n, m)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

If you want to know which rows from n are matches:

> n[row.match(n, m), ]
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

and if you just want the indices from n:

> which(row.match(n, m))
[1] 1 2 3 4 5

To check the timing on some larger matrices:

> m <- matrix(1:20000, ncol = 4, byrow = TRUE)
> nrow(m)
[1] 5000

> n <- matrix(1:40000, ncol = 4, byrow = TRUE)
> nrow(n)
[1] 10000

> system.time(row.match(n, m))
[1] 0.39 0.01 0.41 0.00 0.00

> length(row.match(n, m))
[1] 10000

Does that get you what you want?

HTH,

Marc Schwartz

```
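Note that row.match() reports only whether a row is present, not how often. If actual counts are needed (the number of times each row of B occurs in A, as in the original question), something along the following lines might work. This is a minimal sketch, not part of the post above: the name row.count and the "\r" key separator are assumptions made for illustration, and collapsing rows to strings is only one of several possible approaches. It assumes that the separator never appears in the data and that equal values paste to identical strings.

```r
# Hypothetical helper (not from the original post): count how many times
# each row of B occurs in A by collapsing every row to a single string key.
row.count <- function(A, B)
{
  if (ncol(A) != ncol(B))
    stop("Matrices must have the same number of columns")

  # One key per row; "\r" is an assumed separator not present in the data
  key.A <- do.call(paste, c(as.data.frame(A), sep = "\r"))
  key.B <- do.call(paste, c(as.data.frame(B), sep = "\r"))

  # Work on the distinct rows of B so that duplicated rows in B
  # each receive the full count
  u.keys <- unique(key.B)

  # Tabulate how often each distinct B key appears among the rows of A;
  # rows of A with no match in B give NA and are ignored by tabulate()
  counts <- tabulate(match(key.A, u.keys), nbins = length(u.keys))

  # Map the counts back to the original row order of B
  counts[match(key.B, u.keys)]
}

A <- rbind(c(1, 2), c(3, 4), c(1, 2), c(5, 6))
B <- rbind(c(1, 2), c(7, 8), c(5, 6))

row.count(A, B)
# [1] 2 0 1
```

The result has one entry per row of B, in B's row order, so which(row.count(A, B) > 0) gives the rows of B that occur in A at all.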