[R] Matching a vector with a matrix row

Ravi Varadhan rvaradhan at jhmi.edu
Sun Apr 24 19:46:20 CEST 2011


The solution I gave previously used integer elements, but it works just as well for real numbers.

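rowMatch <- function(A, B) {
    # For each row of B, return the index of the first identical row of A
    # (NA if there is none); the returned indexes refer to rows of A.
    # Each row is collapsed into a single string key, e.g. "1:2:3".
    f <- function(...) paste(..., sep=":")
    # A plain vector B is treated as a single row
    if(!is.matrix(B)) B <- matrix(B, 1, length(B))
    a <- do.call("f", as.data.frame(A))
    b <- do.call("f", as.data.frame(B))
    match(b, a)
}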
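As a small illustration (a hypothetical toy example, not part of the original post), rowMatch() returns, for each row of B, the position of the first identical row of A, and NA when no row matches. Because each row is collapsed into a string joined by ":", character data that itself contains ":" can produce a false match; the last call shows one such collision.

```r
# Copy of rowMatch() from above, so this snippet runs on its own
rowMatch <- function(A, B) {
  f <- function(...) paste(..., sep = ":")
  if (!is.matrix(B)) B <- matrix(B, 1, length(B))
  a <- do.call("f", as.data.frame(A))
  b <- do.call("f", as.data.frame(B))
  match(b, a)
}

# Hypothetical data: row 3 of A duplicates row 1
A <- rbind(c(1, 2, 3),
           c(4, 5, 6),
           c(1, 2, 3))
B <- rbind(c(1, 2, 3),   # present in A (rows 1 and 3)
           c(7, 8, 9),   # not present in A
           c(4, 5, 6))   # present in A (row 2)

rowMatch(A, B)
# [1]  1 NA  2    (first match wins, NA for no match)

# Collision: both rows collapse to the key "a:b:c"
rowMatch(rbind(c("a:b", "c")), c("a", "b:c"))
# [1] 1
```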

A <- matrix(rnorm(100000), 5000, 20)
sel <- sample(1:nrow(A), size=100, replace=TRUE)
B <- A[sel,]

system.time(rows <- rowMatch(A, B))
all.equal(sel, rows)

sel <- sample(1:nrow(A), size=1)
b <- c(A[sel,])
system.time(row <- rowMatch(A, b))
all.equal(sel, row)
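One caveat for real numbers, as a sketch: paste() converts doubles with as.character(), which keeps at most 15 significant digits, so the keys compare printed values rather than the exact doubles. This is fine when the rows of B are copied from A, as in the test above, but values produced by different arithmetic may match or fail to match unexpectedly. The values below are illustrative only.

```r
# Two doubles that are not exactly equal ...
x <- 0.1 + 0.2
y <- 0.3
x == y                        # FALSE: the doubles differ in the last bits
paste(x) == paste(y)          # TRUE: both are formatted as "0.3"

# ... while a difference visible within 15 significant digits
# gives different keys, so such a row would not be matched.
paste(1 + 1e-12) == paste(1)  # FALSE
```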

I am curious to see if there are better/faster ways to do this.

Ravi.
________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Petr Savicky [savicky at praha1.ff.cuni.cz]
Sent: Sunday, April 24, 2011 5:13 AM
To: r-help at r-project.org
Subject: Re: [R] Matching a vector with a matrix row

On Sat, Apr 23, 2011 at 08:56:33AM +0800, Luis Felipe Parra wrote:
> Hello Niels, I am trying to find the rows in Matrix which contain all of the
> elements in LHS.

This sounds like you want an equivalent of

  all(LHS %in% x)

However, in your original post, you used

  all(x %in% LHS)

What is correct?
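To make the difference concrete, here is a sketch applying both conditions row by row with apply(); M and LHS are hypothetical stand-ins for the poster's Matrix and LHS.

```r
# Hypothetical character matrix and target set
M <- rbind(c("a", "b", "c"),
           c("a", "b", "b"),
           c("b", "a", "d"))
LHS <- c("a", "b")

# Rows that contain every element of LHS (extra elements allowed)
which(apply(M, 1, function(x) all(LHS %in% x)))
# [1] 1 2 3

# Rows whose every element is in LHS (no other values allowed)
which(apply(M, 1, function(x) all(x %in% LHS)))
# [1] 2
```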

If the equality of x and LHS should be tested, then try

   setequal(x, LHS)

If the rows may contain repeated elements and the number of
repetitions should also match, then try

  identical(sort(x), sort(LHS))

with a precomputed sort(LHS) for efficiency.
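Similarly, a sketch contrasting setequal() with identical(sort(x), sort(LHS)) on a hypothetical matrix whose rows repeat some values; sortedLHS is an illustrative name for the precomputed sort(LHS).

```r
M <- rbind(c("a", "b", "b"),
           c("b", "a", "a"),
           c("b", "a", "c"))
LHS <- c("a", "b", "b")
sortedLHS <- sort(LHS)   # computed once, not once per row

# Same distinct values, repetitions ignored
which(apply(M, 1, function(x) setequal(x, LHS)))
# [1] 1 2

# Same values with the same number of repetitions
which(apply(M, 1, function(x) identical(sort(x), sortedLHS)))
# [1] 1
```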

If the number of distinct character values in the whole
matrix is not too large, the comparison may be made more
efficient by converting the matrix to a matrix of integer
codes instead of the original character values. See ?factor
for the meaning of "integer codes". After this conversion,
the comparison works on integers instead of character
strings, which is faster.
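A rough sketch of the integer-code idea, assuming a character matrix M (hypothetical data): factor() with one shared set of levels maps each distinct value to an integer, and LHS is encoded with the same levels before comparison. Including LHS in the levels avoids NA codes for values that never occur in M. No timings are claimed here; whether it pays off depends on the data.

```r
M <- rbind(c("b", "a", "b"),
           c("c", "a", "a"),
           c("a", "b", "b"))
LHS <- c("a", "b", "b")

# Shared levels so that M and LHS use the same integer codes
lev <- sort(unique(c(M, LHS)))
codes <- matrix(as.integer(factor(M, levels = lev)), nrow = nrow(M))

# Encode and sort LHS once, then compare integer vectors row by row
lhsCodes <- sort(as.integer(factor(LHS, levels = lev)))
which(apply(codes, 1, function(x) identical(sort(x), lhsCodes)))
# [1] 1 3
```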

Hope this helps.

Petr Savicky.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


