[R] Need a vectorized way to avoid two nested FOR loops

jim holtman jholtman at gmail.com
Thu Oct 8 14:24:50 CEST 2009


I answered the wrong question.  Here is the code to find all the
matches for each row:

n <- 20
set.seed(2)
# create test dataframe
x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
x
x.col <- c(1,3,5)

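# match each row against all the rows (itself included) on the x.col columns;
# lapply always returns a list, whereas apply() simplifies to a matrix
# when every row happens to have the same number of matches
x.sub <- as.matrix(x[, x.col, drop = FALSE])
x.match1 <- lapply(seq_len(nrow(x.sub)), function(i){
    which(apply(x.sub, 1, function(z){
        all(z == x.sub[i, ])
    }))
})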

# remove matches to itself
x.match2 <- lapply(seq_along(x.match1), function(z){
    x.match1[[z]][!(x.match1[[z]] %in% z)]
})
# x.match2[[i]] holds the indices of the other rows that match row i
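
The code above still compares every row with every other row, so it stays
quadratic. A possible alternative, offered only as a sketch, is to build one
key string per row from the columns of interest and group the row indices by
that key. The names key, groups and x.match3 are my own, and the "\r"
separator assumes that character never occurs in the data:

```r
n <- 20
set.seed(2)
# same test data frame as above
x <- as.data.frame(matrix(sample(1:2, n * 6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

# one key per row, pasted from the selected columns
# (assumes "\r" does not appear in any of the values)
key <- do.call(paste, c(x[, x.col, drop = FALSE], sep = "\r"))

# row indices grouped by identical key
groups <- split(seq_len(nrow(x)), key)

# for each row, the other rows that share its key
x.match3 <- lapply(seq_len(nrow(x)), function(i){
    setdiff(groups[[key[i]]], i)
})
```

The grouping itself takes one pass over the rows. The result can still be
large when many rows share the same values, since every row then lists all of
the others.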

On Wed, Oct 7, 2009 at 3:52 PM, Rama Ramakrishnan <rama at alum.mit.edu> wrote:
>
> Hi Friends,
>
> I have a data frame d. Let vars be the column indices for a subset of the
> columns in d (e.g., vars <- c(1,3,4,8))
>
> For each row r in d, I want to collect all the other rows in d that match
> the values in row r for just the columns in vars.
>
> The naive way to do this is to have a for loop stepping through each row in
> d, and within the loop have another loop going through all the rows again,
> checking for equality. This is quadratic in the number of rows and takes way
> too long. Is there a better, "vectorized" way to do this?
>
> Thanks in advance!
>
> Rama Ramakrishnan
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



