[R] Need a vectorized way to avoid two nested FOR loops

jim holtman jholtman at gmail.com
Thu Oct 8 14:04:34 CEST 2009


Here is one way of doing it:

> n <- 20
> set.seed(2)
> # create test dataframe
> x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
> x
   V1 V2 V3 V4 V5 V6
1   1  2  2  2  1  1
2   2  1  1  2  2  1
3   2  2  1  2  1  2
4   1  1  1  1  1  2
5   2  1  2  2  1  1
6   2  1  2  1  2  2
7   1  1  2  1  2  2
8   2  1  1  1  1  1
9   1  2  2  1  2  1
10  2  1  2  1  1  1
11  2  1  1  1  2  1
12  1  1  1  1  1  2
13  2  2  2  1  1  1
14  1  2  2  1  2  2
15  1  2  1  1  1  2
16  2  2  2  2  1  2
17  2  2  2  1  1  2
18  1  1  2  2  1  1
19  1  2  2  1  1  2
20  1  1  2  2  1  2
> x.col <- c(1,3,5)
> # compare the first selected column against each of the others, row by row
> x.match <- x[, x.col[1]] == x[, x.col[-1]]
> # print the rows in which all selected columns agree
> x[apply(x.match, 1, all),]
   V1 V2 V3 V4 V5 V6
4   1  1  1  1  1  2
6   2  1  2  1  2  2
12  1  1  1  1  1  2
15  1  2  1  1  1  2
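
The transcript above picks out rows whose selected columns agree with one another. For the question as quoted below (for each row, collect the other rows that match it on the chosen columns), one possible sketch is to build a single key per row and group the row indices by that key, which avoids the double loop. The names d, vars, key, groups and matches are illustrative, the test data is random, and the "|" separator assumes that no value in the selected columns contains that character.

```r
# Illustrative test data (random; exact values depend on the R version)
set.seed(2)
d <- as.data.frame(matrix(sample(1:2, 20 * 6, TRUE), nrow = 20))
vars <- c(1, 3, 5)

# One string key per row, built from the selected columns only.
# Assumption: "|" never occurs inside a value in these columns.
key <- do.call(paste, c(d[, vars, drop = FALSE], sep = "|"))

# Row indices grouped by key, computed in a single pass
groups <- split(seq_len(nrow(d)), key)

# For each row i, the other rows that share its key
matches <- lapply(seq_len(nrow(d)), function(i) setdiff(groups[[key[i]]], i))
```

Here matches[[i]] holds the indices of the rows that equal row i on the columns in vars, and d[matches[[i]], ] returns those rows. If only the groups themselves are needed, groups already lists each set of mutually matching rows once.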


On Wed, Oct 7, 2009 at 3:52 PM, Rama Ramakrishnan <rama at alum.mit.edu> wrote:
>
> Hi Friends,
>
> I have a data frame d. Let vars be the column indices for a subset of the
> columns in d (e.g., vars <- c(1,3,4,8))
>
> For each row r in d, I want to collect all the other rows in d that match
> the values in row r for just the columns in vars.
>
> The naive way to do this is to have a for loop stepping through each row in
> d, and within the loop have another loop going through all the rows again,
> checking for equality. This is quadratic in the number of rows and takes way
> too long. Is there a better, "vectorized" way to do this?
>
> Thanks in advance!
>
> Rama Ramakrishnan



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



