[R] which rows are duplicates?

Aaron M. Swoboda aaron.swoboda at gmail.com
Mon Mar 30 06:07:24 CEST 2009


I would like to know which rows are duplicates of each other, not
simply that a row is a duplicate of another row. In the following
example, rows 1 and 3 are duplicates.

 > x <- c(1,3,1)
 > y <- c(2,4,2)
 > z <- c(3,4,3)
 > data <- data.frame(x,y,z)
 > data
  x y z
1 1 2 3
2 3 4 4
3 1 2 3

I can't figure out how to get R to tell me that observations 1 and 3
are the same.  It seems like the "duplicated" and "unique" functions
should be able to help me out, but I am stumped.

For instance, if I use "duplicated" ...

 > duplicated(data)
[1] FALSE FALSE  TRUE

it tells me that row 3 is a duplicate, but not which row it matches.  
How do I figure out WHICH row it matches?
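
To make the limitation concrete, here is a minimal sketch that rebuilds the example data and combines "duplicated" with its fromLast argument. It flags every row that belongs to a group of identical rows, but it still does not say which row matches which. The name in_dup_group is only illustrative.

```r
x <- c(1, 3, 1)
y <- c(2, 4, 2)
z <- c(3, 4, 3)
data <- data.frame(x, y, z)

# duplicated() marks only the later occurrences of a row. Scanning from
# both ends flags every row that has at least one identical twin.
in_dup_group <- duplicated(data) | duplicated(data, fromLast = TRUE)

# Row numbers that take part in some duplicate group (here rows 1 and 3)
which(in_dup_group)
```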

And if I use "unique"...

 > unique(data)
  x y z
1 1 2 3
2 3 4 4

I see that rows 1 and 2 are unique, leaving me to infer that row 3 was  
a duplicate, but again it doesn't tell me which row it was a duplicate  
of (as far as I can tell). Am I missing something?

How can I determine that row 3 is a duplicate OF ROW 1?
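
One possible approach, offered as a sketch rather than a definitive answer: paste each row into a single string key and use "match" to find the first row carrying the same key. The names key, first_match and dup_rows are hypothetical, and the "\r" separator is an assumption that this character never appears in the data. Because the comparison goes through character conversion, numeric values that differ only beyond the printed precision could be treated as equal.

```r
data <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# Build one string key per row. "\r" is an arbitrary separator that is
# assumed not to occur in the data.
key <- do.call(paste, c(data, sep = "\r"))

# For each row, the index of the first row with an identical key.
first_match <- match(key, key)

# Rows whose first match is an earlier row are duplicates of that row.
dup_rows <- which(first_match != seq_along(first_match))
data.frame(row = dup_rows, duplicate_of = first_match[dup_rows])
```

For the example data, first_match is 1 2 1, so row 3 is reported as a duplicate of row 1.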

Thanks,

Aaron



