[R] which rows are duplicates?

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Mar 31 14:36:33 CEST 2009


Dimitris Rizopoulos wrote:
> Wacek Kusnierczyk wrote:
>> Wacek Kusnierczyk wrote:
>>> Michael Dewey wrote:
>>>  
>>>> At 05:07 30/03/2009, Aaron M. Swoboda wrote:
>>>>    
>>>>> I would like to know which rows are duplicates of each other, not
>>>>> simply that a row is a duplicate of another row. In the following
>>>>> example rows 1 and 3 are duplicates.
>>>>>
>>>>>      
>>>>>> x <- c(1,3,1)
>>>>>> y <- c(2,4,2)
>>>>>> z <- c(3,4,3)
>>>>>> data <- data.frame(x,y,z)
>>>>>>         
>>>>>     x y z
>>>>> 1 1 2 3
>>>>> 2 3 4 4
>>>>> 3 1 2 3
>>>>>       
>>> i don't have any solution significantly better than what you have
>>> already been given.  
>>
>> i now seem to have one:
>>
>>     # dummy data
>>     data = data.frame(x=sample(1:2, 5, replace=TRUE),
>>                       y=sample(1:2, 5, replace=TRUE))
>>     # add a class column; identical rows have the same class id
>>     data$class = local({
>>         rows = do.call('paste', c(data, sep='\r'))
>>         with(
>>             rle(sort(rows)),
>>             rep(1:length(values), lengths)[rank(rows)] ) })
>>
>>     data
>>     #   x y class
>>     # 1 2 2     3
>>     # 2 2 1     2
>>     # 3 2 1     2
>>     # 4 1 2     1
>>     # 5 2 2     3
>>
>
> another approach (maybe a bit cleaner) seems to be:
>
> data <- data.frame(x = sample(1:2, 5, replace = TRUE),
>                    y = sample(1:2, 5, replace = TRUE))
>
> vals <- do.call('paste', c(data, sep = '\r'))
> data$class <- match(vals, unique(vals))
> data
>
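Applied to Aaron's original example, the class ids from the `match()`/`unique()` idea quoted above can be turned into what he actually asked for: the sets of row numbers that are duplicates of each other. This is a minimal sketch; the helper name `duplicate_groups` is hypothetical, and it assumes, as the thread does, that pasting the columns with a `'\r'` separator yields one distinct string per distinct row (i.e. no cell value itself contains `'\r'`).

```r
# hypothetical helper: row numbers of each set of identical rows,
# built on the match()/unique() class ids from the thread
duplicate_groups = function(data) {
    # one string per row; assumes no cell contains '\r' itself
    vals = do.call('paste', c(data, sep = '\r'))
    # class id = index of the row's first occurrence among the unique rows
    id = match(vals, unique(vals))
    # row numbers split by class id
    groups = split(seq_len(nrow(data)), id)
    # keep only classes with more than one member, i.e. real duplicates
    groups[vapply(groups, length, integer(1)) > 1]
}

# Aaron's example data
data = data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))
duplicate_groups(data)
```

For Aaron's data this should return a single group holding rows 1 and 3; row 2, which has no twin, is dropped.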


wow, cool!  this seems unbeatable ;)
i guess it can't be slower than any of the others, since unique() and match() work by hashing and need no sort.

vQ
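To put the speed remark to a quick test, the two class-labelling ideas from this thread can be wrapped as functions and timed on a larger random data frame. This is a rough sketch: the function names, the data size and the use of `system.time()` are assumptions for illustration, and timings will vary by machine and R version. The two approaches number the classes differently (sorted order versus order of first appearance), so the check relabels both before comparing. One change to the `rle()` version is made here and is not from the thread: `rank()` is called with `ties.method = 'min'`, so tied rows get an integer rank instead of relying on a fractional rank being truncated when used as an index.

```r
# class ids via sort + rle + rank (Wacek's idea; ties.method = 'min' added here)
class_rle = function(data) {
    rows = do.call('paste', c(data, sep = '\r'))
    with(rle(sort(rows)),
         rep(seq_along(values), lengths)[rank(rows, ties.method = 'min')])
}

# class ids via unique + match (Dimitris's idea)
class_match = function(data) {
    vals = do.call('paste', c(data, sep = '\r'))
    match(vals, unique(vals))
}

# relabel class ids in order of first appearance so labelings can be compared
relabel = function(id) match(id, unique(id))

# hypothetical benchmark data: 1e5 rows, 3 columns of single-digit integers
set.seed(1)
n = 1e5
big = data.frame(x = sample(1:5, n, replace = TRUE),
                 y = sample(1:5, n, replace = TRUE),
                 z = sample(1:5, n, replace = TRUE))

t_rle   = system.time(a <- class_rle(big))
t_match = system.time(b <- class_match(big))

# both approaches must induce the same grouping of rows
stopifnot(identical(relabel(a), relabel(b)))
print(c(rle = t_rle[['elapsed']], match = t_match[['elapsed']]))
```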




More information about the R-help mailing list