[Rd] duplicates() function

Fri Apr 8 17:13:26 CEST 2011

On Fri, Apr 8, 2011 at 9:59 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> I need a function which is similar to duplicated(), but instead of returning
> TRUE/FALSE, returns indices of which element was duplicated.  That is,
>
>> x <- c(9,7,9,3,7)
>> duplicated(x)
> [1] FALSE FALSE  TRUE FALSE  TRUE
>
>> duplicates(x)
> [1] NA NA  1 NA  2
>
> (so that I know that element 3 is a duplicate of element 1, and element 5 is
> a duplicate of element 2, whereas the others were not duplicated according
> to our definition.)
>
> Is there a simple way to write this function?  I have an ugly
> implementation in R that loops over all the values; it would make more sense
> to redo it in C, if there isn't a simple implementation I missed.

I'd think of it as building a lookup table from each value to the
positions where it occurs.  The basic idea is

split(seq_along(x), x)

but there are probably much faster ways of doing it, depending on what
you need.  For efficiency, though, you probably need a hash table
somewhere.
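
As a rough sketch of how the lookup-table idea could be turned into the
requested duplicates(): the names duplicates_split() and duplicates_match()
below are made up for illustration and are not existing R functions. The
first version walks the split() table directly. The second swaps in
match(x, x), which finds the first occurrence of each value using R's
internal hashing, so it may be closer to the "hash table somewhere" route.

```r
# Hypothetical helper: build the value -> positions table with split(),
# then point every later position in a group at the group's first position.
duplicates_split <- function(x) {
  groups <- split(seq_along(x), x)
  out <- rep(NA_integer_, length(x))
  for (idx in groups) {
    if (length(idx) > 1L) {
      out[idx[-1L]] <- idx[1L]
    }
  }
  out
}

# Hypothetical helper: match(x, x) returns the index of the first
# occurrence of each element; an element that is its own first
# occurrence is not a duplicate, so it becomes NA.
duplicates_match <- function(x) {
  first <- match(x, x)
  first[first == seq_along(x)] <- NA_integer_
  first
}

x <- c(9, 7, 9, 3, 7)
duplicates_split(x)
# [1] NA NA  1 NA  2
duplicates_match(x)
# [1] NA NA  1 NA  2
```

One difference to be aware of: split() drops NA values in x, so the first
version never reports a repeated NA as a duplicate, while match() treats
NA as matching NA.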

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/