[Rd] duplicates() function

Duncan Murdoch murdoch.duncan at gmail.com
Mon Apr 11 20:05:11 CEST 2011


On 08/04/2011 11:39 AM, Joshua Ulrich wrote:
> On Fri, Apr 8, 2011 at 10:15 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com>  wrote:
> >  On 08/04/2011 11:08 AM, Joshua Ulrich wrote:
> >>
> >>  How about:
> >>
> >>  y<- rep(NA,length(x))
> >>  y[duplicated(x)]<- match(x[duplicated(x)] ,x)
> >
> >  That's a nice solution for vectors.  Unfortunately for me, I have a matrix
> >  (which duplicated() handles by checking whole rows).  So a better example
> >  that I should have posted would be
> >
> >  x<-  cbind(1, c(9,7,9,3,7) )
> >
> >  and I'd still like the same output
> >
> For a matrix, could you apply the same strategy used in duplicated()?
>
> y<- rep(NA,NROW(x))
> temp<- apply(x, 1, function(x) paste(x, collapse="\r"))
> y[duplicated(temp)]<- match(temp[duplicated(temp)], temp)

Since this thread hasn't ended, I will say that I think this solution is 
the best I've seen for my specific problem.  I was actually surprised 
that duplicated() did the string concatenation trick, but since it does, 
it makes a lot of sense to do the same in duplicates().
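
For reference, the two suggestions quoted above can be folded into one
function. This is only a minimal sketch: the name duplicates() is the
hypothetical one used in this thread, not an existing base R function, and
it assumes x is either an atomic vector or a matrix whose rows can be
compared after being pasted into strings (so values that differ only beyond
the precision kept by paste(), or strings that contain "\r", could be
confused).

```r
# Hypothetical duplicates(): for each element (or matrix row) of x, return
# the index of its first occurrence, or NA if it is not a duplicate.
duplicates <- function(x) {
  # Collapse each matrix row into one string; use vectors unchanged.
  key <- if (is.matrix(x)) {
    apply(x, 1, function(row) paste(row, collapse = "\r"))
  } else {
    x
  }
  y <- rep(NA_integer_, length(key))
  dup <- duplicated(key)
  # match() returns the position of the first occurrence of each key.
  y[dup] <- match(key[dup], key)
  y
}

duplicates(c(9, 7, 9, 3, 7))            # NA NA  1 NA  2
duplicates(cbind(1, c(9, 7, 9, 3, 7)))  # NA NA  1 NA  2
```

Computing duplicated(key) once and reusing it avoids the repeated calls in
the one-liners, but the approach is otherwise the same.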

I think a good general-purpose solution that works wherever 
duplicated() works would likely be harder, because we don't really have 
the right primitives to build it on.

Duncan Murdoch
> >>    duplicated(x)
> >
> >  [1] FALSE FALSE  TRUE FALSE TRUE
> >
> >>    duplicates(x)
> >
> >  [1] NA NA  1 NA  2
> >
> >
> >  Duncan Murdoch
> >
> >>  --
> >>  Joshua Ulrich  |  FOSS Trading: www.fosstrading.com
> >>
> >>
> >>
> >>  On Fri, Apr 8, 2011 at 9:59 AM, Duncan Murdoch<murdoch.duncan at gmail.com>
> >>    wrote:
> >>  >    I need a function which is similar to duplicated(), but instead of
> >>  >  returning
> >>  >    TRUE/FALSE, returns indices of which element was duplicated.  That is,
> >>  >
> >>  >>    x<- c(9,7,9,3,7)
> >>  >>    duplicated(x)
> >>  >    [1] FALSE FALSE  TRUE FALSE TRUE
> >>  >
> >>  >>    duplicates(x)
> >>  >    [1] NA NA  1 NA  2
> >>  >
> >>  >    (so that I know that element 3 is a duplicate of element 1, and element
> >>  >  5 is
> >>  >    a duplicate of element 2, whereas the others were not duplicated
> >>  >  according
> >>  >    to our definition.)
> >>  >
> >>  >    Is there a simple way to write this function?  I have  an ugly
> >>  >    implementation in R that loops over all the values; it would make more
> >>  >  sense
> >>  >    to redo it in C, if there isn't a simple implementation I missed.
> >>  >
> >>  >    Duncan Murdoch
> >>  >
> >>  >    ______________________________________________
> >>  >    R-devel at r-project.org mailing list
> >>  >    https://stat.ethz.ch/mailman/listinfo/r-devel
> >>  >
> >
> >


