[R] duplicated function

Tue Nov 18 18:39:18 CET 2014

>>>>> Duncan Murdoch <murdoch.duncan at gmail.com>
>>>>>     on Tue, 18 Nov 2014 10:40:16 -0500 writes:

    > On 18/11/2014 10:23 AM, Dennis Fisher wrote:
    >> R 3.1.1
    >> OS X
    >> 
    >> Colleagues
    >> 
    >> When I use the duplicated function, I often need to find both the duplicates and the original element that was duplicated.  This can be accomplished with:
    >> duplicated(OBJECT) | duplicated(OBJECT, fromLast=TRUE)
    >> 
    >> From my perspective, an improvement in the duplicated function would be an option that accomplishes this with a single call to the function.  This could either be:
    >> 1.  a new option: all=TRUE (pick whatever name makes sense)
    >> 2.  allowing fromLast to take a new value (e.g., NA, in the spirit of the xpd option in par())
    >> 
    >> If my suggestion would yield unintended consequences, it can certainly be ignored.

    > The duplicated() function is pretty fast, so what's wrong with your 
    > original version?  If you find it to be too much typing, wouldn't it be 
    > simplest to write your own function, e.g.

    > nonunique <- function(x) duplicated(x) | duplicated(x, fromLast=TRUE)

    > ?

    > Something I've wanted more than once is a variation on duplicated that 
    > returns the index of the duplicated element, so for example

    > dupindex(c(7,7,7,2,3,2))

    > would return

    > 0 1 1 0 0 4

    > or possibly

    > 1 1 1 4 5 4

    > Duncan Murdoch
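
Both ideas in the quoted message can be sketched in base R. This is only a minimal illustration: dupindex() is the name used in the quoted message, dupindex_all() is a hypothetical name for the second variant, and neither is an existing function.

```r
# nonunique() as suggested in the quoted message:
# TRUE for every element whose value occurs more than once
nonunique <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

# Hypothetical second variant: every element gets the index of the
# first occurrence of its value
dupindex_all <- function(x) match(x, x)

# Hypothetical first variant: only the later copies get that index,
# first occurrences and unique values get 0
dupindex <- function(x) {
    first <- match(x, x)
    first[!duplicated(x)] <- 0L
    first
}

x <- c(7, 7, 7, 2, 3, 2)
nonunique(x)     # TRUE TRUE TRUE TRUE FALSE TRUE
dupindex(x)      # 0 1 1 0 0 4
dupindex_all(x)  # 1 1 1 4 5 4
```

match(x, x) does most of the work here, because match() always returns the position of the first match.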

In our CRAN package 'sfsmisc' (http://cran.r-project.org/web/packages/sfsmisc),
we have had a function Duplicated() for a while now.  Its "feature" is that
it labels every occurrence of a repeated value, including the first:

>      x <- c(9:12, 1:4, 3:6, 0:7)
>      data.frame(x, dup = duplicated(x),
+                    dupL= duplicated(x, fromLast=TRUE),
+                    Dup = Duplicated(x),
+                    DupL= Duplicated(x, fromLast=TRUE))

    x   dup  dupL Dup DupL
1   9 FALSE FALSE  NA   NA
2  10 FALSE FALSE  NA   NA
3  11 FALSE FALSE  NA   NA
4  12 FALSE FALSE  NA   NA
5   1 FALSE  TRUE   3    1
6   2 FALSE  TRUE   4    2
7   3 FALSE  TRUE   1    3
8   4 FALSE  TRUE   2    4
9   3  TRUE  TRUE   1    3
10  4  TRUE  TRUE   2    4
11  5 FALSE  TRUE   7    7
12  6 FALSE  TRUE   8    8
13  0 FALSE FALSE  NA   NA
14  1  TRUE FALSE   3    1
15  2  TRUE FALSE   4    2
16  3  TRUE FALSE   1    3
17  4  TRUE FALSE   2    4
18  5  TRUE FALSE   7    7
19  6  TRUE FALSE   8    8
20  7 FALSE FALSE  NA   NA
> 
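
For readers without 'sfsmisc' installed, the Dup and DupL columns above can be reproduced with a small base-R stand-in. This is a sketch under an assumption: dup_class() is a hypothetical name, and matching against the vector of duplicated entries is simply one approach that yields the numbers shown, not necessarily how the package implements Duplicated().

```r
# Hypothetical base-R stand-in; not necessarily the sfsmisc implementation.
# Each element is matched against the vector of duplicated entries, so the
# class label is a position within that vector.
dup_class <- function(v, fromLast = FALSE, nomatch = NA_integer_)
    match(v, v[duplicated(v, fromLast = fromLast)], nomatch = nomatch)

x <- c(9:12, 1:4, 3:6, 0:7)
data.frame(x,
           Dup  = dup_class(x),
           DupL = dup_class(x, fromLast = TRUE))
```

Because the labels are positions within the vector of duplicated entries, they need not be consecutive, which is why 5 and 6 are labelled 7 and 8 in the table.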

---- help page --------------------------------------------------------

Duplicated               package:sfsmisc               R Documentation

Counting-Generalization of duplicated()

Description:

     Duplicated() generalizes the ‘duplicated’ method for vectors, by
     returning indices of “equivalence classes” for duplicated entries
     and returning ‘nomatch’ (‘NA’ by default) for unique entries.

     Note that ‘duplicated()’ is ‘FALSE’ for the first occurrence of a
     repeated value, whereas ‘Duplicated()’ labels every occurrence,
     including the first, and marks only unique entries with ‘nomatch’
     (‘NA’).

Usage:

     Duplicated(v, incomparables = FALSE, fromLast = FALSE,
                nomatch = NA_integer_)

Arguments:

       v: a vector, often character, factor, or numeric.

incomparables: a vector of values that cannot be compared, passed to
          both ‘duplicated()’ and ‘match()’.  ‘FALSE’ is a special
          value, meaning that all values can be compared, and may be
          the only value accepted for methods other than the default.
          It will be coerced internally to the same type as ‘v’.

fromLast: logical indicating if duplication should be considered from
          the reverse side, i.e., the last (or rightmost) of identical
          elements would correspond to ‘duplicated=FALSE’.

 nomatch: passed to ‘match()’: the value to be returned in the case
          when no match is found.  Note that it is coerced to
          ‘integer’.

Value:

     an integer vector of the same length as ‘v’.  Can be used as a
     ‘factor’, e.g., in ‘split’, ‘tapply’, etc.
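
The grouping use mentioned under Value might look like the following. It is a sketch only: dup_class() is a hypothetical base-R stand-in so that the example runs without 'sfsmisc', and the result of Duplicated() could presumably be used in its place.

```r
# Hypothetical base-R stand-in for Duplicated(), used only so that this
# example is self-contained.
dup_class <- function(v) match(v, v[duplicated(v)])

x   <- c(9:12, 1:4, 3:6, 0:7)
cls <- dup_class(x)

# split() and tapply() drop the NA class, so only repeated values remain
pos <- split(seq_along(x), cls)   # positions of each repeated value
cnt <- tapply(x, cls, length)     # number of occurrences per repeated value

pos[["1"]]   # 7 9 16, the three positions where x == 3
```

Unique entries carry NA and therefore vanish from the grouped result, which is usually what one wants when inspecting duplicates.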

Author(s):

     Christoph Buser and Martin Maechler, Seminar fuer Statistik, ETH
     Zurich, Sep.2007