[Rd] duplicated() variation that goes both ways to capture all duplicates

Duncan Murdoch murdoch.duncan at gmail.com
Mon Jul 23 15:08:22 CEST 2012


On 23/07/2012 8:49 AM, Liviu Andronic wrote:
> Dear all
> The trouble with the current duplicated() function in R is that it can
> report duplicates while searching fromFirst _or_ fromLast, but not
> both ways. Often users will want to identify and extract all the
> copies of the item that has duplicates, not only the duplicates
> themselves.
>
> To take the example from the man page:
> > data(iris)
> > iris[duplicated(iris), ]  ##duplicates while searching "fromFirst"
>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 143          5.8         2.7          5.1         1.9 virginica
> > iris[duplicated(iris, fromLast=TRUE), ]  ##duplicates while searching "fromLast"
>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 102          5.8         2.7          5.1         1.9 virginica
>
>
> To extract all the copies of the concerned items ("original" and
> duplicates) one would need to do something like this:
> > iris[duplicated(iris) | duplicated(iris, fromLast=TRUE), ]  ##duplicates while searching "bothWays"
>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 102          5.8         2.7          5.1         1.9 virginica
> 143          5.8         2.7          5.1         1.9 virginica
>
>
> Unfortunately this is unnecessarily long and convoluted. Short of a
> 'bothWays' argument in duplicated(), I came up with a small wrapper
> that simplifies the above:
> duplicated2 <-
>      function(x, bothWays=TRUE, ...)
>      {
>          ## bothWays=FALSE: fall back to plain duplicated()
>          if(!bothWays)
>              return(duplicated(x, ...))
>          ## otherwise flag every copy, scanning from both ends
>          duplicated(x, ...) | duplicated(x, fromLast=TRUE, ...)
>      }
>
>
> Now the above can be achieved simply via:
> > iris[duplicated2(iris), ]  ##duplicates while searching "bothWays"
>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 102          5.8         2.7          5.1         1.9 virginica
> 143          5.8         2.7          5.1         1.9 virginica
>
>
> So here's my inquiry: Would the R Core consider adding such
> functionality in 'base' R? Either the---suitably cleaned
> up---duplicated2() function above, or a "bothWays" argument in
> duplicated() itself? Either of the two would improve user convenience
> and reduce confusion. (In my case it took some time before I
> understood the correct approach to this problem.)

I can't speak for all of R core, but I don't see the need for this in 
base R -- your solution looks fine to me.

Duncan Murdoch
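
For anyone reading this thread in the archive, the idiom discussed above can be sketched in a few lines. This is only an illustration, not something proposed for base R: the helper name all_duplicated() is hypothetical, and the small character vector is made-up example data. It relies only on duplicated() and its documented fromLast argument.

```r
## Hypothetical helper (not part of base R): flag every element or row
## that occurs more than once, including the first occurrence.
all_duplicated <- function(x, ...) {
    duplicated(x, ...) | duplicated(x, fromLast = TRUE, ...)
}

## Made-up example vector.
x <- c("a", "b", "a", "c", "b", "a")

duplicated(x)       # marks only the later copies of "a" and "b"
all_duplicated(x)   # marks every "a" and every "b"; "c" stays FALSE

## Same idea on a data frame, as in the iris example from the thread.
iris[all_duplicated(iris), ]
```

Note that passing fromLast through ... to such a wrapper would clash with the explicit fromLast = TRUE in the second call, so the sketch assumes callers leave that argument alone.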


