# [Rd] Proposal: Generalizing unique() and duplicated()

Kaspar Pflugshaupt pflugshaupt@geobot.umnw.ethz.ch
Tue, 6 Feb 2001 11:25:39 +0100

```
Prof. Ripley wrote on r-help:

> Completely distinct row vectors?  Take a look at the code of
> merge.data.frame.  Something like
>
>     bx <- matrix(as.character(a), nrow(a))
>     bx <- drop(apply(bx, 1, function(x) paste(x, collapse = "\r")))
>     length(unique(bx))
>
> This turns each row into a single character string, and counts the unique
> ones.

Hmmm... couldn't one build on this in order to generalize the
unique() function?

I'm asking because when I once tried to use unique() on a matrix (to collapse
duplicate rows), I found that it and duplicated() work only on vectors. I
think a generalization, at least for matrices and simple data.frames, would
be useful.

I tried my hand at it and came up with this:

----------------------------------------------------

# The old version becomes the default behaviour.
"unique.default" <- get("unique", pos="package:base")

"unique" <- function(object, ...)
{
    # UseMethod() doesn't seem to work for matrices, hence the condition.
    if (data.class(object)=="matrix")
        return(unique.matrix(object, ...))
    else
        UseMethod("unique")
}

"duplicated.default" <- get("duplicated", pos="package:base")

"duplicated" <- function(object, ...)
{
    if (data.class(object)=="matrix")
        return(duplicated.matrix(object, ...))
    else
        UseMethod("duplicated")
}

"duplicated.matrix" <-
function(mat, MARGIN=1)    # defaulting to work on rows
{
    # Collapse each row (or column) into one string, then compare the strings.
    strvect <- drop(apply(mat, MARGIN, function(x) paste(x, collapse = "\r")))
    return(duplicated(strvect))
}

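"unique.matrix" <-
function(mat, MARGIN=1)    # defaulting to work on rows
{
    dup <- duplicated(mat, MARGIN)
    # drop=FALSE keeps the result a matrix even if only one row or column is left.
    return(if (MARGIN==1) mat[!dup, , drop=FALSE] else mat[, !dup, drop=FALSE])
}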

"duplicated.data.frame" <-
function(df, MARGIN=1)
{
    strvect <- drop(apply(as.matrix(df), MARGIN,
                          function(x) paste(x, collapse = "\r")))
    duplicated(strvect)
}

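"unique.data.frame" <-
function(df, MARGIN=1)
{
    dup <- duplicated(df, MARGIN)
    # drop=FALSE keeps the result a data frame even if only one row or column is left.
    return(if (MARGIN==1) df[!dup, , drop=FALSE] else df[, !dup, drop=FALSE])
}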

----------------------------------------------------

I couldn't figure out how to generalize to more than two dimensions (more
accurately, how to subset in the dimension given by the variable MARGIN).

Does anybody else consider this useful?

Cheers

Kaspar Pflugshaupt

```
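The open question in the post, how to subset along the dimension named by MARGIN when the array has more than two dimensions, could perhaps be handled by building the index list programmatically and passing it to `[` through `do.call()`. The following is only a sketch under that assumption; `subset_margin` and `unique_array` are hypothetical names that do not appear in the original proposal, and the duplicate detection reuses the same string-pasting idea as `duplicated.matrix`.

```r
# Hypothetical helper (not part of the original proposal): keep only the
# slices of array `x` along dimension `MARGIN` for which `keep` is TRUE.
subset_margin <- function(x, MARGIN, keep) {
    # One index per dimension; TRUE selects everything along that dimension.
    idx <- rep(list(TRUE), length(dim(x)))
    idx[[MARGIN]] <- keep
    # Equivalent to x[idx1, idx2, ..., drop = FALSE] for any number of dims.
    do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

# Hypothetical name: drop duplicated slices along MARGIN.
unique_array <- function(x, MARGIN = 1) {
    # Collapse each slice into a single string, as in duplicated.matrix.
    keys <- apply(x, MARGIN, function(s) paste(s, collapse = "\r"))
    subset_margin(x, MARGIN, !duplicated(keys))
}

# A 2 x 2 x 3 array whose first two slices along dimension 3 are identical.
a <- array(c(1:4, 1:4, 5:8), dim = c(2, 2, 3))
dim(unique_array(a, MARGIN = 3))
```

In this example the third dimension shrinks from 3 to 2, because the repeated slice is dropped. `drop = FALSE` is passed so the result keeps all of its dimensions even when a single slice remains.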