[Rd] Bugs in unique of data.frame and matrix

Stavros Macrakis macrakis at alum.mit.edu
Sat Jun 27 00:59:53 CEST 2009


R version 2.8.1 (2008-12-22) / Windows XP

There are several bugs in unique for data frames and matrices. Please
find minimal reproducible examples below.

          -s

-----A-----

unique on a vector compares the values numerically, so all 18 distinct
values are kept (they only print as 1):

> nn <- ((1+2^-52)^(5:22))
> unique(nn)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

unique on a data frame, in contrast, appears to compare the 15-digit
string representations, so the same values collapse into a single row:

> unique(data.frame(a=nn))
  a
1 1

The same happens for a matrix:

> unique(matrix(nn,ncol=1))
     [,1]
[1,]    1
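
A minimal sketch of the difference: the values are distinct doubles, but formatting them to 15 significant digits gives the same string for all of them. The sprintf key is only an illustration of string-based comparison. It is an assumption about the mechanism, not the actual code inside unique. Here nn is built directly as 1 + k * 2^-52 so that every value is exactly representable.

```r
# 18 distinct doubles just above 1 (exactly representable: 1 + k * 2^-52)
nn <- 1 + (5:22) * 2^-52

# Numeric comparison keeps all of them apart
n_numeric <- length(unique(nn))

# Hypothetical string key with 15 significant digits: every value becomes "1"
keys <- sprintf("%.15g", nn)
n_string <- length(unique(keys))

n_numeric  # 18
n_string   # 1
```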

-----B-----

The two rows below are distinct, but unique keeps only the first:

> df <- data.frame(a=c("\r",""),b=c("","\r"))
> unique(df)
   a b
1 \r

> unique(as.matrix(df))
     a    b
[1,] "\r" ""

Though "\r" is no doubt rare in strings, it is perfectly legal.
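
A possible explanation, offered as an assumption rather than a reading of the source: if each row is reduced to a single string by pasting its fields together with "\r" as the separator, the two rows above get the same key. The sketch below only demonstrates that collision with paste.

```r
# Hypothetical row keys: fields joined with "\r" as separator (assumed)
key1 <- paste("\r", "", sep = "\r")   # row 1: a = "\r", b = ""
key2 <- paste("", "\r", sep = "\r")   # row 2: a = "",   b = "\r"

# Both keys are the two-character string "\r\r", so the rows look identical
same_key <- identical(key1, key2)
same_key  # TRUE
```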

-----C-----

For vectors and data frames, unique preserves the POSIXct class:

> dd <- as.POSIXct('1999-1-1')
> unique(dd)
[1] "1999-01-01 EST"

> unique(data.frame(a=dd))
           a
1 1999-01-01

But for matrices, it converts to the underlying number:

> unique(matrix(dd))
          [,1]
[1,] 915166800
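
A hedged sketch of getting the date-times back: the matrix holds the underlying seconds since 1970-01-01, so the result can be converted again with as.POSIXct and an explicit origin. The time zone "UTC" is used only to make the example independent of the local setting.

```r
dd <- as.POSIXct("1999-01-01", tz = "UTC")

# The matrix carries plain numbers (seconds since the epoch), not POSIXct
m <- unique(matrix(dd))

# Convert back explicitly; origin and tz have to be supplied by hand
restored <- as.POSIXct(as.vector(m), origin = "1970-01-01", tz = "UTC")
```

Note that the time zone is not stored in the matrix, so it has to be known from elsewhere when converting back.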

-----workaround-----

The first two bugs can be worked around by converting the matrix to a
list of row vectors, calling unique on the list, and converting back:

    library(plyr)
    laply(unique(alply(matrix(nn,ncol=1),1,identity)),identity,.drop=FALSE)
    laply(unique(alply(mm,1,identity)),identity,.drop=FALSE)
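
For readers without plyr, roughly the same idea can be sketched in base R. The helper name unique_rows is made up for this illustration. It relies on duplicated comparing list elements as whole vectors rather than as pasted strings.

```r
# Hypothetical helper: keep the first occurrence of each distinct row
unique_rows <- function(m) {
  # Split the matrix into a list with one vector per row
  rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])
  # duplicated() on a list compares the elements exactly
  m[!duplicated(rows), , drop = FALSE]
}

# Usage: the second row repeats the first and is dropped
m3 <- rbind(c(1, 2), c(1, 2), c(3, 4))
unique_rows(m3)
```

Indexing the original matrix, instead of rebuilding it from the list, keeps the storage mode and dimnames of the input.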


