[Rd] unique.matrix issue [Was: Anomaly with unique and match]

Petr Savicky savicky at cs.cas.cz
Thu Mar 10 09:29:10 CET 2011


On Wed, Mar 09, 2011 at 02:11:49PM -0500, Simon Urbanek wrote:
> match() is a red herring here -- it is really a very specific thing that has to do with the fact that you're running unique() on a matrix. Also it's much easier to reproduce:
> 
> > x=c(1,1+0.2e-15)
> > x
> [1] 1 1
> > sprintf("%a",x)
> [1] "0x1p+0"               "0x1.0000000000001p+0"
> > unique(x)
> [1] 1 1
> > sprintf("%a",unique(x))
> [1] "0x1p+0"               "0x1.0000000000001p+0"
> > unique(matrix(x,2))
>      [,1]
> [1,]    1
>  
> and this comes from the fact that unique.matrix uses string representation since it has to take into account all values of a row/column so it pastes all values into one string, but for the two numbers that is the same:
> > as.character(x)
> [1] "1" "1"

I understand the use of match() in Terry Therneau's original message as
an example of a situation where the behavior of unique.matrix() becomes
visible without inspecting the last bits of the numbers.
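A minimal sketch of that effect, using Simon's x and assuming, as his message describes, that each row is collapsed into a string key with paste(). It is not Terry Therneau's original example, which is not quoted here, and the names m, keys and kept are illustrative only. The final match() call shows how the lost value becomes visible without printing any bits.

```r
x <- c(1, 1 + 0.2e-15)
m <- matrix(x, 2)    # 2 x 1 matrix, one value per row

# collapse each row into a single string key via paste()
keys <- apply(m, 1, function(row) paste(row, collapse = "\r"))
print(keys)          # both keys are "1"

# keep the first row for each distinct key, as a paste()-based unique would
kept <- m[!duplicated(keys), , drop = FALSE]

# match() compares exact double values, so x[2] is not found among
# the kept rows even though it prints as 1
print(match(x, kept[, 1]))   # 1 NA
```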

Consider the following example.

  x <- 1 + c(1.1, 1.3, 1.7, 1.9)*1e-14
  a <- cbind(rep(x, each=2), 2)
  rownames(a) <- 1:nrow(a)

The correct set of rows may be obtained using

  unique(a - 1)

            [,1] [,2]
  1 1.110223e-14    1
  3 1.310063e-14    1
  5 1.709743e-14    1
  7 1.909584e-14    1
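To see why the shift helps, compare the strings that as.character() produces for the first column before and after subtracting 1. R documents as.character() as representing doubles to 15 significant digits; near 1 that cannot separate values differing by about 1e-14, while near 0 it can. This is a sketch on the vector x from the example above.

```r
x <- 1 + c(1.1, 1.3, 1.7, 1.9)*1e-14

print(length(unique(x)))                    # 4 distinct doubles
print(as.character(x))                      # only two distinct strings near 1
print(length(unique(as.character(x))))      # 2
print(length(unique(as.character(x - 1))))  # 4 distinct strings near 0
```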

However, because unique.matrix() collapses each row into one string with
paste(), which formats the numbers with as.character(), we also get

  unique(a)

    [,1] [,2]
  1    1    2
  5    1    2
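The two surviving rows can be reproduced by building the paste() keys explicitly. This sketch mimics the row-key approach described above rather than calling the internals of unique.matrix(); the name keys is illustrative.

```r
x <- 1 + c(1.1, 1.3, 1.7, 1.9)*1e-14
a <- cbind(rep(x, each=2), 2)
rownames(a) <- 1:nrow(a)

# one string per row; as.character() keeps only 15 significant digits
keys <- apply(a, 1, function(row) paste(row, collapse = "\r"))

# rows whose key has not been seen before
print(rownames(a)[!duplicated(keys)])   # "1" "5"
```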

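I suggest transforming the numeric columns with rank() before calling
paste(). Within each column, equal values get equal ranks and distinct
values get distinct ranks (with ties.method="max"), and integer ranks are
converted to strings exactly, so paste() no longer merges different rows.
For example: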

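  unique.mat <- function(a)
  {
      # rank each column: equal values share a rank, distinct values get
      # distinct integer ranks, which paste() represents exactly;
      # matrix() keeps the result a matrix when 'a' has a single row
      temp <- matrix(apply(a, 2, rank, ties.method="max"), nrow=nrow(a))
      # collapse each row of ranks into one string key
      temp <- apply(temp, 1, function(x) paste(x, collapse = "\r"))
      a[!duplicated(temp), , drop=FALSE]
  }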
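The property of rank() that the function relies on can be checked on a small vector: values that differ only in the last bit receive different integer ranks, while exactly equal values share one. The vector v is a made-up illustration, not part of the proposal.

```r
v <- c(1, 1 + 2^-52, 1)            # 1 + 2^-52 is the next double after 1
r <- rank(v, ties.method = "max")  # ties share the largest rank of the group
print(as.character(v))             # "1" "1" "1": the difference is lost
print(r)                           # 2 3 2: the difference is kept
```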

  unique.mat(a)

    [,1] [,2]
  1    1    2
  3    1    2
  5    1    2
  7    1    2
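As a cross-check, unique.mat() can be compared with a brute-force deduplication that tests rows for exact equality with identical(). The helper exact_unique_rows() is hypothetical and quadratic in the number of rows, so it is only meant for small test matrices; a version of unique.mat() is repeated so that the block runs on its own.

```r
unique.mat <- function(a)
{
    temp <- matrix(apply(a, 2, rank, ties.method="max"), nrow=nrow(a))
    temp <- apply(temp, 1, function(x) paste(x, collapse = "\r"))
    a[!duplicated(temp), , drop=FALSE]
}

# hypothetical helper: keep row i unless it is identical() to an earlier kept row
exact_unique_rows <- function(a)
{
    keep <- logical(nrow(a))
    for (i in seq_len(nrow(a))) {
        earlier <- which(keep)
        same <- vapply(earlier,
                       function(j) identical(unname(a[i, ]), unname(a[j, ])),
                       logical(1))
        keep[i] <- !any(same)
    }
    a[keep, , drop = FALSE]
}

x <- 1 + c(1.1, 1.3, 1.7, 1.9)*1e-14
a <- cbind(rep(x, each=2), 2)
rownames(a) <- 1:nrow(a)

print(rownames(unique.mat(a)))                         # "1" "3" "5" "7"
print(identical(unique.mat(a), exact_unique_rows(a)))  # TRUE
```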

Petr Savicky.
