[Rd] suggestion for extending ?as.factor

Petr Savicky savicky at cs.cas.cz
Thu May 7 09:20:26 CEST 2009


On Wed, May 06, 2009 at 10:41:58AM +0200, Martin Maechler wrote:
>      PD> I think that the real issue is that we actually do want almost-equal
>      PD> numbers to be folded together. 
> 
> yes, this now (revision 48469) will happen by default, using  signif(x, 15) 
> where '15' is the default for the new optional argument 'digitsLabels'

On some platforms, the function factor() in the current R 2.10.0
(2009-05-06 r48478) may still produce duplicated levels. Which inputs
trigger this is in general platform dependent. The following example
produces duplicated (in fact triplicated) levels both with the default
Intel arithmetic and on Intel with SSE.

  x <- 9.7738826945424 + c(-1, 0, 1) * 1e-14
  x <- signif(x, 15)
  factor(x)
  # [1] 9.7738826945424 9.7738826945424 9.7738826945424
  # Levels: 9.7738826945424 9.7738826945424 9.7738826945424
  # Warning message:
  # In `levels<-`(`*tmp*`, value = c("9.7738826945424", "9.7738826945424",  :
  #   duplicated levels will not be allowed in factors anymore

The reason is that the three numbers remain different after signif(x, 15),
but as.character(x) maps them to the same string.

  length(unique(x)) # [1] 3
  length(unique(as.character(x))) # [1] 1
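
The mismatch can also be tested for directly. The following is only a
sketch and not what factor() does internally: has_label_collision() is a
hypothetical helper name, and building the factor from the strings is just
one possible workaround, assuming that values with the same label should
share one level.

```r
# Hypothetical helper (not part of base R): TRUE if some distinct numeric
# values in x are mapped to the same string by as.character().
has_label_collision <- function(x) {
  ux <- unique(x)
  anyDuplicated(as.character(ux)) > 0L
}

x <- signif(9.7738826945424 + c(-1, 0, 1) * 1e-14, 15)
has_label_collision(x)  # result depends on platform and R version

# Workaround sketch: derive the levels from the strings themselves,
# so equal labels always end up in a single level.
labs <- as.character(x)
f <- factor(labs, levels = unique(labs))
anyDuplicated(levels(f))  # 0, the levels are unique by construction
```

Whether has_label_collision(x) returns TRUE for this x depends on how the
platform and R version convert doubles to strings, so no result is given
for that call.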

Further examples may be found using

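  x <- as.character(9 + runif(5000))
  x <- as.numeric(x[nchar(x) == 15]) # select numbers with 14 significant digits
  x <- signif(cbind(x - 1e-14, x, x + 1e-14), 15)
  y <- array(as.character(x), dim = dim(x))
  # keep rows whose outer values get the same string; drop = FALSE keeps a matrix
  x <- x[which(y[, 1] == y[, 3]), , drop = FALSE]
  factor(x[1, ]) # first such row, if any was found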

Petr.


