[Rd] suggestion for extending ?as.factor

Petr Savicky savicky at cs.cas.cz
Tue May 5 20:43:33 CEST 2009


On Tue, May 05, 2009 at 11:27:36AM +0200, Peter Dalgaard wrote:
> I know. The point was rather that if you are not careful with rounding,
> you get some of the bars wrong (you get 2 or 3 small bars very close
> to each other instead of one longer one). Computed p values from
> permutation tests (as in mean(sim>=obs)) also need care for the same reason.

OK. Now I understand the point of the example. I think that it is
the responsibility of the user to find the right way to eliminate the
influence of rounding errors, since this may require a different
approach in different situations. However, I can also accept the
point of view that as.factor() should do this to some extent by default.

For example, we may require that as.factor() be consistent with
as.character() in how it maps different numbers to the same
string.
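
To make the requirement concrete, here is a minimal sketch of a check
for it. The function name is hypothetical (it is not part of R), and the
sketch assumes x is a numeric vector without NA values: two elements
should share a factor level exactly when as.character() gives them the
same string.

```r
# Hypothetical helper, not part of base R.
# Returns TRUE if as.factor(x) groups the elements of x in exactly the
# same way as as.character(x) does. Assumes x contains no NA values.
consistent_with_character <- function(x) {
  s <- as.character(x)
  codes <- as.integer(as.factor(x))
  # Pairwise "same string" must coincide with pairwise "same level".
  all(outer(s, s, "==") == outer(codes, codes, "=="))
}

consistent_with_character(c(0.1, 0.2, 0.1))
```

The pairwise comparison is quadratic in length(x), so this is meant
only for small test vectors.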

At first glance, one might expect that, to implement this, it is
sufficient for as.factor(x) to perform
  x <- as.numeric(as.character(x))
  levels <- as.character(sort(unique(x)))

Unfortunately, on some platforms (tested on Intel with SSE, R-2.10.0,
2009-05-02 r48453), this may produce repeated levels.

  x <- c(0.6807853176681814000304, 0.6807853176681809559412)
  x <- as.numeric(as.character(x))
  levels <- as.character(sort(unique(x)))
  levels # "0.68078531766818" "0.68078531766818"
  levels[1] == levels[2] # TRUE

Using the default Intel arithmetic, we get a single level, namely "0.680785317668181".
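
One possible way to avoid repeated levels, given only as a sketch and
not as a proposed patch, is to apply unique() to the strings rather than
to the numbers, so that two numbers with the same string cannot produce
two levels. The function name below is hypothetical, and the third
element of x is added only for illustration.

```r
# Hypothetical helper, not part of base R.
# Builds the factor from the character representation, so equal strings
# always share one level regardless of the underlying arithmetic.
factor_via_character <- function(x) {
  s <- as.character(x)
  u <- unique(s)                  # distinct strings, no repeats
  lev <- u[order(as.numeric(u))]  # order levels by numeric value
  factor(s, levels = lev)
}

x <- c(0.6807853176681814000304, 0.6807853176681809559412, 0.25)
f <- factor_via_character(x)
levels(f)
anyDuplicated(levels(f)) # 0, since the levels come from unique(s)
```

How many levels the first two numbers produce still depends on what
as.character() returns on the given platform and R version; the sketch
only guarantees that no level is repeated.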

Petr.


