as.numeric(<factor>) [Difference R/S]

Peter Dalgaard BSA p.dalgaard@biostat.ku.dk
21 Jan 1998 11:39:59 +0100


Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at> writes:

> But that is really a matter of how subscripting treats factors, and not
> necessarily what coercion does.

Well, the point was that one could say that there's an implicit
coercion involved in indexing. (Note BTW that in general
A[as.numeric(as.character(f))] != 
A[as.character(f)] !=
A[codes(f)] == A[f])
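
To make the inequalities above concrete, here is a minimal sketch. The vector A and the factor f are made-up illustrations, not anything from the thread, and since codes() is no longer available in current R, as.integer() stands in for it here.

```r
# Hypothetical data: the names of A are deliberately different from
# its positions, so the three kinds of indexing can be told apart.
A <- c("4" = "a", "3" = "b", "9" = "c", "8" = "d")
f <- factor(c(3, 4, 3))   # levels "3", "4"; internal codes 1, 2, 1

by_position <- A[as.numeric(as.character(f))]  # positions 3, 4, 3
by_name     <- A[as.character(f)]              # names "3", "4", "3"
by_code     <- A[as.integer(f)]                # codes 1, 2, 1 (was codes(f))
by_factor   <- A[f]                            # a factor index uses its codes

print(unname(by_position))
print(unname(by_name))
print(unname(by_code))
print(unname(by_factor))
```

Only the last two agree, which is the sense in which indexing by a factor already coerces it to its codes.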

> As much as I am in favor of compatibility (remember I do a lot of
> porting):
> 
> * Suppose f is a factor with numeric levels other than 1 to n.  Then
> as.numeric(f) returning the codes rather than the levels is strange.

No, it's not. It may come as a surprise to some, but it might as well
be thought strange that the internal numeric codes are not available
via as.numeric. The levels are character strings that may or may not
happen to be convertible to numbers, so why should you expect that the
general procedure assumes that they are convertible?
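
A small sketch of the distinction, assuming current R behaviour: as.numeric() on a factor hands back the internal codes, and the level values need an explicit detour through the character levels. The factor g is a made-up example of levels that are not convertible at all.

```r
x <- factor(c(10, 5, 6, 7))

codes_x  <- as.numeric(x)                # internal codes, not the values
values_a <- as.numeric(as.character(x))  # convert each element's level
values_b <- as.numeric(levels(x))[x]     # convert the levels once, then index

# Hypothetical factor whose levels are not numbers: the same detour
# can only produce NA (with a warning, suppressed here).
g   <- factor(c("low", "high", "low"))
bad <- suppressWarnings(as.numeric(as.character(g)))

print(codes_x)
print(values_a)
print(bad)
```

The codes are always available, whatever the levels look like; the level-to-number conversion works only when the levels happen to be numeric strings.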

> * You also cannot coerce a character vector to numeric without getting
> NA's.

This is true, but in that case, there's no obvious alternative.

> Btw:
> 
> 	x <- factor(c(10, 5, 6, 7))
> 
> Then levels(x) gives the CHARACTER vector c("5", "6", "7", "10") [in
> both R and S+], why that?

By definition, a factor is an integer vector of codes coupled with a
character vector of levels. Where's the problem? We could of course
introduce the possibility of having levels vectors of any type (and
take the pains arising from differences between factor(1:3) and
factor(1:3,labels=as.character(1:3)) ). 
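
The definition can be inspected directly. This is only a sketch against current R, where the two constructions mentioned above come out identical precisely because levels are always stored as character strings.

```r
x <- factor(c(10, 5, 6, 7))

storage   <- typeof(x)     # the codes are stored as an integer vector
lev       <- levels(x)     # the levels are a character vector
bare      <- unclass(x)    # codes with the "levels" attribute attached

# With character-only levels these two factors cannot differ.
same <- identical(factor(1:3), factor(1:3, labels = as.character(1:3)))

print(storage)
print(lev)
print(same)
```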

> 
> And:
> 
> R> codes(x)
> [1] 4 1 2 3
> 
> S> codes(x)
> [1] 1 2 3 4
> 
> ???

Apparently R is defaulting to levels=as.character(sort(unique(x))),
whereas S is doing levels=sort(as.character(unique(x))), so that in S
"10" sorts alphabetically before "5", "6", "7"...
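
The two defaults can be compared side by side in R. This is a sketch of the presumed behaviour, not a look at either implementation; the names v, lev_numeric and lev_alpha are made up for the illustration, and as.integer() replaces codes(), which current R no longer has.

```r
v <- c(10, 5, 6, 7)

# Sort as numbers, then convert to strings (the presumed R default).
lev_numeric <- as.character(sort(unique(v)))

# Convert to strings, then sort as strings (the presumed S default).
lev_alpha <- sort(as.character(unique(v)))

codes_numeric <- as.integer(factor(v, levels = lev_numeric))
codes_alpha   <- as.integer(factor(v, levels = lev_alpha))

print(lev_numeric)
print(lev_alpha)
print(codes_numeric)
print(codes_alpha)
```

The two code vectors reproduce the R and S outputs quoted above.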

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._