as.numeric(<factor>) [Difference R/S]

Kurt Hornik Kurt.Hornik@ci.tuwien.ac.at
Wed, 21 Jan 1998 09:59:27 +0100


>>>>> Peter Dalgaard BSA writes:

> Martin Maechler <maechler@stat.math.ethz.ch> writes:
>> 
>> From  R-core;  this should interest most R-devel'ers (to some extent):
>> 
>> Since 0.60,  the semantics of   as.numeric(<factor>)  has changed,
>> e.g.
>> 
R> as.integer(factor(c("A","BB")))
>> [1] NA NA
R> as.integer(factor(c(100,40,100)))
>> [1] 100  40 100
>> 
>> whereas older R and S:
>> 
S> as.integer(factor(c("A","BB")))
>> [1] 1 2
S> as.integer(factor(c(100,40,100)))
>> [1] 2 1 2
>> 
> ...
>> 
>> Hmm,  I first had advocated your view above, myself.
>> 
>> Later, I started to discover how much S code uses
>> as.numeric(ff) 
>> just to extract the factor codes (in {1:M})  from a factor.
>> 
>> This led me (and Peter Dalgaard, I think) to the conclusion that
>> - yes, the present R behavior may be ``cleaner'' than S's
>> - no, it is a pain to keep it, because it breaks S code too often.
>> 
>> However, as you see, we haven't agreed yet on the topic.
>> I think we should agree ASAP, since it involves code in several places
>> (outside R base).

> Actually, I'm even stronger in favour of the S semantics. In addition
> to the above

> 	- you can always get current behaviour with
>         as.numeric(as.character(f)) or as.numeric(levels(f))[f]

> 	- one should avoid generating NA's unless absolutely necessary

Right :-)

> 	- when a factor is used for subscripting, you mean the codes,
>         not the levels. Currently, we have

>> (1:5)[factor(1:5,labels=5:1)]
> [1] 1 2 3 4 5

> but

>> as.numeric(factor(1:5,labels=5:1))
> [1] 5 4 3 2 1

> 	I.e. *sometimes* when a factor is coerced to numeric you get
> 	something different. (And if you change the index semantics,
> 	code for trend tests and the like is likely to break!).

But that is really a matter of how subscripting treats factors, and not
necessarily what coercion does.
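
To make the two views concrete, here is a minimal sketch of Peter's example next to the two idioms he mentions for recovering the level values. It assumes a present-day R, in which as.integer() on a factor returns the codes (the S semantics); under the 0.60 behaviour discussed in this thread that line would give the level values instead. The variable names are only illustrative.

```r
# Peter's example: codes run 1..5, labels run 5..1.
f <- factor(1:5, labels = 5:1)

# Subscripting with a factor uses the internal codes.
idx <- (1:5)[f]                        # 1 2 3 4 5

# Two ways to get the numeric values of the levels, whatever
# as.numeric(f) itself happens to return:
lev1 <- as.numeric(as.character(f))    # 5 4 3 2 1
lev2 <- as.numeric(levels(f))[f]       # 5 4 3 2 1

# Version-dependent: present-day R returns the codes here.
cod <- as.integer(f)
```

The second idiom converts each level once and then indexes by the factor, so it does less work when a factor has few levels and many observations.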

As much as I am in favor of compatibility (remember I do a lot of
porting):

* Suppose f is a factor with numeric levels other than 1 to n.  Then
as.numeric(f) returning the codes rather than the levels is strange.

* You also cannot coerce a character vector with non-numeric entries to
numeric without getting NA's.
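
A small sketch of the second point, assuming nothing beyond base R: coercing strings that do not look like numbers gives NA, and the same happens when a factor is coerced via its labels. suppressWarnings() is used only to keep the "NAs introduced by coercion" warning out of the way.

```r
chr <- c("A", "BB")

# Character to numeric: non-numeric strings become NA.
from_chr <- suppressWarnings(as.numeric(chr))                      # NA NA

# Going through the labels of a factor behaves the same way.
f <- factor(chr)
from_labels <- suppressWarnings(as.numeric(as.character(f)))       # NA NA

# Strings that do look like numbers convert cleanly.
num_chr <- as.numeric(c("100", "40", "100"))                       # 100 40 100
```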

Btw:

	x <- factor(c(10, 5, 6, 7))

Then levels(x) gives the CHARACTER vector c("5", "6", "7", "10") [in
both R and S+]. Why is that?

And:

R> codes(x)
[1] 4 1 2 3

S> codes(x)
[1] 1 2 3 4

???
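
For reference, a sketch of the same example that can be run in a present-day R. Two assumptions: codes() is, as far as I know, no longer available there, so as.integer() stands in for it; and the result shown is that of current R, which matches the R output above, not the S one.

```r
x <- factor(c(10, 5, 6, 7))

# Levels are stored as character, but in numeric sort order.
lv <- levels(x)        # "5" "6" "7" "10"

# Code of each observation = position of its value among the levels.
cd <- as.integer(x)    # 4 1 2 3

# Round trip back to the original numbers.
back <- as.numeric(lv)[cd]   # 10 5 6 7
```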