as.numeric(<factor>) [Difference R/S]

Martin Maechler <maechler@stat.math.ethz.ch>
Tue, 20 Jan 1998 09:37:53 +0100


From R-core; this should interest most R-devel'ers (to some extent):

Since 0.60, the semantics of as.numeric(<factor>) have changed,
e.g.

R> as.integer(factor(c("A","BB")))
[1] NA NA
R> as.integer(factor(c(100,40,100)))
[1] 100  40 100

whereas older R and S give:

S> as.integer(factor(c("A","BB")))
[1] 1 2
S> as.integer(factor(c(100,40,100)))
[1] 2 1 2
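
The two readings can also be spelled out explicitly, so that the result
does not depend on what as.integer() or as.numeric() does with a factor.
The following is only a minimal sketch, written against R; the variable
names are illustrative, and match() on the levels is just one possible
way to recover the codes.

```r
f <- factor(c(100, 40, 100))

# Reading 1: convert the level labels themselves to numbers.
values <- as.numeric(as.character(f))

# Reading 2: position of each element in the sorted level set, i.e. the codes.
codes <- match(as.character(f), levels(f))

print(levels(f))  # "40" "100"
print(values)     # 100 40 100
print(codes)      # 2 1 2
```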


-------------------------------------
as explained by Ross, below :

>>>>> "KH" == Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at> writes:

>>>>> Ross Ihaka writes:
    KH>> From hornik@ci.tuwien.ac.at Mon Jan 19 22:52 NZD 1998 Subject:
    KH>> Difference R/S
    KH>> 
    KH>> Andreas just pointed me to the following:
    KH>> 
    KH>> v <- as.factor(c("Age","Number","Age"))
    KH>> as.numeric(v)
    KH>> 
    KH>> gives
    KH>> 
    KH>> [1] 1 2 1
    KH>> 
    KH>> in S+ and
    KH>> 
    KH>> [1] NA NA NA
    KH>> 
    KH>> in R.
    KH>> 
    KH>> Bug/feature/intentional?
    KH>> 
    KH>> Of course, R makes more sense because as.numeric("Age") gives NA in
    KH>> both R and S+ ...
    KH>> 
    KH>> Or, should we have as.numeric() return the codes on a non-numeric
    KH>> factor?

    Ross> At present R (implicitly) computes as.numeric(x) for x a factor as

    Ross> 	as.numeric(as.character(x))

    Ross> and S computes

    Ross> 	codes(x)

    Ross> I mistakenly thought that S does what I have implemented for R.
    Ross> Thomas first objected to the difference and then said he quite liked
    Ross> it.

    Ross> I quite like the present semantics, but it is easy to change if
    Ross> others have different preferences.

    KH> I personally think that the current R approach makes more sense,
    KH> too.  If we all agree on it, I would like to add the difference to
    KH> the FAQ, so that it is (well) documented.

Hmm, at first I had advocated your view above myself.

Later, I started to discover how much S code uses
	as.numeric(ff)
just to extract the factor codes (in {1:M}) from a factor.

This led me (and Peter Dalgaard, I think) to the conclusion that
- yes, the present R behavior may be ``cleaner'' than S's
- no, it is a pain to keep it, because it breaks S code too often.

However, as you see, we haven't agreed yet on the topic.
I think we should agree ASAP, since it involves code in several places
(outside R base).
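
Until this is settled, code that has to run under either semantics could
avoid as.numeric(<factor>) altogether and say what it means. A rough
sketch follows, under the assumption that two small wrappers are
acceptable; factor_codes() and factor_values() are hypothetical names
made up for this illustration, not functions in R or S.

```r
# Hypothetical helpers (names invented for illustration only).

# Codes in 1:M, independent of how as.numeric() treats a factor.
factor_codes <- function(f) {
  stopifnot(is.factor(f))
  match(as.character(f), levels(f))
}

# Numeric value of each level label; NA where a label is not a number.
factor_values <- function(f) {
  stopifnot(is.factor(f))
  # Suppress the coercion warning for non-numeric labels.
  suppressWarnings(as.numeric(as.character(f)))
}

v <- as.factor(c("Age", "Number", "Age"))
print(factor_codes(v))   # 1 2 1
print(factor_values(v))  # NA NA NA
```

With helpers like these, the choice between the two behaviors is made at
each call site rather than by the semantics of as.numeric().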

Martin