[R] Why do we have to turn factors into characters for various functions?

Erik Iverson eriki at ccbr.umn.edu
Sun Dec 12 19:16:40 CET 2010


On 12/11/2010 04:48 PM, Tal Galili wrote:
> Hello dear R-help mailing list,
>
> My question is *not* about how factors are implemented in R (which is, if I
> understand correctly, that factors keep numbers and assign levels to them).
> My question *is* about why so many functions that work on factors don't
> treat them as characters by default?
>
> Here are two simple examples:
> Example one turning the characters inside a factor into numeric:
>
> x<- factor(4:6)
> as.numeric(x) # output: 1 2 3
> as.numeric(as.character(x)) # output: 4 5 6  # isn't this what we wanted?

But your example of 'x' is a very special case.  Most factors will
not have numeric levels like the ones you have constructed.  Most
factors represent categorical variables such as Sex, Race, Country of
Origin, Treatment, etc., whose levels are labels rather than numbers.

Factors are stored as integer codes with a "levels" attribute (factor
is R's enumerated type), and most modeling functions treat variables
of class factor differently from numeric ones.
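
To make the storage concrete, here is a small sketch.  The variable
name 'sex' and its values are made up for illustration; only base R
functions are used.

```r
# A factor with non-numeric labels (made-up data)
sex <- factor(c("male", "female", "female", "male"))

typeof(sex)      # storage type of the underlying codes: "integer"
levels(sex)      # the labels, sorted by default: "female" "male"
as.integer(sex)  # the codes, indexing into levels(sex): 2 1 1 2
```

The codes only say which level each element has; they carry no
numeric meaning of their own.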

So, as.numeric(x) will just return the integer codes regardless of
the levels of the factor, which is fine.  It seems you may be
implicitly suggesting that *if* the levels of the factor can
themselves be coerced to numeric, then as.numeric(x) should return
those values instead of the underlying codes.
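
For reference, the two behaviors side by side, using your 'x'.  The
last line is the usual idiom for converting through the levels; it
should give the same values as going through as.character().

```r
x <- factor(4:6)

as.numeric(x)                # the underlying codes: 1 2 3
as.numeric(as.character(x))  # the labels converted to numbers: 4 5 6
as.numeric(levels(x))[x]     # same values, converting each level only once
```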

Of course, having a function do different things depending on the
particular values in its input is dangerous, which is why the
behavior is implemented as it currently is.
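
A rough sketch of the hazard, assuming a hypothetical helper I am
calling guess_numeric() (it is not part of R) that returns the levels
as numbers when they all convert and falls back to the codes
otherwise:

```r
# Hypothetical value-dependent conversion, for illustration only
guess_numeric <- function(f) {
  vals <- suppressWarnings(as.numeric(levels(f)))
  if (anyNA(vals)) {
    as.numeric(f)  # some label is not a number: return the codes
  } else {
    vals[f]        # every label is a number: return the values
  }
}

a <- factor(c("10", "20", "30"))
b <- factor(c("10", "20", "n/a"))  # one stray non-numeric label

guess_numeric(a)  # 10 20 30
guess_numeric(b)  # 1 2 3
```

A single odd entry in the data silently changes what the result
means, with no error or warning.  With the current behavior you
always get the codes, and you ask for the values explicitly when you
want them.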


