[R] Surprise when indexing with a factor.

pallier pallier at lscp.ehess.fr
Sat May 8 15:31:40 CEST 2004


>It may be educational to read ?factor before you use a factor for some
>operation (such as subscripting), I guess.  In part, it says:
>
>Value:
>
>     'factor' returns an object of class '"factor"' which has a set of
>     numeric codes the length of 'x' with a '"levels"' attribute of
>     mode 'character'.  If 'ordered' is true (or 'ordered' is used) the
>     result has class 'c("ordered", "factor")'.
>
>In other words, a factor is a numeric vector with a "levels" attribute.
>What do you expect to happen when you use a numeric vector as subscript?
>  
>

Hello,

OK, I understand the point: I should have read the documentation more carefully...
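The quoted point can be seen in a few lines of R. This is a minimal sketch; the vectors `f` and `x` are hypothetical illustrations, not taken from the original thread. The factor prints its labels, but `[` uses its integer codes (?Extract documents that indexing by a factor is equivalent to indexing by its numeric codes).

```r
f <- factor(c("b", "a", "b"))   # levels are sorted: "a" "b"
f                                # prints the labels: b a b
as.integer(f)                    # the underlying codes: 2 1 2

x <- c(10, 20, 30)
x[f]                             # same as x[c(2, 1, 2)]: 20 10 20
```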

The 'warning' section of the help on 'factor' is even more enlightening:

>     The interpretation of a factor depends on both the codes and the
>     `"levels"' attribute.  Be careful only to compare factors with the
>     same set of levels (in the same order).  In particular,
>     `as.numeric' applied to a factor is meaningless, and may happen by
>     implicit coercion.


Let me argue that when a factor is printed, you don't see the numeric 
codes; you just see the labels. From an ergonomic point of view, in many 
situations where labels are used, the numeric representation of an 
unordered factor is just an irrelevant 'internal' coding (e.g. when 
factors are created automatically by read.table).
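When read.table turns a character column into a factor, the codes are assigned silently, in sorted order of the labels. A minimal sketch with made-up data (`txt` and `d` are hypothetical); since R 4.0.0 read.table no longer converts strings by default, so `stringsAsFactors = TRUE` is passed explicitly here to reproduce the behavior of the R versions discussed in this thread.

```r
txt <- "cond rt
B 510
A 480
B 530"
d <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
d$cond               # prints the labels B A B, with Levels: A B
as.integer(d$cond)   # internal codes 2 1 2, never shown when printing
```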

[Named vectors and labels in factors are part of the reason why I like 
R better than, say, Matlab: you don't have to remember tons of numeric 
codes.]

Given a named vector 'm' and a factor 'f' whose levels match its names 
(e.g. when 'm' is the result of a 'tapply' command using the factor f as 
INDEX), my intuition is that m[f] means m[as.character(f)].
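A sketch of the tapply case, with hypothetical data `y`, `f` and `g`. When the index is the very factor passed to tapply, its codes happen to follow `names(m)`, so `m[f]` and `m[as.character(f)]` agree; with another factor whose levels are only a subset (or in a different order), the codes point at the wrong elements and no warning is given.

```r
y <- c(1, 2, 3, 4)
f <- factor(c("b", "c", "a", "c"))   # levels "a" "b" "c"
m <- tapply(y, f, sum)               # named by level: a = 3, b = 1, c = 6

# Same factor: codes follow names(m), so both forms agree
m[f]                                 # b = 1, c = 6, a = 3, c = 6
m[as.character(f)]                   # b = 1, c = 6, a = 3, c = 6

# Different level set: g has levels "a" "c", so its codes are 2 and 1
g <- factor(c("c", "a"))
m[g]                                 # positions 2, 1: b = 1, a = 3 (silently wrong)
m[as.character(g)]                   # names "c", "a": c = 6, a = 3
```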

Other people with a more precise knowledge of R probably find it 
natural that a factor is numeric in *essence* (despite its *appearance* 
when printed).

I am not proposing to change R to adapt it to my intuition.
I just thought the trap was dangerous enough for me to (1) dare to display 
my ignorance and (2) suggest that a warning in 'An Introduction to R' 
would not be a bad idea (maybe there already is one, but I have not read it 
carefully enough...).

Cheers,

Christophe Pallier



