[R] Surprise when indexing with a factor.

Gabor Grothendieck ggrothendieck at myway.com
Sat May 8 16:12:21 CEST 2004


Note that if f is a factor with Date labels, e.g.

   f <- factor(c("2000-02-02","2000-02-03"))

then as.Date has a factor method whose effect is such that (as of R 1.9.1):

   as.Date(f)

*is* the same as as.Date(as.character(f)), i.e. it converts the labels
and not the underlying integer codes.  (Presumably this makes it easier
to use as.Date with read.table.)
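
A minimal sketch of that equivalence, assuming a reasonably recent R in
which as.Date still accepts a factor; the names d1 and d2 are only
illustrative:

```r
# Factor whose labels look like dates (same f as above)
f <- factor(c("2000-02-02", "2000-02-03"))

d1 <- as.Date(f)                 # conversion applied to the factor itself
d2 <- as.Date(as.character(f))   # explicit conversion of the labels

# Both routes should give the same Date vector
same <- identical(d1, d2)
print(same)

# The integer codes are a different thing and are not what gets converted
codes <- as.integer(f)
print(codes)
```

So as.Date is one of the cases where a factor is treated by its labels,
unlike subscripting.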




pallier <pallier <at> lscp.ehess.fr> writes:

: 
: >It may be educational to read ?factor before you use a factor for some
: >operation (such as subscripting), I guess.  In part, it says:
: >
: >Value:
: >
: >     'factor' returns an object of class '"factor"' which has a set of
: >     numeric codes the length of 'x' with a '"levels"' attribute of
: >     mode 'character'.  If 'ordered' is true (or 'ordered' is used) the
: >     result has class 'c("ordered", "factor")'.
: >
: >In other words, a factor is a numeric vector with a "levels" attribute.
: >What do you expect to happen when you use a numeric vector as subscript?
: >  
: >
: 
: Hello,
: 
: Ok, I understand the point: I should have read the documentation better...
: 
: The 'warning' section of the help on 'factor' is even more enlightening:
: 
:  >    The interpretation of a factor depends on both the codes and the
:  >    `"levels"' attribute.  Be careful only to compare factors with the
:  >     same set of levels (in the same order).  In particular,
:  >     `as.numeric' applied to a factor is meaningless, and may happen by
:  >     implicit coercion.
: 
: Let me argue that when a factor is printed, you don't see the numeric 
: codes, you just see the labels. From an ergonomic point of view, in many 
: situations where labels are used, the numeric representation of an 
: unordered factor is just an irrelevant 'internal' coding. (E.g. when 
: factors are parsed automatically by read.table).
: 
: [Named vectors and labels in factors are part of the reasons why I like 
: R better than, say, Matlab: you don't have to remember tons of numeric 
: codes.]
: 
: Given a named vector 'm' and a factor 'f' whose levels match (e.g. when 
: 'm' is the result of a 'tapply' command using the factor f as INDEX), my 
: intuition is that m[f] means m[as.character(f)]
: 
: Other persons with a more precise knowledge of R probably find it 
: natural that a factor is numeric in *essence* (despite its *appearance* 
: when printed).
: 
: I am not proposing to change R to adapt it to my intuition.
: I just believed that the trap was dangerous enough to (1) dare display 
: my ignorance and (2) suggest that a warning in the 'Introduction to R' 
: would not be a bad idea (maybe it is there, but I have not read carefully enough...)
: 
: Cheers,
: 
: Christophe Pallier
: 
: 
:
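
For reference, the trap discussed in the quoted message can be sketched
as follows.  The vector m and its values are made up for illustration;
its names match the levels of f but are stored in a different order,
which is what exposes the difference between the two subscripts:

```r
# A factor with levels "a", "b", "c" (sorted alphabetically by default)
f <- factor(c("b", "a", "b", "c"))

# Hypothetical named vector whose names match the levels of f,
# but deliberately not in level order
m <- c(c = 30, a = 10, b = 20)

# Subscripting by the factor uses its integer codes (2 1 2 3),
# so this picks elements of m by position
by_codes <- m[f]

# Subscripting by the labels looks the names up in m
by_labels <- m[as.character(f)]

print(as.integer(f))   # 2 1 2 3
print(by_codes)        # values 10 30 10 20, named a c a b
print(by_labels)       # values 20 10 20 30, named b a b c
```

When m comes from tapply with f as INDEX, its elements are in level
order, so the two subscripts happen to agree; they diverge as soon as m
is reordered or subsetted.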



