R-alpha: some thoughts about factor()

Jens Oehlschlaegel oehl@Psyres-Stuttgart.DE
Tue, 22 Jul 1997 10:56:56 +0200 (MET DST)



Dear R-people,

recently at s-news we had a discussion about factor().
I thought you might be interested in some of my thoughts about factors.

Any comments welcome


Best regards


Jens Oehlschlaegel-Akiyoshi


-------------------------------------------------------------------

I think the problem goes deeper than factors merely being handled
inappropriately by some S+ functions: the inconsistency is built into
the constructor factor(), and thus into the concept itself!

Let me cite S+ online help of factor() on the meaning of levels:

VALUE
object of class "factor", representing values taken from the finite set
given by levels. It is important that this object 
is not numeric; in particular, comparisons and other operations behave
AS IF THEY OPERATED ON VALUES FROM THE LEVELS SET, WHICH IS ALWAYS OF
MODE CHARACTER.

Let's try comparisons on values from the level set:

> my.animals <- c(4,5,6,4,5,6)
> my.levels <- 4:6
> my.labels <- c("dog","cat","rat")
> animals <- factor(my.animals,levels=my.levels,labels=my.labels)
> unclass(animals)
[1] 1 2 3 1 2 3
attr(, "levels"):
[1] "dog" "cat" "rat"

Evidently the labels become the levels, and comparisons against the original
levels never work, whether they are given as numerics or as characters, as in

> animals==4
[1] F F F F F F
> animals=="4"
[1] F F F F F F

Evidently all comparisons currently work on the LABELS SET, which is the
only one stored with the factor object, but which, to add to the
confusion, is stored in an attribute named "levels", whereas
 
> labels(animals) 
[1] "1" "2" "3" "4" "5" "6"

is a totally different story. 
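
The behaviour above can be reproduced and worked around roughly as follows.
This is only a sketch in R (an assumption: the transcript above is from S+,
and details may differ there). The names by.label, by.code and original are
made up for illustration. Since the numeric codes are not kept in the factor,
they have to be mapped back by hand from the vector that was passed as levels.

```r
my.animals <- c(4, 5, 6, 4, 5, 6)
my.levels  <- 4:6
my.labels  <- c("dog", "cat", "rat")
animals <- factor(my.animals, levels = my.levels, labels = my.labels)

# Comparisons are made against the label strings stored in attribute "levels".
by.label <- animals == "dog"

# The number 4 is compared as the string "4", which is not a label.
by.code <- animals == 4

# Recover the original codes by indexing my.levels with the internal integers.
original <- my.levels[as.integer(animals)]
```

Here by.label marks the first and fourth elements, by.code is FALSE
throughout, and original holds 4 5 6 4 5 6 again.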



All a user like me wishes is something like

> unclass(animals)
[1] 4 5 6 4 5 6
attr(, "labels"):
[1] "dog" "cat" "rat"

and of course it would be user-friendly if S+ recognized which of the
two representations is meant in comparisons and assignments like

animals==4
animals=="dog"
animals[1] <- 4
animals[1] <- "dog"

that is, not coercing 4s to "4"s but interpreting animals the right way.
Isn't S+ an interpreter?
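
One way such type-sensitive comparison might be approximated in R is an S3
subclass with its own "==" method. Everything here is hypothetical: the class
name nfactor, its constructor, and the attribute "nlevels" holding the numeric
codes are assumptions for the sake of the sketch, not existing S+ or R
features. The method only covers the case where the factor is the left operand.

```r
# Hypothetical constructor: a factor that also remembers its numeric codes.
nfactor <- function(x, levels, labels) {
  f <- factor(x, levels = levels, labels = labels)
  attr(f, "nlevels") <- levels
  class(f) <- c("nfactor", class(f))
  f
}

# Compare against the codes if the right operand is numeric,
# and against the labels otherwise.
`==.nfactor` <- function(e1, e2) {
  idx <- as.integer(e1)
  if (is.numeric(e2)) {
    attr(e1, "nlevels")[idx] == e2
  } else {
    levels(e1)[idx] == e2
  }
}

animals <- nfactor(c(4, 5, 6, 4, 5, 6), levels = 4:6,
                   labels = c("dog", "cat", "rat"))
eq.code  <- animals == 4
eq.label <- animals == "dog"
```

Both eq.code and eq.label then select the first and fourth elements, while
animals == "4" still matches nothing, because "4" is not a label.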


If there is no way around an internal numeric representation as
integers 1:x, then a factor object could look like

> unclass(animals)
[1] 1 2 3 1 2 3
attr(, "levels"):
[1] "dog" "cat" "rat"
attr(, "nlevels"):
[1] 4 5 6


and consequently


[old function, sorry for the s-at-the-end-confusion] 
codes(animals)  could return 1 2 3 1 2 3

[old function] 
levels(animals) could return "dog" "cat" "rat"
[new function] 
level(animals) could return "dog" "cat" "rat" "dog" "cat" "rat"

[new function] 
nlevels(animals) could return 4 5 6
[new function]
nlevel(animals) could return 4 5 6 4 5 6


for reasons of consistency, so everyone would know he has to write either 

  nlevel(animals)==4
  nlevel(animals)[1] <- 4

or

  level(animals)=="dog"
  level(animals)[1] <- "dog"
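
The accessors proposed above could be sketched in R along these lines. This is
an illustration under assumptions, not a tested design: make.animals() is a
made-up helper, the codes are kept in an extra attribute "nlevels" as in the
proposed object layout, and only level() and nlevel() with their replacement
forms are defined. A function called nlevels() is deliberately not redefined
here, because R already has one with a different meaning (the number of
levels), so the codes are read with attr() instead.

```r
# Hypothetical helper: build a factor that keeps the numeric codes
# in an additional attribute "nlevels".
make.animals <- function(x, levels, labels) {
  f <- factor(x, levels = levels, labels = labels)
  attr(f, "nlevels") <- levels
  f
}

# Element-wise label and element-wise numeric code.
level  <- function(f) levels(f)[as.integer(f)]
nlevel <- function(f) attr(f, "nlevels")[as.integer(f)]

# Replacement forms, so that level(f)[i] <- "dog" and nlevel(f)[i] <- 4 work.
`level<-` <- function(f, value) {
  f[seq_along(f)] <- value
  f
}
`nlevel<-` <- function(f, value) {
  f[seq_along(f)] <- levels(f)[match(value, attr(f, "nlevels"))]
  f
}

animals <- make.animals(c(4, 5, 6, 4, 5, 6), levels = 4:6,
                        labels = c("dog", "cat", "rat"))
nlevel(animals)[2] <- 4       # second element becomes "dog"
level(animals)[3]  <- "cat"   # third element becomes "cat"
```

After these two assignments level(animals) gives
"dog" "dog" "cat" "dog" "cat" "rat" and nlevel(animals) gives 4 4 5 4 5 6.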


Concerning defaults one probably would keep returning

animals
[1] "dog" "cat" "rat" "dog" "cat" "rat"

for reasons of compatibility

but make the evaluator coerce animals to nlevel(animals) in

animals[1] <- 4

instead of 

coercing 4 to "4"
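
A rough R sketch of that assignment rule, again with a hypothetical class
nfactor and a hypothetical attribute "nlevels" (neither exists in S+ or R):
a numeric right-hand side is translated to its label before the ordinary
factor assignment is carried out, and a character right-hand side is passed
through unchanged.

```r
# Hypothetical constructor: a factor that also remembers its numeric codes.
nfactor <- function(x, levels, labels) {
  f <- factor(x, levels = levels, labels = labels)
  attr(f, "nlevels") <- levels
  class(f) <- c("nfactor", class(f))
  f
}

# Assignment method: numeric values are read as codes, not coerced to strings.
`[<-.nfactor` <- function(x, ..., value) {
  if (is.numeric(value)) {
    value <- levels(x)[match(value, attr(x, "nlevels"))]
  }
  cls <- class(x)
  class(x) <- "factor"   # fall back to the ordinary factor assignment
  x[...] <- value
  class(x) <- cls
  x
}

animals <- nfactor(c(4, 5, 6, 4, 5, 6), levels = 4:6,
                   labels = c("dog", "cat", "rat"))
animals[1] <- 5        # interpreted as the code for "cat"
animals[2] <- "rat"    # interpreted as a label
result <- as.character(animals)
```

With these two assignments result is "cat" "rat" "rat" "dog" "cat" "rat".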


I have no idea how to change the constructor factor() while keeping
compatibility; perhaps just keep the arguments, with old.levels=new.nlevels
and old.labels=new.levels, together with a clear help entry?

DOES THAT SOUND ANY BETTER?



yours sincerely

--
Jens Oehlschlaegel-Akiyoshi
Psychologist/Statistician
Project TR-EAT + COST Action B6
                                                 F.rankfurt
oehl@psyres-stuttgart.de                         A.ttention
+49 711 6781-408 (phone)                         I.nventory
+49 711 6876902  (fax)                           R .-----.
                                                  / ----- \
Center for Psychotherapy Research                | | 0 0 | |
Christian-Belser-Strasse 79a                     | |  ?  | |
D-70597 Stuttgart Germany                         \ ----- /
-------------------------------------------------- '-----' -
                                                 it's better







