[R] Re: When factor is better than other types, such as vector and frame?

Petr PIKAL petr.pikal at precheza.cz
Mon Aug 24 09:26:24 CEST 2009


Hi

r-help-bounces at r-project.org wrote on 23.08.2009 05:00:11:

> Hi,
> 
> It is easy to understand the types vector and frame.
> 
> But I am wondering why the type factor is designed in R. What is the
> advantage of factor compared with other data types in R? Can somebody
> give an example in which case the type factor is much better than
> other data types?

Although your terms do not quite follow R's naming conventions (R has
vectors and data frames, and a factor is stored as an integer vector with
a set of levels), using a factor is sometimes preferable to using
character values.

Consider e.g.:

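set.seed(111)
# stringsAsFactors=TRUE keeps fac a factor in R >= 4.0.0, where the default changed
df <- data.frame(1:5, fac=sample(letters[1:2], 5, replace=TRUE), stringsAsFactors=TRUE)
plot(df[,1], pch=as.numeric(df[,2]))   # factor: level codes 1 and 2 are valid symbols
df[,2] <- as.character(df[,2])
plot(df[,1], pch=as.numeric(df[,2]))   # character: "a"/"b" cannot be coerced to numbers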

The second plot() call cannot convert "a" and "b" to symbol numbers and
warns:

Warning message:
In plot.xy(xy, type, ...) : NAs introduced by coercion
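The first call works because a factor is stored as integer codes plus a
"levels" attribute, so as.numeric() on a factor returns the codes (1, 2,
...), which plot() accepts as symbol numbers. On a character vector,
as.numeric() tries to parse the text itself and gives NA. A minimal
sketch on an illustrative vector x (not from the original post):

```r
# Illustrative vector (not from the original post)
x <- factor(c("b", "b", "a", "b", "a"))

levels(x)        # "a" "b": levels are sorted alphabetically by default
as.integer(x)    # 2 2 1 2 1: the underlying integer codes
typeof(x)        # "integer": storage type of a factor
attributes(x)    # $levels and $class ("factor")

# The same data as character: as.numeric() tries to parse "a"/"b" as numbers
y <- as.character(x)
suppressWarnings(as.numeric(y))   # all NA
```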

Another advantage is simple and straightforward manipulation of levels,
e.g. relabelling them:

df[,2] <- factor(df[,2])   # df[,2] is character after the code above; make it a factor again
levels(df[,2]) <- c("yes", "no")
> df
  X1.5 fac
1    1  no
2    2  no
3    3 yes
4    4  no
5    5 yes

Levels are also easy to reorder, and the level order determines the
plotting order in boxplots and similar graphs:

> factor(df$fac, levels=levels(df$fac))
[1] no  no  yes no  yes
Levels: yes no
> factor(df$fac, levels=levels(df$fac)[2:1])
[1] no  no  yes no  yes
Levels: no yes
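Both points can be seen on a small illustrative vector (the names f, s,
f2, f3, val and the measurement values are made up for this sketch, not
taken from the post). Assigning to levels() relabels every element at
once, because the elements only store integer codes; the new labels are
matched to the old levels by position. Reordering the levels, with
factor(..., levels=) as above or with relevel() from the stats package,
changes the order of groups in table(), split() and boxplot().

```r
# Illustrative data (not from the original post)
f <- factor(c("b", "b", "a", "b", "a"))   # levels: "a", "b"

# Relabel: new labels are matched to the old levels by position ("a" -> "yes", "b" -> "no")
levels(f) <- c("yes", "no")
print(f)                                  # values: no no yes no yes; levels: yes no

# Character equivalent: every element has to be looked up and replaced
s <- unname(c(a = "yes", b = "no")[c("b", "b", "a", "b", "a")])

# Reorder: the level order drives the groups in table(), split() and boxplot()
f2 <- factor(f, levels = rev(levels(f)))  # levels: no, yes
f3 <- relevel(f, ref = "no")              # same level order via stats::relevel()
table(f2)

val <- c(2.1, 1.8, 3.5, 2.0, 3.9)         # made-up measurements
boxplot(val ~ f2)                         # boxes drawn left to right: "no", then "yes"
```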

You need to get used to some features that are sometimes surprising but
have a reason, such as levels persisting after subsetting:

> str(df[df$fac=="no",])
'data.frame':   3 obs. of  2 variables:
 $ X1.5: int  1 2 4
 $ fac : Factor w/ 2 levels "yes","no": 2 2 2
>
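One reason for this behaviour: a subset of a factor still records all
possible categories, so e.g. table() reports a zero count for an absent
category instead of silently omitting it. When the unused levels are not
wanted, they can be dropped explicitly. A sketch on illustrative data
(the name no_only is made up); droplevels() is available in current
versions of R, while calling factor() again or subsetting with drop=TRUE
also works:

```r
# Illustrative data (not from the original post)
f <- factor(c("no", "no", "yes", "no", "yes"), levels = c("yes", "no"))
no_only <- f[f == "no"]

levels(no_only)             # "yes" "no": the unused level is kept
table(no_only)              # yes: 0, no: 3

droplevels(no_only)         # drop unused levels
factor(no_only)             # re-creating the factor also drops them
f[f == "no", drop = TRUE]   # or drop them at subsetting time
```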

Regards
Petr



> 
> Regards,
> Peng
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



