[R] factor with numeric names

John Fox jfox at mcmaster.ca
Sat Mar 21 23:35:46 CET 2009


Dear Saiwing Yeung,

You appear to be using orthogonal-polynomial contrasts (generated by
contr.poly) for Seed, which suggests either that Seed is an ordered factor
or that you've assigned these contrasts to it. Because Seed has 14 levels,
you end up fitting a degree-13 polynomial. If Seed is indeed an ordered
factor and you want to use contr.treatment instead, then you could, e.g., set
Loblolly$Seed <- factor(Loblolly$Seed, ordered=FALSE). (Note that
as.factor() would not do this, because it returns an ordered factor
unchanged.) If I'm right about Seed being an ordered factor, your solution
worked because it changed Seed to an unordered factor, not because it used
non-numeric level names.
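
A minimal sketch of the check and the conversion, assuming the stock
Loblolly data from the datasets package and the default contrasts option.
The copy named Lob is only for illustration and is not part of the original
exchange:

```r
# Loblolly ships with base R in the 'datasets' package.
data(Loblolly)

# Is Seed an ordered factor, and how many levels does it have?
is.ordered(Loblolly$Seed)
nlevels(Loblolly$Seed)

# With the default options, unordered factors get contr.treatment and
# ordered factors get contr.poly.
getOption("contrasts")

# Work on a plain data-frame copy ('Lob' is a hypothetical name).
Lob <- as.data.frame(Loblolly)

# Drop the ordering but keep the level names. as.factor() is not used here
# because it returns an ordered factor unchanged.
Lob$Seed <- factor(Lob$Seed, ordered = FALSE)

# Refit: the Seed coefficients are now treatment contrasts, one per
# non-baseline level, labelled "Seed" followed by the level name.
fit <- lm(height ~ age + Seed, data = Lob)
names(coef(fit))
```

With treatment contrasts, each Seed coefficient is the difference from the
first level of the factor, so the baseline depends on the order of
levels(Lob$Seed).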

I hope this helps,
 John

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Saiwing Yeung
> Sent: March-21-09 5:02 PM
> To: r-help at r-project.org
> Subject: [R] factor with numeric names
> 
> Hi all,
> 
> I have a pretty basic question about categorical variables, but I can't
> seem to find an answer, so I am hoping someone here can help. I
> found that if the factor level names are all numbers, fitting the model
> in lm returns coefficient labels that are not very recognizable.
> 
> # Example: let's just assume that we want to fit this model
> fit <- lm(height ~ age + Seed, data=Loblolly)
> 
> # See the category names are all mangled up here
> fit
> 
> 
> Call:
> lm(formula = height ~ age + Seed, data = Loblolly)
> 
> Coefficients:
> (Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
>    -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
>      Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
>     0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
>     Seed^11      Seed^12      Seed^13
>     0.53906     -0.29803     -0.77254
> 
> 
> 
> One possible solution I found is to rename the levels of the factor
> 
> seed.str <- paste("S", Loblolly$Seed, sep="")
> seed.str <- factor(seed.str)
> fit <- lm(height ~ age + seed.str, data=Loblolly)
> fit
> 
> 
> 
> Call:
> lm(formula = height ~ age + seed.str, data = Loblolly)
> 
> Coefficients:
>   (Intercept)           age  seed.strS303  seed.strS305  seed.strS307
>       -0.4301        2.5905        0.8600        1.8683       -1.9183
> seed.strS309  seed.strS311  seed.strS315  seed.strS319  seed.strS321
>        0.5350       -1.5933       -0.8867       -0.3650       -2.0350
> seed.strS323  seed.strS325  seed.strS327  seed.strS329  seed.strS331
>        0.3067       -1.3233       -2.6400       -2.9333       -2.2267
> 
> 
> Now it is actually possible to see which one is which, but it is kind of
> lame. Can someone point me to a more elegant solution? Thank you so
> much.
> 
> Saiwing Yeung
> 



