[R] logistic regression and dummy variable coding

Marc Schwartz marc_schwartz at comcast.net
Fri Jun 29 02:41:21 CEST 2007


On Thu, 2007-06-28 at 18:16 -0500, Bingshan Li wrote:
> Hello everyone,
> 
> I have a variable with several categories and I want to convert this  
> into dummy variables and do logistic regression on it. I used  
> model.matrix to create dummy variables but it always picked the  
> smallest one as the reference. For example,
> 
> model.matrix(~.,data=as.data.frame(letters[1:5]))
> 
> will code 'a' as '0 0 0 0'. But I want to use another category as
> the reference, say 'b'. How can I do this in R using model.matrix? Is
> there another way to do it if model.matrix has no such functionality?
> 
> Thanks!

See ?relevel

Note that creating dummy variables is done automatically by R's
modeling functions, which default to treatment contrasts for unordered
factors (ordered factors default to polynomial contrasts). Model
functions such as glm() call model.matrix() internally.
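
To illustrate the point about glm() building the dummy variables itself,
here is a minimal sketch. The data are simulated purely for illustration
(the names grp, y, fit and X are made up and are not from the original
question), so the fitted coefficients mean nothing; only the coding of
the factor matters here.

```r
set.seed(1)

# Hypothetical data: a five-level factor and a simulated binary outcome
grp <- factor(sample(letters[1:5], 200, replace = TRUE))
y   <- rbinom(200, size = 1, prob = 0.5)

# Pass the factor directly; glm() expands it into dummy variables
fit <- glm(y ~ grp, family = binomial)

# The same design matrix built by hand from the same formula
X <- model.matrix(~ grp)

# Both show an intercept plus one column per non-reference level
colnames(X)
names(coef(fit))
```

The reference level 'a' has no column of its own; it is absorbed into
the intercept, and the other coefficients are log odds ratios relative
to it.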

For example using a single factor:

FL <- factor(letters[1:5])

> FL
[1] a b c d e
Levels: a b c d e

> contrasts(FL)
  b c d e
a 0 0 0 0
b 1 0 0 0
c 0 1 0 0
d 0 0 1 0
e 0 0 0 1



FL.b <- relevel(FL, "b")

> FL.b
[1] a b c d e
Levels: b a c d e

> contrasts(FL.b)
  a c d e
b 0 0 0 0
a 1 0 0 0
c 0 1 0 0
d 0 0 1 0
e 0 0 0 1
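
To tie this back to the original model.matrix() question, the sketch
below passes the releveled factor to model.matrix(). It also shows one
possible alternative, assuming you would rather keep the level order and
change only the contrasts: contr.treatment() takes a 'base' argument
giving the index of the reference level. FL2 is just an illustrative
name.

```r
FL <- factor(letters[1:5])

# Option 1: change the reference level, then build the design matrix
FL.b <- relevel(FL, ref = "b")
mm.b <- model.matrix(~ FL.b)
mm.b

# Option 2: keep the level order a, b, c, d, e but make the second
# level ('b') the baseline of the treatment contrasts
FL2 <- FL
contrasts(FL2) <- contr.treatment(levels(FL2), base = 2)
contrasts(FL2)
mm.2 <- model.matrix(~ FL2)
mm.2
```

In both cases the observation with level 'b' gets zeros in every dummy
column. With option 1 the levels themselves are reordered, which also
affects printing and tables; option 2 leaves levels(FL2) unchanged.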



See ?contrasts and the "Statistical models in R" chapter of "An
Introduction to R".

HTH,

Marc Schwartz


