[R] Understanding the intercept value in a multiple linear regression with categorical values

Joao Azevedo joao.c.azevedo at gmail.com
Fri Jul 27 13:32:31 CEST 2012


Hi!

I'm failing to understand the meaning of the intercept in a
multiple linear regression with categorical predictors. Taking the
"warpbreaks" data set as an example, when I do:

> lm(breaks ~ wool, data=warpbreaks)

Call:
lm(formula = breaks ~ wool, data = warpbreaks)

Coefficients:
(Intercept)        woolB
     31.037       -5.778

I'm able to understand that the intercept is the mean value of breaks
when wool equals "A", and that adding the "woolB" coefficient to the
intercept gives the mean value of breaks when wool equals "B".
However, if I also include the tension variable in the model, I'm
unable to figure out the meaning of the intercept:

> lm(breaks ~ wool + tension, data=warpbreaks)

Call:
lm(formula = breaks ~ wool + tension, data = warpbreaks)

Coefficients:
(Intercept)        woolB     tensionM     tensionH
     39.278       -5.778      -10.000      -14.722

I thought it would be the mean value of breaks when either wool equals
"A" or tension equals "L", but that isn't true for this dataset.
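
For reference, here is a minimal sketch of how these readings can be
checked against the data. It assumes R's default treatment contrasts
(contr.treatment) for unordered factors; the object names (fit1, fit2,
wool_means, mean_A_or_L, mean_A_and_L, pred_A_L) are only illustrative.

```r
# Assumption: default treatment contrasts, so "A" and "L" are the
# reference levels of wool and tension.
fit1 <- lm(breaks ~ wool, data = warpbreaks)
fit2 <- lm(breaks ~ wool + tension, data = warpbreaks)

# One factor: compare the coefficients with the group means by wool.
wool_means <- tapply(warpbreaks$breaks, warpbreaks$wool, mean)
b1 <- coef(fit1)
all.equal(unname(b1["(Intercept)"]), unname(wool_means["A"]))
all.equal(unname(b1["(Intercept)"] + b1["woolB"]), unname(wool_means["B"]))

# Two factors: candidate readings of the intercept.
# (a) mean of breaks where wool is "A" OR tension is "L"
mean_A_or_L <- with(warpbreaks, mean(breaks[wool == "A" | tension == "L"]))
# (b) mean of breaks where wool is "A" AND tension is "L"
mean_A_and_L <- with(warpbreaks, mean(breaks[wool == "A" & tension == "L"]))
# (c) value the additive model predicts for wool "A", tension "L"
ref <- data.frame(
  wool    = factor("A", levels = levels(warpbreaks$wool)),
  tension = factor("L", levels = levels(warpbreaks$tension))
)
pred_A_L <- predict(fit2, newdata = ref)

# Print the intercept next to the three candidates.
c(intercept = unname(coef(fit2)["(Intercept)"]),
  A_or_L    = mean_A_or_L,
  A_and_L   = mean_A_and_L,
  predicted = unname(pred_A_L))
```

Under the stated assumption, only (c) has to coincide with the
intercept: for the reference levels every dummy variable is zero, so
the prediction reduces to the intercept. Since the model has no
wool:tension interaction, (c) need not equal the observed cell mean (b).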

Any clues on interpreting the value of the intercept?

Thanks!

--
Joao.


