[R] stupid lm() question

Rolf Turner rolf.turner at xtra.co.nz
Wed Sep 14 01:48:39 CEST 2011


``The only stupid question is the one that is not asked.''

         --- Anon. (???)

The reason that there is no value for level "A" of treatment is
that level "A" is the reference level, under the default ``treatment''
contrasts.

See ?contr.treatment.
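
For what it's worth, the coding itself can be looked at directly.  The
following is only a sketch, and it assumes that options("contrasts") is
still at its default, so that unordered factors get contr.treatment.  The
name "ctr" is just an illustrative variable:

```r
data(OrchardSprays)

# Contrast matrix currently in force for the treatment factor
# (assumes the default options("contrasts") has not been changed).
ctr <- contrasts(OrchardSprays$treatment)
print(ctr)

# The same matrix, built directly from the level names.
print(contr.treatment(levels(OrchardSprays$treatment)))
```

The row for "A" consists entirely of zeros, and there is no column for
"A", which is why no treatmentA coefficient shows up in the default fit.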

Some insight may be obtained by doing:

     lm(decrease ~ 0 + ., data=OrchardSprays)

which gives:

Coefficients:
    rowpos      colpos  treatmentA  treatmentB  treatmentC  treatmentD
    -2.784      -1.234      22.705      25.705      43.330      53.080
treatmentE  treatmentF  treatmentG  treatmentH
    81.205      87.080      86.580     108.330

Note that the coefficient for "treatmentA" which now magically appears
is the same as the intercept coefficient in your "original" version.  (Note
also that the intercept coefficient magically disappears.)

The value for treatmentB in the "original" version is the *difference*
between the treatmentB coefficient and the treatmentA coefficient
in the "new" version (25.705 - 22.705 = 3.000).  And so on.
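
If you want to check this relationship numerically rather than by eye,
something like the following should do it.  It is a sketch only; the
object names (fit_orig, fit_new, and so on) are made up for illustration:

```r
data(OrchardSprays)

fit_orig <- lm(decrease ~ ., data = OrchardSprays)      # with intercept
fit_new  <- lm(decrease ~ 0 + ., data = OrchardSprays)  # intercept removed

b_orig <- coef(fit_orig)
b_new  <- coef(fit_new)

# The "original" intercept is the "new" treatmentA coefficient.
same_intercept <- all.equal(unname(b_orig["(Intercept)"]),
                            unname(b_new["treatmentA"]))

# Each "original" treatment coefficient is the "new" one minus treatmentA.
trt <- paste0("treatment", LETTERS[2:8])
same_diffs <- all.equal(unname(b_orig[trt]),
                        unname(b_new[trt] - b_new["treatmentA"]))

# The two parametrizations describe the same fitted model.
same_fit <- all.equal(fitted(fit_orig), fitted(fit_new))

print(c(same_intercept, same_diffs, same_fit))
```

All three comparisons should come out TRUE: only the labelling of the
parameters changes, not the fit.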

Mathematically, the intercept for the i-th treatment is mu + alpha_i,
but this gives ``too many parameters'' --- mu, alpha_1, ..., alpha_k,
i.e. k+1 parameters where only k can be used.  So ``constraints'' are
put on the parameters.  The treatment contrasts constrain alpha_1 to 0.
Sticking in the ``0 + '' constrains mu to be 0 instead.
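
The ``too many parameters'' problem can be seen in the design matrix.
The sketch below ignores rowpos and colpos for simplicity and glues an
intercept column onto a full set of treatment indicators; "X_full" and
the column name "mu" are just illustrative:

```r
data(OrchardSprays)

# One indicator column per treatment level (the alpha_i), no intercept.
X_trt <- model.matrix(~ 0 + treatment, data = OrchardSprays)

# Add a column of ones for mu: k + 1 = 9 columns in all.
X_full <- cbind(mu = 1, X_trt)

n_par  <- ncol(X_full)     # number of parameters we tried to include
n_rank <- qr(X_full)$rank  # number that are linearly independent

print(c(columns = n_par, rank = n_rank))
```

The rank is one less than the number of columns, because the column of
ones is the sum of the indicator columns.  Dropping either the "mu"
column or one of the indicator columns removes the redundancy, which
corresponds to the two constraints described above.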

HTH

     cheers,

         Rolf Turner

On 14/09/11 10:06, Carl Witthoft wrote:
> I feel bad even asking, but:
>
> Rgames> data(OrchardSprays)
> Rgames> model<-lm(decrease~.,data=OrchardSprays)
> Rgames> model
>
> Call:
> lm(formula = decrease ~ ., data = OrchardSprays)
>
> Coefficients:
> (Intercept)       rowpos       colpos   treatmentB   treatmentC
>      22.705       -2.784       -1.234        3.000       20.625
>  treatmentD   treatmentE   treatmentF   treatmentG   treatmentH
>      30.375       58.500       64.375       63.875       85.625
>
>
> Rgames> levels(OrchardSprays$treatment)  #just double-checking...
> [1] "A" "B" "C" "D" "E" "F" "G" "H"
>
> So: why isn't there a value for the level "A" of treatment? Is it 
> because the (alphabetically) first level is treated as the control?
> And if so, what should I do (calculate) with the coefficients to 
> compare them with the  statistics of the data subset with level "A" ?
>
>
> Please feel free to tell me to shut up and read some part of R-inferno 
> or other helpful document.
>
>
> Thanks.
> Carl
>
>


