[R] Trouble about the interpretation of intercept in lm models

Tue Jan 13 18:34:20 CET 2009

on 01/13/2009 11:25 AM Peter Dalgaard wrote:
> Marc Schwartz wrote:
> 
>>> DF.fitted
>>           Y A B     F.lm
>> 1  21.86773 0 a 23.52957
>> 2  25.91822 0 a 23.52957
>> 3  20.82186 0 a 23.52957
>> 4  42.97640 1 a 36.18023
>> 5  36.64754 1 a 36.18023
>> 6  30.89766 1 a 36.18023
>> 7  47.43715 0 b 46.50615
>> 8  48.69162 0 b 46.50615
>> 9  47.87891 0 b 46.50615
>> 10 53.47306 1 b 59.15681
>> 11 62.55891 1 b 59.15681
>> 12 56.94922 1 b 59.15681
>> 13 61.89380 0 c 62.98442
>> 14 53.92650 0 c 62.98442
>> 15 70.62465 0 c 62.98442
>> 16 74.77533 1 c 75.63508
>> 17 74.91905 1 c 75.63508
>> 18 79.71918 1 c 75.63508
>>
>>
>> # Now get the means of the fitted values across
>> # the combinations of A and B
>> M <- with(DF.fitted, tapply(F.lm, list(A = A, B = B), mean))
>>
>>> M
>>    B
>> A          a        b        c
>>   0 23.52957 46.50615 62.98442
>>   1 36.18023 59.15681 75.63508
>>
>>
>> Thus:
>>
>> # Intercept = *fitted* mean at A = 0; B = "a"
>>> M["0", "a"]
>> [1] 23.52957
> 
> Actually, notice that you are averaging identical values, so the "mean"
> in the tapply is slightly misleading.
> 
> Notice also that the intercept may be defined even when _no_
> observations have zeros in all the non-intercept columns of the design
> matrix. This is the usual case in linear regression, for instance, but it
> can happen in factorial designs (unbalanced, or using contrasts other
> than treatment contrasts) as well.
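To make the second point concrete, here is a minimal sketch using simulated, hypothetical data (the x values, true coefficients, noise and factor layout are assumptions, not taken from this thread). In the regression, no observation has x = 0, yet the intercept is estimated as the fitted line extrapolated to x = 0. In the factorial fit with sum-to-zero contrasts (`contr.sum`), the A column of the design matrix is always +1 or -1, so no row has zeros in every non-intercept column; the intercept then equals the unweighted average of the fitted cell means rather than the fitted value of any one cell.

```r
# Simple regression: x never takes the value 0 (simulated, hypothetical data)
set.seed(1)
x <- 10:20
y <- 5 + 2 * x + rnorm(length(x))
fit <- lm(y ~ x)

# The intercept is still estimated: it is the fitted line extrapolated to x = 0
b0 <- unname(coef(fit)["(Intercept)"])
p0 <- unname(predict(fit, newdata = data.frame(x = 0)))
all.equal(b0, p0)

# Factorial case (hypothetical 2 x 3 layout): with sum-to-zero contrasts the
# intercept is the unweighted average of the fitted cell means, not the
# fitted value of any single cell
A <- factor(rep(0:1, times = 9))
B <- factor(rep(c("a", "b", "c"), each = 6))
Y <- rnorm(18, mean = 50, sd = 5)
fit.sum <- lm(Y ~ A + B, contrasts = list(A = "contr.sum", B = "contr.sum"))

# One row per A/B cell; predict() reuses the contrasts stored in fit.sum
grid <- expand.grid(A = levels(A), B = levels(B))
all.equal(unname(coef(fit.sum)["(Intercept)"]),
          mean(predict(fit.sum, newdata = grid)))
```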

Good points on both counts, Peter.
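On the first point, one way to avoid averaging identical fitted values is to call predict() once per A/B combination on a grid built with expand.grid(). This is a minimal sketch on hypothetical data in the same layout as DF above (three replicates per cell); the cell means and the standard deviation are assumptions, not the original data, and A is treated as a factor.

```r
# Hypothetical data in the same layout as DF above (3 replicates per A/B cell);
# the cell means and sd are assumptions, not the original data
set.seed(1)
DF <- data.frame(A = factor(rep(c(0, 1, 0, 1, 0, 1), each = 3)),
                 B = factor(rep(c("a", "b", "c"), each = 6)))
DF$Y <- rnorm(18, mean = rep(c(25, 35, 45, 55, 65, 75), each = 3), sd = 5)

# Additive model with the default treatment contrasts
fit <- lm(Y ~ A + B, data = DF)

# One row per A/B combination, so nothing is averaged
grid <- expand.grid(A = levels(DF$A), B = levels(DF$B))
grid$fit <- predict(fit, newdata = grid)

# expand.grid() varies A fastest, so filling by column gives an A x B table
M2 <- matrix(grid$fit, nrow = nlevels(DF$A),
             dimnames = list(A = levels(DF$A), B = levels(DF$B)))
M2

# Under treatment contrasts the intercept is the fitted value at A = "0", B = "a"
all.equal(unname(coef(fit)["(Intercept)"]), M2["0", "a"])
```

The table has the same shape as M, but each entry comes from a single prediction rather than a mean of repeated, identical fitted values.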

Thanks,

Marc