Gerrit Eichner
Gerrit.Eichner at math.uni-giessen.de
Wed Mar 21 08:19:23 CET 2012
Dear Ben, or anybody else, of course,
I'd be grateful if you could point me to a reference (other than ch. 4,
"Linear models", in "Statistical Models in S" (Chambers & Hastie, 1992))
on the (asserted F-)distributional properties of the test statistic
(used, e.g., by anova.lm()) that compares model 1 with model 2 using the MSE
of model 3 in a sequence of three nested (linear) models. (A short
RSiteSearch() and a Google search didn't get me far ...)
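The distributional claim can be sketched with the usual projection (Cochran's theorem) argument. This sketch assumes i.i.d. Gaussian errors and nested model matrices $X_1$, $X_2$, $X_3$ with full column ranks $p_1 < p_2 < p_3$ and nested column spaces, all fitted to the same $n$ observations. The notation is introduced here for illustration:

```latex
\begin{aligned}
&y = \mu + \varepsilon,\quad \varepsilon \sim N_n(0, \sigma^2 I_n),\qquad
 P_i = X_i (X_i^\top X_i)^{-1} X_i^\top,\qquad
 \mathrm{RSS}_i = y^\top (I - P_i)\, y, \\
&\mathrm{RSS}_1 - \mathrm{RSS}_2 = y^\top (P_2 - P_1)\, y,\qquad
 \mathrm{RSS}_3 = y^\top (I - P_3)\, y, \\
&(P_2 - P_1)(I - P_3) = P_2 - P_2 - P_1 + P_1 = 0
 \;\Rightarrow\; \text{the two quadratic forms are independent}, \\
&H_0:\ \mu \in \operatorname{col}(X_1)
 \;\Rightarrow\;
 \frac{\mathrm{RSS}_1 - \mathrm{RSS}_2}{\sigma^2} \sim \chi^2_{p_2 - p_1},\qquad
 \frac{\mathrm{RSS}_3}{\sigma^2} \sim \chi^2_{n - p_3}, \\
&F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(p_2 - p_1)}{\mathrm{RSS}_3/(n - p_3)}
 \sim F_{p_2 - p_1,\; n - p_3} \quad \text{under } H_0.
\end{aligned}
```

The denominator only needs model 3 to be adequate, and that is implied when $H_0$ (model 1) holds. In the tables quoted below, $p_2 - p_1 = 1$ and $n - p_3 = 133$, so row 2 of the three-model table is referred to $F_{1,133}$ rather than $F_{1,134}$.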
Thx in advance!
Best regards -- Gerrit
On Wed, 21 Mar 2012, Ben Bolker wrote:
> msteane <michellesteane <at> hotmail.com> writes:
>
>>
>> I am using anova.lm to compare 3 linear models. Model 1 has 1 variable,
>> model 2 has 2 variables and model 3 has 3 variables. All models are fitted
>> to the same data set.
>
> (I assume these are nested models, otherwise the analysis doesn't
> make sense ...)
>
>>
>> anova.lm(model1,model2) gives me:
>>
>>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
>> 1    135 245.38
>> 2    134 184.36  1    61.022 44.354 6.467e-10 ***
>>
>> anova.lm(model1,model2,model3) gives me:
>>
>>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
>> 1    135 245.38
>> 2    134 184.36  1    61.022 50.182 7.355e-11 ***
>> 3    133 161.73  1    22.628 18.609 3.105e-05 ***
>>
>> Why aren't the F values in the 2nd row of the two anova tables the same? I
>> thought that in each case the 2nd row compares model 2 to model 1.
>
> From ?anova.lm:
>
> Normally the F statistic is most appropriate, which compares the mean
> square for a row to the residual sum of squares for the largest model
> considered.
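Here is a minimal sketch of this behaviour on simulated data. The data frame `d`, the predictors `x1`, `x2`, `x3`, and the coefficients are made up for illustration and are not the poster's data. The sketch fits three nested models and recomputes row 2 of each anova table by hand, using the residual mean square of the largest model in each call as the denominator.

```r
# Hypothetical simulated data (not the poster's data)
set.seed(42)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + 0.5 * d$x2 + 0.3 * d$x3 + rnorm(n)

# Three nested linear models
m1 <- lm(y ~ x1, data = d)
m2 <- lm(y ~ x1 + x2, data = d)
m3 <- lm(y ~ x1 + x2 + x3, data = d)

a12  <- anova(m1, m2)       # largest model is m2
a123 <- anova(m1, m2, m3)   # largest model is m3

# Residual mean square (MSE) of a fitted lm: RSS / residual df
mse <- function(m) deviance(m) / df.residual(m)

# Row 2 has the same "Sum of Sq" and "Df" in both tables;
# only the denominator of F differs.
ss2 <- a12[2, "Sum of Sq"]
df2 <- a12[2, "Df"]
f_two   <- (ss2 / df2) / mse(m2)
f_three <- (ss2 / df2) / mse(m3)

# The p-value uses the residual df of the largest model as denominator df
p_three <- pf(f_three, df2, df.residual(m3), lower.tail = FALSE)

print(rbind(two_models   = c(table = a12[2, "F"],  by_hand = f_two),
            three_models = c(table = a123[2, "F"], by_hand = f_three)))
```

Dropping `m3` from the call changes only the denominator, which is why the two row-2 F values differ.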
>
>>
>> I figured out that for anova.lm(model1,model2)
>> F(row 2) = Sum of Sq(row 2) / MSE of model 2
>>
>> and for anova.lm(model1,model2,model3)
>> F(row 2) = Sum of Sq(row 2) / MSE of model 3 <-- I don't get why the MSE of
>> model 3 is being included if we're comparing model 2 to model 1
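The two formulas can be checked against the printed tables with a short R sketch. It uses only the rounded numbers shown above, so agreement in the last printed digit is approximate. "Sum of Sq" is taken from the table (61.022) rather than recomputed as 245.38 - 184.36, because the printed RSS values are rounded. The variable names are illustrative.

```r
# Values copied from the two anova tables quoted above (rounded as printed)
rss    <- c(m1 = 245.38, m2 = 184.36, m3 = 161.73)
res_df <- c(m1 = 135,    m2 = 134,    m3 = 133)
ss_12  <- 61.022   # "Sum of Sq" in row 2 of both tables
df_12  <- 1        # "Df" in row 2 of both tables

# MSE = RSS / residual df of the model used as denominator
mse2 <- rss[["m2"]] / res_df[["m2"]]
mse3 <- rss[["m3"]] / res_df[["m3"]]

f_two   <- (ss_12 / df_12) / mse2   # anova(model1, model2): about 44.35
f_three <- (ss_12 / df_12) / mse3   # anova(model1, model2, model3): about 50.18

# Denominator df is the residual df of the largest model in each call
p_two   <- pf(f_two,   df_12, res_df[["m2"]], lower.tail = FALSE)
p_three <- pf(f_three, df_12, res_df[["m3"]], lower.tail = FALSE)

print(signif(c(F = f_two,   p = p_two),   4))
print(signif(c(F = f_three, p = p_three), 4))
```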
>
> See above ...
>
