[R] anova.lm F test confusion

Gerrit Eichner Gerrit.Eichner at math.uni-giessen.de
Wed Mar 21 08:19:23 CET 2012


Dear Ben, or anybody else, of course,

I'd be grateful if you could point me to a reference (other than ch. 4, 
"Linear models", in "Statistical Models in S" (Chambers & Hastie, 1992)) 
on the (asserted F-)distributional properties of the test statistic 
(used, e.g., by anova.lm()) that compares model 1 with model 2 using the 
MSE of model 3 in a sequence of three nested (linear) models. (A short 
RSiteSearch() and a Google search didn't lead me far ...)

Thanks in advance!

  Best regards  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
---------------------------------------------------------------------

On Wed, 21 Mar 2012, Ben Bolker wrote:

> msteane <michellesteane <at> hotmail.com> writes:
>
>>
>> I am using anova.lm to compare 3 linear models.  Model 1 has 1 variable,
>> model 2 has 2 variables and model 3 has 3 variables.  All models are fitted
>> to the same data set.
>
>  (I assume these are nested models, otherwise the analysis doesn't
> make sense ...)
>
>>
>> anova.lm(model1,model2) gives me:
>>
>>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
>> 1    135 245.38
>> 2    134 184.36  1    61.022 44.354 6.467e-10 ***
>>
>> anova.lm(model1,model2,model3) gives me:
>>
>>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
>> 1    135 245.38
>> 2    134 184.36  1    61.022 50.182 7.355e-11 ***
>> 3    133 161.73  1    22.628 18.609 3.105e-05 ***
>>
>> Why aren't the 2nd row F values from each of the anova tables the same??? I
>> thought in each case the 2nd row is comparing model 2 to model 1?
>
> From ?anova.lm:
>
> Normally the F statistic is most appropriate, which compares the mean
> square for a row to the residual sum of squares for the largest model
> considered.
>
>>
>> I figured out that for anova.lm(model1,model2)
>> F(row2)=Sum of Sq(row2)/MSE of Model 2
>>
>> and for anova.lm(model1,model2,model3)
>>  F(row2)=Sum of Sq(row 2)/MSE of Model 3  <-- I don't get why the MSE of
>> model 3 is being included if we're comparing Model 2 to Model 1
>
>   See above ...
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
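
For readers following the arithmetic in the quoted tables, here is a minimal sketch that recomputes the three F values from the posted numbers alone. It assumes each row's F is (Sum of Sq / Df) divided by the MSE (RSS / Res.Df) of the largest model in the call, as the ?anova.lm excerpt describes. The helper name f_stat is made up for illustration and is not part of R. Small differences from the printed values are expected, since the tables show rounded figures.

```python
# Values copied from the anova tables quoted above (rounded as printed).
rss = {2: 184.36, 3: 161.73}          # residual sum of squares per model
res_df = {2: 134, 3: 133}             # residual degrees of freedom per model
sum_sq = {2: 61.022, 3: 22.628}       # "Sum of Sq" for rows 2 and 3
row_df = {2: 1, 3: 1}                 # "Df" for rows 2 and 3


def f_stat(row, denom_model):
    """Hypothetical helper: F for a table row, scaled by the MSE of denom_model."""
    mse = rss[denom_model] / res_df[denom_model]
    return (sum_sq[row] / row_df[row]) / mse


# anova.lm(model1, model2): the largest model considered is model 2.
f_two_models = f_stat(row=2, denom_model=2)

# anova.lm(model1, model2, model3): the largest model considered is model 3,
# so both rows share the MSE of model 3 as denominator.
f_three_row2 = f_stat(row=2, denom_model=3)
f_three_row3 = f_stat(row=3, denom_model=3)

print(f_two_models, f_three_row2, f_three_row3)
```

The numerator for row 2 is the same in both calls; only the denominator changes, which is why the two row-2 F values differ.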


