[R] anova.lm F test confusion

Ben Bolker bbolker at gmail.com
Wed Mar 21 03:19:20 CET 2012


msteane <michellesteane <at> hotmail.com> writes:

> 
> I am using anova.lm to compare 3 linear models.  Model 1 has 1 variable,
> model 2 has 2 variables and model 3 has 3 variables.  All models are fitted
> to the same data set.

  (I assume these are nested models, otherwise the analysis doesn't
make sense ...)

> 
> anova.lm(model1,model2) gives me:
> 
>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
> 1    135 245.38                                  
> 2    134 184.36  1    61.022 44.354 6.467e-10 ***
> 
> anova.lm(model1,model2,model3) gives me:
> 
>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
> 1    135 245.38                                  
> 2    134 184.36  1    61.022 50.182 7.355e-11 ***
> 3    133 161.73  1    22.628 18.609 3.105e-05 ***
> 
> Why aren't the 2nd row F values from each of the anova tables the same??? I
> thought in each case the 2nd row is comparing model 2 to model 1?  

 From ?anova.lm:

 Normally the F statistic is most appropriate, which compares the mean
 square for a row to the residual sum of squares for the largest model
 considered.
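The numbers in the two tables bear this out. In each call the denominator of every F is the residual mean square (RSS divided by Res.Df) of the largest model in that call, while the numerator for row 2 (Sum of Sq / Df) is the same in both. The arithmetic below uses only the printed, rounded values, so the last digit can differ slightly from R's output:

$$F_{\text{row}} = \frac{\text{Sum of Sq}_{\text{row}} / \text{Df}_{\text{row}}}{\text{RSS}_{\text{largest}} / \text{Res.Df}_{\text{largest}}}$$

$$\text{anova.lm(model1, model2)}: \quad F_2 = \frac{61.022 / 1}{184.36 / 134} = \frac{61.022}{1.3758} \approx 44.35$$

$$\text{anova.lm(model1, model2, model3)}: \quad F_2 = \frac{61.022 / 1}{161.73 / 133} = \frac{61.022}{1.2160} \approx 50.18, \qquad F_3 = \frac{22.628 / 1}{1.2160} \approx 18.61$$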

> 
> I figured out that for anova.lm(model1,model2) 
> F(row2)=Sum of Sq(row2)/MSE of Model 2 
> 
> and for anova.lm(model1,model2,model3)
>  F(row2)=Sum of Sq(row 2)/MSE of Model 3  <-- I don't get why the MSE of
> model 3 is being included if we're comparing Model 2 to Model 1

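   See above: in the three-model call, the denominator is the residual
mean square of model 3 (the largest model), so row 2 gets a different F
even though its Sum of Sq is the same.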
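A minimal sketch for checking this by hand in R. The data are simulated and the names x1, x2, x3, y, m1, m2, m3 are made up for illustration. n = 137 is chosen only so that Res.Df for model 1 is 135, as in the question. Only lm(), anova(), deviance(), df.residual() and pf() are standard R. The sketch rebuilds the row-2 F and p-value from the extra sum of squares and the residual mean square of the largest model in each call.

```r
set.seed(1)
n  <- 137                                 # hypothetical sample size (gives Res.Df = 135 for m1)
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x2 + 0.3 * x3 + rnorm(n)

# Nested models: each adds one predictor to the previous one
m1 <- lm(y ~ x1)
m2 <- lm(y ~ x1 + x2)
m3 <- lm(y ~ x1 + x2 + x3)

a12  <- anova(m1, m2)        # two-model comparison
a123 <- anova(m1, m2, m3)    # three-model comparison

# Row 2 has the same extra sum of squares in both tables
ss2 <- a12[2, "Sum of Sq"]

# Residual mean square (RSS / Res.Df) of the largest model in each call
s2_m2 <- deviance(m2) / df.residual(m2)
s2_m3 <- deviance(m3) / df.residual(m3)

# F for row 2: (Sum of Sq / Df) divided by that residual mean square
F_two   <- (ss2 / 1) / s2_m2
F_three <- (ss2 / 1) / s2_m3

# The p-value also uses the residual df of the largest model
p_three <- pf(F_three, 1, df.residual(m3), lower.tail = FALSE)

print(c(two_model = F_two, three_model = F_three))
```

If you want the F test for model 2 against model 1 with model 2's own error term, call anova(m1, m2) on just those two models.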


