# [R] anova.lm and F-test

Mon Jul 9 18:36:13 CEST 2012

```Dear Peter,

Thank you very much for that excellent answer to a rather stupid question :)
I did not notice that the RSS actually increased for the model with more
parameters and so in this case the F-statistic is negative and therefore a
p-value from the F-distribution is meaningless. But I guess your answer also
clarifies that as long as the F-statistic is in the valid range (>=0),
anova() will calculate it and return a p-value (whether or not the models
are nested).

Best, Suresh

Peter Dalgaard-2 wrote
>
> On Jul 9, 2012, at 15:40 , Suresh Krishna wrote:
>
>>
>> Hello,
>>
>> Why does anova.lm sometimes return a p-value and at other times not? Is
>> it because it distinguishes nested models from non-nested ones?
>>
>>> x<-seq(1,100,1)
>>> y<-3*x+rnorm(100)
>>> anova(lm(y~x),lm(y~x+I(x^2)),test="F")
>> Analysis of Variance Table
>>
>> Model 1: y ~ x
>> Model 2: y ~ x + I(x^2)
>>  Res.Df    RSS Df Sum of Sq      F Pr(>F)
>> 1     98 90.449
>> 2     97 90.288  1   0.16117 0.1732 0.6782
>>
>>> anova(lm(y~x),lm(y~I(x^2)+I(x^3)),test="F")
>> Analysis of Variance Table
>>
>> Model 1: y ~ x
>> Model 2: y ~ I(x^2) + I(x^3)
>>  Res.Df    RSS Df Sum of Sq F Pr(>F)
>> 1     98   90.4
>> 2     97 7345.7  1   -7255.3
>>
>
> You have Df and Sum of Sq with opposite sign, so more parameters with a
> worse fit. The models are not nested, so the F test makes no sense.
>
> I'd say that the real question is why anova.lm doesn't protest loudly when
> detecting this? One possible answer is that it also misses other
> non-nested cases where the signs do not clash, and warning only in some of
> the incorrect cases could lead the naive user to believe that the other
> ones are OK. E.g. this F test is equally meaningless
>
>> anova(lm(y~I(x^4)),lm(y~I(x^2)+I(x^3)),test="F")
> Analysis of Variance Table
>
> Model 1: y ~ I(x^4)
> Model 2: y ~ I(x^2) + I(x^3)
>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
> 1     98 186639
> 2     97   7101  1    179538 2452.4 < 2.2e-16 ***
>
> (Non-nestedness could in principle be determined by checking whether
> cbind(model.matrix(m1), model.matrix(m2)) has higher rank than both of its
> constituents, but numerical rank determination is a bit error-prone and
> slow, so this was not implemented).
>
>
> --
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes@  Priv: PDalgd@
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
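
# A minimal sketch of the rank check described in Peter's parenthetical above.
# This is not something anova.lm does: is_nested() is a hypothetical helper,
# and trusting qr()'s default tolerance for the numerical rank is an
# assumption (Peter notes that rank determination is error-prone).
set.seed(1)
x <- seq(1, 100, 1)
y <- 3 * x + rnorm(100)

is_nested <- function(m1, m2) {
  X1 <- model.matrix(m1)
  X2 <- model.matrix(m2)
  r1  <- qr(X1)$rank
  r2  <- qr(X2)$rank
  r12 <- qr(cbind(X1, X2))$rank
  # Nested when the combined matrix spans nothing beyond the larger model;
  # a combined rank above both constituents means the models are not nested.
  r12 == max(r1, r2)
}

is_nested(lm(y ~ x), lm(y ~ x + I(x^2)))            # first pair in the thread
is_nested(lm(y ~ x), lm(y ~ I(x^2) + I(x^3)))       # second pair (negative Sum of Sq)
is_nested(lm(y ~ I(x^4)), lm(y ~ I(x^2) + I(x^3)))  # third pair (F printed anyway)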