[R] Some clarifications of anova() and summary()

Sun Dec 14 16:56:12 CET 2008

Running anova() on intact12 and intact21 gives two different results!!

> anova(intact12)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x1         1 663.18  663.18 203.065 < 2.2e-16 ***
x2         1  35.21   35.21  10.781  0.001940 **
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(intact21)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value Pr(>F)
x2         1 698.26  698.26 213.8077 <2e-16 ***
x1         1   0.12    0.12   0.0379 0.8466
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
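
A minimal sketch of why the two tables differ, assuming the same simulated
data as in the quoted code below. anova() on an lm fit reports sequential
sums of squares, so each row tests a term given only the terms listed before
it, while summary() tests each coefficient given all the others. The last row
of each anova() table should therefore match summary(): its F value is the
square of the corresponding t value. The helper names (a12, a21, f12, f21,
t12) are only for illustration.

```r
# Same simulated data as in the thread.
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

intact12 <- lm(y ~ x1 + x2)
intact21 <- lm(y ~ x2 + x1)

# Sequential F statistics from anova(), indexed by term name.
a12 <- anova(intact12)
a21 <- anova(intact21)
f12 <- setNames(a12[["F value"]], rownames(a12))
f21 <- setNames(a21[["F value"]], rownames(a21))

# Marginal t statistics from summary(); these do not depend on term order.
t12 <- coef(summary(intact12))[, "t value"]

# The last term in each anova() table is tested after all the others,
# so its F value should equal the square of its t value.
all.equal(unname(f12["x2"]), unname(t12["x2"]^2))
all.equal(unname(f21["x1"]), unname(t12["x1"]^2))

# drop1() tests each term as if it were entered last, in a single table.
drop1(intact12, test = "F")
```

The residual row is the same in both tables because the two fits are the same
model; only the split of the explained sum of squares between x1 and x2
depends on the order.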

On Sun, Dec 14, 2008 at 8:56 PM, Tanmoy Talukdar
<tanmoy.talukdar at gmail.com> wrote:
> Why do you think that running lm() twice on those two models is going
> to help me?  They are identical models and hence we get identical
> results. The second question is now alright. I had some
> misunderstanding about it.
>
> Please tell me if you can find any "downside" in summary(). I can't find any.
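
One possible reading of the "downside", sketched with the same simulated data
and two illustrative object names (full, only1) that are not in the original
code: summary() tests each coefficient given all the other regressors, so when
x1 and x2 are strongly correlated, x1 can look non-significant in the full
model even though it is clearly significant as the only regressor.

```r
# Same simulated data as in the thread.
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

full  <- lm(y ~ x1 + x2)  # x1 tested given x2
only1 <- lm(y ~ x1)       # x1 tested on its own

# p-value for x1 when it is the only regressor
p_alone <- coef(summary(only1))["x1", "Pr(>|t|)"]
# p-value for x1 once x2 is also in the model
p_given_x2 <- coef(summary(full))["x1", "Pr(>|t|)"]

c(alone = p_alone, given_x2 = p_given_x2)

# The correlation between the regressors explains the difference.
cor(x1, x2)
```

Reading the coefficient table alone could suggest that x1 is unrelated to y,
which is not what the single-regressor fit shows.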
>
>
> I've edited the code for that replication issue.
>
> set.seed(127)
> n <- 50
> x1 <- runif(n,1,10)
> x2 <- x1 + rnorm(n,0,0.5)
> plot(x1,x2) # x1 and x2 strongly correlated
> cor(x1,x2)
> y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
> intact.lm <- lm(y ~ x1 + x2)
> summary(intact.lm)
> anova(intact.lm)
>
>
>> summary(intact.lm)
>
> Call:
> lm(formula = y ~ x1 + x2)
>
> Residuals:
>   Min      1Q  Median      3Q     Max
> -3.4578 -1.1326  0.4551  1.2807  4.8241
>
> Coefficients:
>           Estimate Std. Error t value Pr(>|t|)
> (Intercept)  3.63603    0.61944   5.870 4.23e-07 ***
> x1          -0.09555    0.49114  -0.195  0.84658
> x2           1.59384    0.48542   3.283  0.00194 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.807 on 47 degrees of freedom
> Multiple R-squared: 0.8198,     Adjusted R-squared: 0.8121
> F-statistic: 106.9 on 2 and 47 DF,  p-value: < 2.2e-16
>
>> anova(intact.lm)
> Analysis of Variance Table
>
> Response: y
>         Df Sum Sq Mean Sq F value    Pr(>F)
> x1         1 663.18  663.18 203.065 < 2.2e-16 ***
> x2         1  35.21   35.21  10.781  0.001940 **
> Residuals 47 153.49    3.27
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> On Sun, Dec 14, 2008 at 8:26 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Dec 14, 2008, at 9:40 AM, Tanmoy Talukdar wrote:
>>
>>> [sorry for the repost. I forgot to switch off formatting last time]
>>>
>>> I have two assignment problems...
>>>
>>> I have written this small code for a regression with two regressors.
>>>
>> For replication purposes, it might be good to set a seed for the random
>> number generation.
>>
>> set.seed(127)
>>>
>>> n <- 50
>>> x1 <- runif(n,1,10)
>>> x2 <- x1 + rnorm(n,0,0.5)
>>> plot(x1,x2) # x1 and x2 strongly correlated
>>> cor(x1,x2)
>>> y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
>>> intact.lm <- lm(y ~ x1 + x2)
>>> summary(intact.lm)
>>> anova(intact.lm)
>>>
>> You should also run anova on these models:
>>
>> intact21 <- lm(y~x2+x1)
>> intact12 <- lm(y~x1+x2)
>>
>>>
>>> the questions are
>>>
>>> 1. The function summary() is convenient since the result does not
>>> depend on the order in which the variables
>>> are listed in the linear model definition. It has a serious downside
>>> though which is obvious in this case.
>>> Are there any significant variables left?
>>>
>>> 2. An anova(intact.lm) table shows how much the second variable
>>> contributes to the result in
>>> addition to the first. Is there a variable significant now? Is the
>>> second variable significant?
>>
>> Both anova and summary were in agreement that the P-value for addition of x2
>> to a model that already included x1 is 0.0296. One of them uses the t
>> statistic and the other uses the F statistic. I am not sure where your
>> confusion lies.
>>
>> --
>> David Winsemius
>>
>>>
>>>
>>> the results i got:
>>>
>>>> summary(intact.lm)
>>>
>>> Call:
>>> lm(formula = y ~ x1 + x2)
>>>
>>> Residuals:
>>>   Min      1Q  Median      3Q     Max
>>> -5.5824 -1.5314 -0.1568  1.4425  5.3374
>>>
>>> Coefficients:
>>>           Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)   3.4857     0.9354   3.726 0.000521 ***
>>> x1            0.2537     0.6117   0.415 0.680191
>>> x2            1.3517     0.6025   2.244 0.029608 *
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> Residual standard error: 2.34 on 47 degrees of freedom
>>> Multiple R-squared: 0.7483,     Adjusted R-squared: 0.7376
>>> F-statistic: 69.87 on 2 and 47 DF,  p-value: 8.315e-15
>>>
>>>> anova(intact.lm)
>>>
>>> Analysis of Variance Table
>>>
>>> Response: y
>>>         Df Sum Sq Mean Sq  F value   Pr(>F)
>>> x1         1 737.86  737.86 134.7129 2.11e-15 ***
>>> x2         1  27.57   27.57   5.0338  0.02961 *
>>> Residuals 47 257.43    5.48
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>>
>>>
>>> My question is that I can't see any "serious downside" in using
>>> summary(). And on the second question I am totally clueless. I need
>>> your help.
>>>