[R] Sums of sq in car package Anova function

John Fox jfox at mcmaster.ca
Sun Dec 19 18:25:32 CET 2004


Dear Karla,

If indeed one of your factors has levels "0" and "1", that wouldn't matter
at all, but if it is a numeric variable with values 0 and 1 (rather than a
factor) then that would make a difference to the linear model that's fit to
the data. The difference doesn't affect the sequential ("type-I") sums of
squares produced by anova() but it does affect some of the type-III sums of
squares produced by Anova().
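
The point above can be illustrated with a small sketch in base R. The data
and the names g, x_num and x_fac are made up for illustration and are not
from Karla's data set. With sum-to-zero contrasts, a factor with levels "0"
and "1" is coded +1/-1, whereas a numeric 0/1 variable stays 0/1. When an
interaction is in the model, the "main effect" of the other factor then
means something different in the two fits. For a 1-df term, the squared t
statistic of its coefficient corresponds to its type-III F test, so the car
package is not needed to see the effect here.

```r
# Toy data only; all names here are hypothetical.
old <- options(contrasts = c("contr.sum", "contr.poly"))

set.seed(1)
d <- data.frame(
  g     = factor(rep(c("a", "b"), each = 10)),
  x_num = rep(c(0, 1), times = 10)          # numeric 0/1 variable
)
d$x_fac <- factor(d$x_num)                  # same values stored as a factor
d$y <- rnorm(20) + (d$g == "b") * d$x_num   # built-in g:x interaction

# How the 0/1 variable is coded in the model matrix
mm_num <- model.matrix(~ g * x_num, data = d)
mm_fac <- model.matrix(~ g * x_fac, data = d)
sort(unique(mm_num[, "x_num"]))    # 0 and 1
sort(unique(mm_fac[, "x_fac1"]))   # -1 and 1 under contr.sum

fit_num <- lm(y ~ g * x_num, data = d)
fit_fac <- lm(y ~ g * x_fac, data = d)

# Sequential (type-I) sums of squares are the same in both fits
ss_num <- unname(anova(fit_num)[["Sum Sq"]])
ss_fac <- unname(anova(fit_fac)[["Sum Sq"]])
all.equal(ss_num, ss_fac)

# The type-III style test for g (squared t of its coefficient) differs:
# in fit_num it is the effect of g at x_num == 0, in fit_fac it is the
# effect of g averaged over the two levels of x_fac.
f_num <- summary(fit_num)$coefficients["g1", "t value"]^2
f_fac <- summary(fit_fac)$coefficients["g1", "t value"]^2
c(numeric = f_num, factor = f_fac)

options(old)   # restore the previous contrast setting
```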

Anyway, I'm glad that you found the error.
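
A quick way to guard against this kind of error is to check how a column is
stored before fitting the model. The following is only a sketch: the data
frame dat, the column x and the labels "no"/"yes" are hypothetical stand-ins
for whatever read.table() returns.

```r
# Hypothetical stand-in for a data frame read from a file; a column of
# 0s and 1s comes in as a number, not as a factor.
dat <- data.frame(x = c(0, 1, 1, 0, 1))

is.factor(dat$x)   # FALSE
class(dat$x)       # "numeric"

# Convert explicitly before calling lm(); the labels are arbitrary.
dat$x <- factor(dat$x, levels = c(0, 1), labels = c("no", "yes"))

is.factor(dat$x)   # TRUE
levels(dat$x)      # "no" "yes"
```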

Regards,
 John

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Karla Sartor
> Sent: Sunday, December 19, 2004 11:32 AM
> To: John Fox
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Sums of sq in car package Anova function
> 
> John,
> 
> Thank you very much for your help.  I think that I have figured 
> out my problem.  The levels of one of my factors are "1" and 
> "0".  While this didn't matter with the 'anova()' function, 
> it does seem to alter the results with the 'Anova' function.  
> When I changed the levels to letters, the tables matched my 
> SPSS output.  As for why the type III tests in SPSS were nearly 
> identical to the results from the 'anova' function: my sample 
> sizes were not drastically unequal, so perhaps changing to 
> type III did not change the results very much?  That was all 
> I could come up with at the time.
> 
> Here is the code I used:
> 
> options(contrasts = c("contr.sum", "contr.poly"))
> require(car)
> 
> GH = read.table("GH.txt", header =T)
> GH.sub = subset(GH, GH$sp=="C")
> attach(GH.sub)
> 
> biomass= log10(GH.sub$tot.bio)
> GH.sub.fit = lm(biomass~am*nbr*barr, data=GH.sub) 
> print(Anova(GH.sub.fit, type='III'))
> 
> I get this with "1" and "0" factor levels:
> 
> Anova Table (Type III tests)
> 
> Response: biomass
>             Sum Sq  Df   F value    Pr(>F)
> (Intercept) 51.943   1 3725.4324 < 2.2e-16 ***
> am           2.403   1  172.3630 < 2.2e-16 ***
> nbr          0.779   3   18.6347 4.434e-10 ***
> barr         0.078   1    5.5803   0.01968 *
> am:nbr       0.018   3    0.4284   0.73296
> am:barr      0.039   1    2.7826   0.09775 .
> nbr:barr     0.044   3    1.0606   0.36834
> am:nbr:barr  0.022   3    0.5208   0.66873
> Residuals    1.771 127
> 
> 
> And this with letter factor levels:
> 
> Anova Table (Type III tests)
> 
> Response: biomass
>             Sum Sq  Df   F value  Pr(>F)
> (Intercept) 75.371   1 5405.7202 < 2e-16 ***
> am           2.403   1  172.3630 < 2e-16 ***
> nbr          1.482   3   35.4357 < 2e-16 ***
> barr         0.040   1    2.8410 0.09434 .
> am:nbr       0.018   3    0.4284 0.73296
> am:barr      0.039   1    2.7826 0.09775 .
> nbr:barr     0.051   3    1.2167 0.30643
> am:nbr:barr  0.022   3    0.5208 0.66873
> Residuals    1.771 127
> ---
> 
> SPSS gives: 
> 
> Tests of Between-Subjects Effects
> Dependent Variable: lot10.tot.bio
> Source            Type III Sum of Squares    df   Mean Square          F   Sig.
> Corrected Model                  4.002(a)    15          .267     19.133   .000
> Intercept                          75.371     1        75.371   5405.720   .000
> am                                  2.403     1         2.403    172.363   .000
> nbr                                 1.482     3          .494     35.436   .000
> barr                                 .040     1          .040      2.841   .094
> am * nbr                             .018     3          .006       .428   .733
> am * barr                            .039     1          .039      2.783   .098
> nbr * barr                           .051     3          .017      1.217   .306
> am * nbr * barr                      .022     3          .007       .521   .669
> Error                               1.771   127          .014
> Total                              80.796   143
> Corrected Total                     5.772   142
> a    R Squared = .693 (Adjusted R Squared = .657)
> 
> 
> Am I missing something else?  I don't know the best way to 
> post the data set, so I will send it to John and maybe he can 
> post it if it is of interest.
> 
> Thanks again!
> 
> Karla
> 
> Karla Sartor
> Montana State University - LRES
> ksartor at montana.edu
> 
> 
> 
>  
> 
> 
> John Fox wrote:
> 
> >Dear Karla,
> >
> >I suggested last night that you send me further information, but 
> >decided this morning to try out a reproducible example of my own:
> >
> >  
> >
> >>set.seed(12345)
> >>A <- factor(sample(c("a1", "a2", "a3"), 100, replace=TRUE))
> >>B <- factor(sample(c("b1", "b2"), 100, replace=TRUE))
> >>C <- factor(sample(c("c1", "c2", "c3"), 100, replace=TRUE))
> >>mu <- array(1:18, c(3,2,3))
> >>a <- as.numeric(A)
> >>b <- as.numeric(B)
> >>c <- as.numeric(C)
> >>y <- mu[cbind(a,b,c)] + rnorm(100)
> >>mod <- lm(y ~ A*B*C)
> >>library(car)
> >>options(contrasts=c("contr.sum", "contr.poly"))
> >>Anova(mod, type="II")
> >>    
> >>
> >Anova Table (Type II tests)
> >
> >Response: y
> >           Sum Sq Df   F value    Pr(>F)    
> >A           65.88  2   38.4098 1.696e-12 ***
> >B          196.47  1  229.0775 < 2.2e-16 ***
> >C         2441.00  2 1423.0809 < 2.2e-16 ***
> >A:B          0.22  2    0.1259    0.8819    
> >A:C          6.92  4    2.0174    0.0996 .  
> >B:C          0.87  2    0.5095    0.6027    
> >A:B:C        2.89  4    0.8432    0.5018    
> >Residuals   70.33 82                        
> >---
> >Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >  
> >
> >>Anova(mod, type="III")
> >>    
> >>
> >Anova Table (Type III tests)
> >
> >Response: y
> >            Sum Sq Df   F value    Pr(>F)    
> >(Intercept) 7830.2  1 9129.8959 < 2.2e-16 ***
> >A             55.7  2   32.4913 4.059e-11 ***
> >B            189.5  1  221.0076 < 2.2e-16 ***
> >C           2124.0  2 1238.2549 < 2.2e-16 ***
> >A:B            0.2  2    0.0942    0.9102    
> >A:C            5.9  4    1.7323    0.1507    
> >B:C            0.6  2    0.3417    0.7115    
> >A:B:C          2.9  4    0.8432    0.5018    
> >Residuals     70.3 82                        
> >---
> >Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >
> >
> >I don't have a working copy of SPSS anymore, but here's what SAS does 
> >with this example:
> >
> >      Source     DF      Type II SS     Mean Square    F Value    Pr > F
> >
> >      A           2       65.884048       32.942024      38.41    <.0001
> >      B           1      196.467384      196.467384     229.08    <.0001
> >      A*B         2        0.215883        0.107942       0.13    0.8819
> >      C           2     2440.998718     1220.499359    1423.08    <.0001
> >      A*C         4        6.920872        1.730218       2.02    0.0996
> >      B*C         2        0.873945        0.436973       0.51    0.6027
> >      A*B*C       4        2.892820        0.723205       0.84    0.5018
> >
> >
> >      Source     DF     Type III SS     Mean Square    F Value    Pr > F
> >
> >      A           2       55.732128       27.866064      32.49    <.0001
> >      B           1      189.546201      189.546201     221.01    <.0001
> >      A*B         2        0.161608        0.080804       0.09    0.9102
> >      C           2     2123.968177     1061.984089    1238.25    <.0001
> >      A*C         4        5.942845        1.485711       1.73    0.1507
> >      B*C         2        0.586168        0.293084       0.34    0.7115
> >      A*B*C       4        2.892820        0.723205       0.84    0.5018
> >
> >So, as you can see, the results check.
> >
> >It's hard to know what to make of this without more information about 
> >what you did. Much as I'm not an admirer of SPSS, I doubt whether it 
> >computes type-III sums of squares incorrectly, so I suspect something 
> >wrong with either your SPSS commands or your R commands.
> >
> >I hope this helps,
> > John
> >
> >--------------------------------
> >John Fox
> >Department of Sociology
> >McMaster University
> >Hamilton, Ontario
> >Canada L8S 4M4
> >905-525-9140x23604
> >http://socserv.mcmaster.ca/jfox
> >--------------------------------
> >
> >  
> >
> >>-----Original Message-----
> >>From: r-help-bounces at stat.math.ethz.ch 
> >>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Karla Sartor
> >>Sent: Saturday, December 18, 2004 6:43 PM
> >>To: r-help at stat.math.ethz.ch
> >>Subject: [R] Sums of sq in car package Anova function
> >>
> >>Hello R users,
> >>
> >>I am trying to run a three factor ANOVA on a data set with unequal 
> >>sample sizes.
> >>
> >>I fit an 'lm' model to the data and used the Anova function from the 
> >>'car' package with the 'type="III"' option to get type III sums of 
> >>squares.  I also set the contrast coding option to 
> >>'options(contrasts = c("contr.sum", "contr.poly"))' as cautioned in 
> >>John Fox's book "An R and S-PLUS Companion to Applied Regression".
> >>
> >>Is there anything else that I need to consider when using the type III 
> >>option with the Anova function?
> >>
> >>When I run the same data set in SPSS with General Linear Model and 
> >>type III sums of squares, the sums of squares are different enough 
> >>that one of the main effect terms is significant in the R table and 
> >>not in the SPSS table.  I found a similar discrepancy with a different 
> >>data set, only there SPSS showed a significant interaction effect 
> >>while the 'Anova' function did not.
> >>
> >>I also compared the results from SPSS with those from the 'anova' 
> >>function in the base package, and the results are nearly identical.  I 
> >>would expect the two methods with type III sums of squares to be more 
> >>similar; does anyone have any ideas as to why that was not the case?  
> >>I am hoping to not go back to SPSS at this point, so am trying to 
> >>decide which of the two R functions is most appropriate for me (and 
> >>defensible, considering the unequal sample sizes).
> >>
> >>Thank you in advance for any ideas you may have!
> >>
> >>Karla
> >>
> >>Karla Sartor
> >>Montana State University - LRES
> >>ksartor at montana.edu
> >>
> >>______________________________________________
> >>R-help at stat.math.ethz.ch mailing list
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide! 
> >>http://www.R-project.org/posting-guide.html
> >>    
> >>
> >
> >
> >  
> >
> 