[R] question about SS of ANOVA

peter dalgaard pdalgd at gmail.com
Mon Feb 25 13:30:25 CET 2013


On Feb 25, 2013, at 12:35 , Bert Gunter wrote:

> This is a basic statistics question and off topic here. Talk to a
> statistician (i.e. someone with a good statistics background)  or
> start reading. You need an extensive statistics tutorial that I
> believe is too much for online forums like stats.stackexchange.com.
> 
> -- Cheers,
> Bert

True. On the other hand, once we are in R, try removing one observation:

> anova(lm(breaks~wool+tension, data=warpbreaks, subset=-1))
Analysis of Variance Table

Response: breaks
          Df Sum Sq Mean Sq F value    Pr(>F)    
wool       1  472.3  472.31  3.5293 0.0662491 .  
tension    2 2198.3 1099.16  8.2133 0.0008391 ***
Residuals 49 6557.5  133.83                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> anova(lm(breaks~tension+wool, data=warpbreaks, subset=-1))
Analysis of Variance Table

Response: breaks
          Df Sum Sq Mean Sq F value    Pr(>F)    
tension    2 2143.8 1071.92  8.0098 0.0009777 ***
wool       1  526.8  526.79  3.9364 0.0528683 .  
Residuals 49 6557.5  133.83                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Now the results are order-dependent. The difference is that, with one observation removed, the design is unbalanced, so tension and wool are no longer orthogonal factors. For further enlightenment, consult the literature as Bert suggests.
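
To see where the order dependence comes from, one can look at the cell counts and at the one-way formula from the original question. The following is a minimal sketch, not part of the original exchange; the variable names (d, full, reduced, ss_first, ss_last, ss_formula) are made up for illustration.

```r
# Cell counts: the full warpbreaks data are balanced
full <- with(warpbreaks, table(wool, tension))

# Dropping row 1 (wool A, tension L) unbalances the design
d <- warpbreaks[-1, ]
reduced <- with(d, table(wool, tension))
print(full)
print(reduced)

# Sequential SS for wool, entered first vs. entered last
ss_first <- anova(lm(breaks ~ wool + tension, data = d))["wool", "Sum Sq"]
ss_last  <- anova(lm(breaks ~ tension + wool, data = d))["wool", "Sum Sq"]

# One-way formula from the question:
# sum over levels of n_i * (mean_i - grand mean)^2
grp_n    <- tapply(d$breaks, d$wool, length)
grp_mean <- tapply(d$breaks, d$wool, mean)
ss_formula <- sum(grp_n * (grp_mean - mean(d$breaks))^2)

print(c(first = ss_first, last = ss_last, formula = ss_formula))
```

With the full data every wool-by-tension cell has 9 observations; after dropping row 1 the A/L cell has 8. The one-way formula reproduces the SS for wool only when wool enters the model first, that is, ignoring tension.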

-pd


> 
> On Sun, Feb 24, 2013 at 8:07 PM, meng <laomeng_3 at 163.com> wrote:
>> Hi all:
>> I have a question about ANOVA: Does the SS (sum of squares) of a specific factor stay constant when the number of factors changes?
>> 
>> dat1 includes one factor, g1, and g1's SS is called SS_g1_dat1.
>> dat2 includes two factors, g1 and g2, and g1's SS is called SS_g1_dat2.
>> 
>> My question is: Does SS_g1_dat1 equal SS_g1_dat2?
>> 
>> I have reasons for both "yes" and "no", but don't know which one is correct, so I need your help.
>> 
>> The reason for SS_g1_dat1 being equal to SS_g1_dat2:
>> The formula for computing SS is: sum(sample size of level(i) * (mean of level(i) - TotalMean)^2), where i refers to each level of g1, in both SS_g1_dat1 and SS_g1_dat2.
>> Every element of the formula is constant, so SS is constant.
>> 
>> Using the dataset "warpbreaks" from R:
>> anova(lm(breaks~wool, data=warpbreaks))
>> Analysis of Variance Table
>> Response: breaks
>>          Df Sum Sq Mean Sq F value Pr(>F)
>> wool       1  450.7  450.67  2.6684 0.1084
>> Residuals 52 8782.1  168.89
>> 
>> anova(lm(breaks~wool+tension, data=warpbreaks))
>> Analysis of Variance Table
>> Response: breaks
>>          Df Sum Sq Mean Sq F value   Pr(>F)
>> wool       1  450.7  450.67  3.3393 0.073614 .
>> tension    2 2034.3 1017.13  7.5367 0.001378 **
>> Residuals 50 6747.9  134.96
>> 
>> anova(lm(breaks~tension+wool, data=warpbreaks))
>> Analysis of Variance Table
>> Response: breaks
>>          Df Sum Sq Mean Sq F value   Pr(>F)
>> tension    2 2034.3 1017.13  7.5367 0.001378 **
>> wool       1  450.7  450.67  3.3393 0.073614 .
>> Residuals 50 6747.9  134.96
>> 
>> From above, wool's SS is always 450.7 no matter the number and order of factors.
>> 
>> 
>> The reason for SS_g1_dat1 NOT being equal to SS_g1_dat2:
>> The total SS is constant, so the SS for each factor should decrease as the number of factors increases.
>> But when I used the dataset "warpbreaks" to confirm this, it failed to confirm. The result shows that wool's SS is always 450.7 no matter the number and order of factors.
>> 
>> So which of the two reasons above is correct?
>> 
>> Many thanks for your help.
>> 
>> My best
>> 
>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


