[R] two-way unbalanced ANOVA

peter dalgaard pdalgd at gmail.com
Sun Feb 27 10:05:22 CET 2011


On Feb 27, 2011, at 01:12 , Pasicolan wrote:

>  Hello Everyone,
> 
> *Question: *How do you calculate the sum of squares for a two-way 
> _unbalanced_ ANOVA?

There is no unique way of doing it. You can either do a decomposition as

f1 :  SS(Y~f1)
f2 :  SS(Y~f1+f2) - SS(Y~f1) 
res:  SStot - SS(Y~f1+f2)

where SS(...) denotes the model SS, i.e. the sum of squares explained by the model in parentheses.

This is known as sequential SS, and is what you get from anova(lm(Y~f1+f2)). Notice that it depends on the order of f1 and f2. Alternatively, you can do single-term deletions,   

f1 :  SS(Y~f1+f2) - SS(Y~f2)
f2 :  SS(Y~f1+f2) - SS(Y~f1) 
res:  SStot - SS(Y~f1+f2)

but those SS's do not sum to anything interesting, i.e. they are not a decomposition. In R you generate them using drop1(lm(Y~f1+f2)).
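To make the two recipes concrete, here is a minimal sketch that computes both sets of SS by hand and compares them with anova() and drop1(). The data are made up purely for illustration, and model_ss() is a hypothetical helper of mine, not something built into R; only Y, f1 and f2 follow the notation above.

```r
# Made-up unbalanced two-way layout (illustration only, not the poster's data)
set.seed(1)
d <- data.frame(
  f1 = factor(rep(c("a", "b", "c"), times = c(6, 7, 7))),
  f2 = factor(rep(c("x", "y", "x", "y", "x", "y"), times = c(5, 1, 3, 4, 1, 6)))
)
d$Y <- rnorm(20) + as.numeric(d$f1) + 2 * (d$f2 == "y")

# Hypothetical helper: model SS = sum of squared (fitted value - grand mean)
model_ss <- function(fit) {
  y <- model.response(model.frame(fit))
  sum((fitted(fit) - mean(y))^2)
}

fit1  <- lm(Y ~ f1, data = d)
fit2  <- lm(Y ~ f2, data = d)
fit12 <- lm(Y ~ f1 + f2, data = d)
ss_tot <- sum((d$Y - mean(d$Y))^2)

# Sequential SS by hand, in the order f1, f2, residual
seq_hand <- c(f1  = model_ss(fit1),
              f2  = model_ss(fit12) - model_ss(fit1),
              res = ss_tot - model_ss(fit12))
seq_tab <- anova(fit12)
all.equal(unname(seq_hand), seq_tab[["Sum Sq"]])

# Single-term deletions by hand
del_hand <- c(f1 = model_ss(fit12) - model_ss(fit2),
              f2 = model_ss(fit12) - model_ss(fit1))
del_tab <- drop1(fit12)
all.equal(unname(del_hand), del_tab[c("f1", "f2"), "Sum of Sq"])

# Swapping the order changes the sequential SS for f1
anova(lm(Y ~ f2 + f1, data = d))
```

Both all.equal() calls should agree up to rounding. Note that the SS for whichever factor is entered last in the sequential table coincides with its drop1() value; only the factor entered first differs.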

(In some other software, these are known as "Type I" and "Type II" SS. There are also Types III and IV, which are sometimes equal to Type II and sometimes do something strange, which you probably don't want to know about. You may want to look at the "Exegeses on Linear Models" paper by Venables; Google gets you there soon enough.)

Or, were you perhaps asking for how to get the "whole-model SS"? That is the sum of the two terms in the sequential anova(). You can generate it explicitly with

anova(lm(Y~1),lm(Y~f1+f2))

Notice that the resulting F-test is the one printed at the bottom of summary(lm(Y~f1+f2)).
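As a quick check of that claim, the following sketch (again on made-up data, with Y, f1 and f2 as above) compares the whole-model SS with the sum of the sequential terms, and the F statistic with the one stored in summary().

```r
# Made-up unbalanced two-way layout (illustration only, not the poster's data)
set.seed(1)
d <- data.frame(
  f1 = factor(rep(c("a", "b", "c"), times = c(6, 7, 7))),
  f2 = factor(rep(c("x", "y", "x", "y", "x", "y"), times = c(5, 1, 3, 4, 1, 6)))
)
d$Y <- rnorm(20) + as.numeric(d$f1) + 2 * (d$f2 == "y")

fit0  <- lm(Y ~ 1, data = d)        # intercept-only model
fit12 <- lm(Y ~ f1 + f2, data = d)  # additive two-factor model

# Whole-model comparison: row 2 holds the SS and F for f1 and f2 together
cmp <- anova(fit0, fit12)
whole_ss <- cmp[2, "Sum of Sq"]
whole_F  <- cmp[2, "F"]

# Sum of the two sequential terms from anova()
seq_sum <- sum(anova(fit12)[c("f1", "f2"), "Sum Sq"])

# F statistic reported at the bottom of summary()
summ_F <- summary(fit12)$fstatistic[["value"]]

all.equal(whole_ss, seq_sum)
all.equal(whole_F, summ_F)
```

The whole-model SS does not depend on the order of f1 and f2, so it is the one quantity here that is unambiguous in the unbalanced case.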


> 
> *What I have done:*
> I have found many useful tutorials online for running a balanced two-way 
> ANOVA but I haven't had much luck for running an unbalanced two-way 
> ANOVA. From what I have read, the trouble with running an unbalanced 
> two-way ANOVA, is that things get tricky when calculating the sum of 
> squares for two factors and the interaction term.
> 
> *What I am stuck on:*
> So far what I have done is applied my data to this helpful tutorial: 
> http://www.biw.kuleuven.be/vakken/statisticsbyr/ANOVAbyRr/UnbalancedTwoWayANOVA.htm 
> 
> I found that my interaction term was insignificant, so my model was 
> reduced to a two-factor model. But then I got stuck on the part where I 
> use a general linear test (GLT) to determine whether I should use a two- 
> or one- factor model to get my correct sum of squares decomposition for 
> my unbalanced ANOVA. I found that there was significance between my two- 
> and one- factor models (p=2.124x10^-6). But now I do not know which 
> model to use for obtaining the correct sum of squares for my unbalanced 
> two-way ANOVA.
> 
> I am still a novice at writing R code (~6ish months of experience) so I 
> apologize if there is any error in how I delivered this message. Any 
> help regarding this topic would be most appreciated. Please try to make 
> any replies as simple as possible, most code in R is still new to me. 
> Thanks for reading!

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


