[R] unbalanced one-way ANOVA

Douglas Bates bates at stat.wisc.edu
Fri Feb 29 17:51:04 CET 2008


On Fri, Feb 29, 2008 at 10:32 AM, Nauta, A.L. <A.L.Nauta at students.uu.nl> wrote:

> I tried a 6-way anova, and indeed found out that changing the order of
> factors influences the SS, F-ratios and p-values. So what should I do if I
> want to know which factor most strongly rejects H0? (H0 is the hypothesis of
> "no difference" in the population means.) Should I instead do 6 one-way
> anovas (one per factor) and then compare the p-values?

No.

If you are going to try to perform a 6-way anova on an unbalanced data
set, you should read more about the analysis of variance so that you
can understand the model and the hypotheses involved, or ask a
statistical consultant.  This is not a topic that can be explained in
a couple of email messages.

You may find Bill Venables' paper "Exegeses on Linear Models" (do an
internet search on the title to find a copy) a good starting point.
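
For a concrete picture of the order dependence discussed in this thread,
here is a minimal sketch in R.  The data frame d, its factors A and B, and
the effect sizes are all made up for illustration; only lm(), anova() and
deviance() from base R are used.  It fits the same two-factor model with
the terms in both orders and checks that the sum of squares for the term
listed last equals the drop in residual sum of squares from adding that
term to a model that already contains the other one.

```r
# Hypothetical unbalanced two-factor data; all names and values are made up.
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(12, 6))),
  B = factor(c(rep(c("b1", "b2"), times = c(9, 3)),   # cells within a1
               rep(c("b1", "b2"), times = c(2, 4))))  # cells within a2
)
d$y <- rnorm(nrow(d)) + 1.0 * (d$A == "a2") + 0.5 * (d$B == "b2")

# Same model, terms in a different order
fit_ab <- lm(y ~ A + B, data = d)
fit_ba <- lm(y ~ B + A, data = d)

anova(fit_ab)  # A entered first, B adjusted for A
anova(fit_ba)  # B entered first, A adjusted for B

# Sequential sum of squares for A in each ordering
ss_a_first <- anova(fit_ab)["A", "Sum Sq"]
ss_a_last  <- anova(fit_ba)["A", "Sum Sq"]

# When A is last, its SS is the reduction in residual SS from adding A
# to a model that already contains B
fit_b <- lm(y ~ B, data = d)
ss_a_given_b <- deviance(fit_b) - deviance(fit_ba)

c(first = ss_a_first, last = ss_a_last, given_b = ss_a_given_b)
```

In this sketch the residual sum of squares is the same for both orderings;
only the way the explained variation is split between A and B changes.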

>  ________________________________
>
> From: dmbates at gmail.com on behalf of Douglas Bates
> Sent: Fri 29-2-2008 15:38
> To: Nauta, A.L.
> Cc: R Help
> Subject: Re: [R] unbalanced one-way ANOVA
>
> On Fri, Feb 29, 2008 at 4:47 AM, Nauta, A.L. <A.L.Nauta at students.uu.nl>
> wrote:
>
> > Thank you for your reply,
> > is your answer (that the approach does not depend on balance in the data)
> > only valid for one-way anova, or also for two-way or more-way anova?
>
> Any kind.
>
> You should be aware that for unbalanced data sets the sum of squares
> attributed to a term depends on the order in which the terms occur in
> the model.  That is, the sum of squares and the F-ratios and the
> p-values for, say, factor A will be different if you fit a model
>
> y ~ A + B
>
> versus the model
>
> y ~ B + A
>
> to a data set where factors A and B are unbalanced.
>
> This is because the sums of squares displayed by R's anova methods are
> the sequential sums of squares.  Although other statistical software
> may calculate other, more exotic, types of sums of squares, many of us
> would argue that these are the only ones that make sense.
>
> If in doubt about which sum of squares to use, the general rule is
> that you should only pay attention to the F ratio and p-value for the
> last term in the model.
>
> >  ________________________________
> >  From: dmbates at gmail.com on behalf of Douglas Bates
> > Sent: Fri 29-2-2008 0:39
> > To: Nauta, A.L.
> > Cc: r-help at r-project.org
> > Subject: Re: [R] unbalanced one-way ANOVA
> >
> > On Thu, Feb 28, 2008 at 7:52 AM, Nauta, A.L. <A.L.Nauta at students.uu.nl>
> > wrote:
> > > Hi,
> >
> > > I have an unbalanced dataset on which I would like to perform a one-way
> > > anova test using R (aov). According to Wonnacott and Wonnacott (1990) p.
> > > 333, one-way anova with unbalanced data is possible with a few
> > > modifications in the anova calculations. The modified anova calculations
> > > should take into account different sample sizes and a modified definition
> > > of the average. I was wondering if the aov function in R is suitable for
> > > one-way anova on unbalanced data.
> >
> > Yes.
> >
> > The analysis of variance is performed in R by fitting a linear model
> > created from indicator variables for the levels of the factor.  The
> > validity of this approach does not depend on balance in the data.
> >
> > The formulas given in an introductory textbook are almost never the
> > way that results are computed in practice.  I think we would all be
> > better off if they didn't even give these misleading formulas.
> >
>
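
The point made in the earliest quoted reply, that aov works by fitting a
linear model built from indicator variables, can be sketched in R as
follows.  The factor g, the response y and the group sizes are
hypothetical; model.matrix(), aov(), lm() and anova() are base R.  The
sketch shows the indicator columns R creates for an unbalanced one-way
layout and checks that the between-group sum of squares from the fitted
model agrees with the sample-size-weighted formula an introductory text
would give.

```r
# Hypothetical unbalanced one-way layout: group sizes 3, 5 and 8.
set.seed(2)
g <- factor(rep(c("g1", "g2", "g3"), times = c(3, 5, 8)))
y <- rnorm(length(g), mean = c(10, 12, 11)[as.integer(g)])

# Indicator columns built for the factor under the default treatment
# contrasts: an intercept plus one 0/1 column per non-reference level
X <- model.matrix(~ g)
head(X)

# aov() and lm() fit the same linear model
fit_aov <- aov(y ~ g)
fit_lm  <- lm(y ~ g)
summary(fit_aov)
anova(fit_lm)

# Between-group sum of squares with unequal group sizes, computed by hand
n_i   <- tapply(y, g, length)
m_i   <- tapply(y, g, mean)
grand <- mean(y)
ss_between <- sum(n_i * (m_i - grand)^2)

ss_model <- anova(fit_lm)["g", "Sum Sq"]
c(by_hand = ss_between, from_lm = ss_model)
```

Nothing in the model fit requires equal group sizes; the unequal counts
simply show up as unequal numbers of ones in the indicator columns.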


