[R] Help with analysis of variance

Richard M. Heiberger rmh at temple.edu
Fri Aug 22 17:35:25 CEST 2014


Andrew,

I plotted your data first.  Then I made id, spell, year into factors
and did the ANOVA.
>From the graph, it looks like the two ids who had more than one spell
show variability in rate.
The ANOVA table agrees by showing high significance for the id:spell
interaction.
The numbers in this ANOVA table are not similar to your numbers.

Rich

mobility$id <- factor(mobility$id)
mobility$spell <- factor(mobility$spell)
mobility$year <- factor(mobility$year)

xyplot(rate ~ year | id, group=spell, data=mobility,
       pch=19, layout=c(1,5), scales=list(alternating=FALSE),
       between=list(y=1),
       strip=FALSE, strip.left=strip.custom(strip.names=c(TRUE,TRUE)))

mobility.aov <- aov(rate ~ year + id/spell, data=mobility)
anova(mobility.aov)


On Fri, Aug 22, 2014 at 7:51 AM, McCulloch, Andrew
<A.Mcculloch at leedsmet.ac.uk> wrote:
> Hi,
>
>
>
> I have an observational dataset which consists of eight annual observations (year) for children (id) recording the rate of unemployment in the neighbourhood in which they lived (rate). I know if children move home so the data also has an identifier for spells in the same neighbourhood (spell). I want to decompose the overall variation in children's experience of area unemployment, given by the sum of (rate - mean rate)^2, into a) the component within a residential spell, sum of (rate - spell mean of rate)^2, b) the component between spells, sum of (spell mean), and c) the component between children, sum of (rate - mean rate for child). I think I can do this longhand using the calculations below:
>
>
>
> mobility <- structure(list(year = c(2002L, 2003L, 2004L, 2005L, 2006L, 2007L,2008L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2002L,
>
> 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2002L, 2003L, 2004L,
>
> 2005L, 2006L, 2007L, 2008L, 2002L, 2003L, 2004L, 2005L, 2006L,
>
> 2007L, 2008L), rate = c(13.08962, 14.27165, 4.496403, 3.89839,
>
> 4.60199, 5.138746, 5.251025, 4.874652, 5.880996, 5.813953, 6.204044,
>
> 6.93802, 6.866853, 7.614808, 4.405841, 4.826733, 4.760742, 3.762136,
>
> 4.60199, 5.138746, 5.251025, 4.405841, 4.826733, 4.760742, 3.762136,
>
> 4.60199, 5.138746, 5.251025, 4.405841, 5.789474, 5.889423, 4.61211,
>
> 4.642526, 6.838906, 9.683488), spell = c(1L, 2L, 2L, 3L, 3L,
>
> 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), id = c(1L,
>
> 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
>
> 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
>
> 5L, 5L)), .Names = c("year", "rate", "spell", "id"), row.names = c(NA,
>
> -35L), class = "data.frame")
>
>
>
> mobility$id <- factor(mobility$id)
>
> mobility$spell <- factor(mobility$spell)
>
>
>
> mobility$spellmean <- ave(mobility$rate,mobility$id,mobility$spell,FUN=mean)
>
> mobility$personmean <- ave(mobility$rate,mobility$id,FUN=mean)
>
> mobility$totalmean <- mean(mobility$rate,na.rm=TRUE)
>
> N <- dim(mobility)[1]
>
> # observation deviation from overall mean
>
> sum(((mobility$rate-mobility$totalmean)^2)/N)
>
> 5.159846
>
> # observation deviation from spell mean
>
> sum(((mobility$rate-mobility$spellmean)^2)/N)
>
> 2.039461
>
> # deviation of spell mean from person mean
>
> sum(((mobility$spellmean-mobility$personmean)^2)/N)
>
> 2.13787
>
> # deviation of person mean from overall mean
>
> sum(((mobility$personmean-mobility$totalmean)^2)/N)
>
> 0.982515
>
>
>
> I think this is correct because the sum of the three components of variation sums to the total:
>
> 2.039461+2.13787+0.982515 = 5.159846
>
>
>
> Can someone show me how to use the analysis of variance functions in R to get the same result. Thanks.
>
> Andrew McCulloch
> Leeds Metropolitan University
>
>
>
>
>
>
> >From 22 September 2014 Leeds Metropolitan University will become Leeds Beckett University.
> Find out more at http://www.leedsbeckett.ac.uk
> To view the terms under which this email is distributed, please go to:-
> http://www.leedsmet.ac.uk/email-disclaimer.htm
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list