[R] Help with analysis of variance

McCulloch, Andrew A.Mcculloch at leedsmet.ac.uk
Fri Aug 22 13:51:16 CEST 2014

```Hi,

I have an observational dataset which consists of eight annual observations (year) for children (id) recording the rate of unemployment in the neighbourhood in which they lived (rate). I know if children move home so the data also has an identifier for spells in the same neighbourhood (spell). I want to decompose the overall variation in children's experience of area unemployment, given by the sum of (rate - mean rate)^2, into a) the component within a residential spell, sum of (rate - spell mean of rate)^2, b) the component between spells, sum of (spell mean), and c) the component between children, sum of (rate - mean rate for child). I think I can do this longhand using the calculations below:

mobility <- structure(list(year = c(2002L, 2003L, 2004L, 2005L, 2006L, 2007L,2008L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2002L,

2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2002L, 2003L, 2004L,

2005L, 2006L, 2007L, 2008L, 2002L, 2003L, 2004L, 2005L, 2006L,

2007L, 2008L), rate = c(13.08962, 14.27165, 4.496403, 3.89839,

4.60199, 5.138746, 5.251025, 4.874652, 5.880996, 5.813953, 6.204044,

6.93802, 6.866853, 7.614808, 4.405841, 4.826733, 4.760742, 3.762136,

4.60199, 5.138746, 5.251025, 4.405841, 4.826733, 4.760742, 3.762136,

4.60199, 5.138746, 5.251025, 4.405841, 5.789474, 5.889423, 4.61211,

4.642526, 6.838906, 9.683488), spell = c(1L, 2L, 2L, 3L, 3L,

3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), id = c(1L,

1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,

3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,

5L, 5L)), .Names = c("year", "rate", "spell", "id"), row.names = c(NA,

-35L), class = "data.frame")

mobility\$id <- factor(mobility\$id)

mobility\$spell <- factor(mobility\$spell)

mobility\$spellmean <- ave(mobility\$rate,mobility\$id,mobility\$spell,FUN=mean)

mobility\$personmean <- ave(mobility\$rate,mobility\$id,FUN=mean)

mobility\$totalmean <- mean(mobility\$rate,na.rm=TRUE)

N <- dim(mobility)[1]

# observation deviation from overall mean

sum(((mobility\$rate-mobility\$totalmean)^2)/N)

5.159846

# observation deviation from spell mean

sum(((mobility\$rate-mobility\$spellmean)^2)/N)

2.039461

# deviation of spell mean from person mean

sum(((mobility\$spellmean-mobility\$personmean)^2)/N)

2.13787

# deviation of person mean from overall mean

sum(((mobility\$personmean-mobility\$totalmean)^2)/N)

0.982515

I think this is correct because the sum of the three components of variation sums to the total:

2.039461+2.13787+0.982515 = 5.159846

Can someone show me how to use the analysis of variance functions in R to get the same result. Thanks.

Andrew McCulloch
Leeds Metropolitan University

>From 22 September 2014 Leeds Metropolitan University will become Leeds Beckett University.
Find out more at http://www.leedsbeckett.ac.uk
To view the terms under which this email is distributed, please go to:-
http://www.leedsmet.ac.uk/email-disclaimer.htm

[[alternative HTML version deleted]]

```