[R] Help with analysis of variance
McCulloch, Andrew
A.Mcculloch at leedsmet.ac.uk
Fri Aug 22 13:51:16 CEST 2014
Hi,
I have an observational dataset which consists of annual observations (year) for children (id), recording the rate of unemployment in the neighbourhood in which they lived (rate). I know when children move home, so the data also has an identifier for spells in the same neighbourhood (spell). I want to decompose the overall variation in children's experience of area unemployment, given by the sum of (rate - mean rate)^2, into a) the component within a residential spell, sum of (rate - spell mean of rate)^2, b) the component between spells for the same child, sum of (spell mean of rate - child mean of rate)^2, and c) the component between children, sum of (child mean of rate - mean rate)^2. I think I can do this longhand using the calculations below:
mobility <- structure(list(year = c(2002L, 2003L, 2004L, 2005L, 2006L, 2007L,2008L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2002L,
2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2002L, 2003L, 2004L,
2005L, 2006L, 2007L, 2008L, 2002L, 2003L, 2004L, 2005L, 2006L,
2007L, 2008L), rate = c(13.08962, 14.27165, 4.496403, 3.89839,
4.60199, 5.138746, 5.251025, 4.874652, 5.880996, 5.813953, 6.204044,
6.93802, 6.866853, 7.614808, 4.405841, 4.826733, 4.760742, 3.762136,
4.60199, 5.138746, 5.251025, 4.405841, 4.826733, 4.760742, 3.762136,
4.60199, 5.138746, 5.251025, 4.405841, 5.789474, 5.889423, 4.61211,
4.642526, 6.838906, 9.683488), spell = c(1L, 2L, 2L, 3L, 3L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), id = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
5L, 5L)), .Names = c("year", "rate", "spell", "id"), row.names = c(NA,
-35L), class = "data.frame")
mobility$id <- factor(mobility$id)
mobility$spell <- factor(mobility$spell)
mobility$spellmean <- ave(mobility$rate,mobility$id,mobility$spell,FUN=mean)
mobility$personmean <- ave(mobility$rate,mobility$id,FUN=mean)
mobility$totalmean <- mean(mobility$rate,na.rm=TRUE)
N <- dim(mobility)[1]
# observation deviation from overall mean
sum(((mobility$rate-mobility$totalmean)^2)/N)
5.159846
# observation deviation from spell mean
sum(((mobility$rate-mobility$spellmean)^2)/N)
2.039461
# deviation of spell mean from person mean
sum(((mobility$spellmean-mobility$personmean)^2)/N)
2.13787
# deviation of person mean from overall mean
sum(((mobility$personmean-mobility$totalmean)^2)/N)
0.982515
I think this is correct because the three components of variation sum to the total:
2.039461+2.13787+0.982515 = 5.159846
Can someone show me how to use the analysis of variance functions in R to get the same result? Thanks.
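One possible route, offered as a sketch rather than a definitive answer: fit a nested model with aov(), treating spells as nested within children (rate ~ id + id:spell). The sequential sums of squares for id, id:spell and the residuals should then correspond to the between-children, between-spell and within-spell components, and dividing each by N should recover the longhand figures. The data frame is rebuilt here in a compact form so that the block runs on its own; the names fit, ss and components are illustrative only.

```r
# Rebuild the example data from the post (same values, more compact layout)
mobility <- data.frame(
  year = rep(2002:2008, times = 5),
  rate = c(13.08962, 14.27165, 4.496403, 3.89839, 4.60199, 5.138746, 5.251025,
           4.874652, 5.880996, 5.813953, 6.204044, 6.93802, 6.866853, 7.614808,
           4.405841, 4.826733, 4.760742, 3.762136, 4.60199, 5.138746, 5.251025,
           4.405841, 4.826733, 4.760742, 3.762136, 4.60199, 5.138746, 5.251025,
           4.405841, 5.789474, 5.889423, 4.61211, 4.642526, 6.838906, 9.683488),
  spell = factor(c(1, 2, 2, 3, 3, 3, 3, rep(1, 25), 2, 2, 2)),
  id = factor(rep(1:5, each = 7))
)
N <- nrow(mobility)

# Nested ANOVA: children first, then spells within children.
# Assumption: spell numbers only have meaning within a child, so spell
# enters the model only through the id:spell term.
fit <- aov(rate ~ id + id:spell, data = mobility)

# Sequential sums of squares, in the order: id, id:spell, Residuals
ss <- summary(fit)[[1]][["Sum Sq"]]
names(ss) <- c("between_children", "between_spells", "within_spell")

# Divide by N to put the components on the same scale as the longhand version
components <- ss / N
print(components)
print(sum(components))  # compare with the total variation computed longhand
```

The same sums of squares should also be available from anova(lm(rate ~ id + id:spell, data = mobility)), if an lm fit is preferred.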
Andrew McCulloch
Leeds Metropolitan University