[R] difference

P Tennant philipt900 at iinet.net.au
Sun Oct 30 02:57:06 CET 2016


Hi,

As Jeff said, more than one grouping variable can be supplied, and there 
is an example at the bottom of the help page for ave(). The same goes 
for by(), but there the order in which you supply the grouping variables 
matters: whichever grouping variable is supplied first to by() changes 
its levels fastest in the output sequence. You can see from your 
dataset:

d2 <- data.frame(city=rep(1:2, each=6),
     year=c(rep(2001, 3), rep(2002, 3), rep(2001, 3), rep(2002, 3)),
     num=c(25,75,150,35,65,120,25,95,150,35,110,120))

d2
#    city year num
# 1     1 2001  25
# 2     1 2001  75
# 3     1 2001 150
# 4     1 2002  35
# 5     1 2002  65
# 6     1 2002 120
# 7     2 2001  25
# 8     2 2001  95
# 9     2 2001 150
# 10    2 2002  35
# 11    2 2002 110
# 12    2 2002 120

that `year' runs through its levels first as you move down the table, 
and only then does `city' change. You want your new column to align with 
this sequence, so `year' has to be supplied first. If you put `city' 
first in the list of grouping variables for by(), rather than `year', 
the output will not follow the row order of your dataset:

by(d2$num, d2[c('city', 'year')], function(x) x - x[1])

# city: 1
# year: 2001
# [1]   0  50 125
# -----------------------------
# city: 2
# year: 2001
# [1]   0  70 125
# -----------------------------
# city: 1
# year: 2002
# [1]  0 30 85
# -----------------------------
# city: 2
# year: 2002
# [1]  0 75 85
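
The ordering above follows from how R crosses the grouping factors: the 
first one supplied varies fastest. A minimal sketch with interaction() 
and split() on the same data shows both orders (the names g_city_first, 
g_year_first and diff_year_first are only illustrative, not from the 
thread):

```r
d2 <- data.frame(city = rep(1:2, each = 6),
                 year = c(rep(2001, 3), rep(2002, 3), rep(2001, 3), rep(2002, 3)),
                 num  = c(25, 75, 150, 35, 65, 120, 25, 95, 150, 35, 110, 120))

# The first factor passed to interaction() varies fastest in the levels
g_city_first <- interaction(d2$city, d2$year)
g_year_first <- interaction(d2$year, d2$city)
levels(g_city_first)
# [1] "1.2001" "2.2001" "1.2002" "2.2002"
levels(g_year_first)
# [1] "2001.1" "2002.1" "2001.2" "2002.2"

# Splitting with year first and concatenating the pieces reproduces
# the row order of d2
diff_year_first <- unlist(lapply(split(d2$num, g_year_first),
                                 function(x) x - x[1]),
                          use.names = FALSE)
diff_year_first
# [1]   0  50 125   0  30  85   0  70 125   0  75  85
```

Note that this only lines up because the rows of d2 are already sorted 
by city and then year.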

In contrast to using by() as I've suggested, using match() to create 
indices that flag where each `city/year' category is first encountered 
seems a more explicit and robust way to do the calculation, since the 
result does not depend on the order of any grouped output. Adapting an 
earlier solution provided in this thread:

year.city <- with(d2, interaction(year, city))
indexOfFirstYearCity <- match(year.city, year.city)
indexOfFirstYearCity
# [1]  1  1  1  4  4  4  7  7  7 10 10 10
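
The index repeats within each group because match(x, x) returns, for 
each element, the position of its first occurrence. A small sketch with 
a made-up label vector (g and first_idx are hypothetical names, not from 
your data) shows that this also holds when the groups are interleaved:

```r
# Hypothetical group labels, deliberately not sorted
g <- c("b", "a", "b", "a", "c", "b")

# For each element, the index of the first element with the same label
first_idx <- match(g, g)
first_idx
# [1] 1 2 1 2 5 1
```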

d2$diff <- d2$num - d2$num[indexOfFirstYearCity]
d2

#    city year num diff
# 1     1 2001  25    0
# 2     1 2001  75   50
# 3     1 2001 150  125
# 4     1 2002  35    0
# 5     1 2002  65   30
# 6     1 2002 120   85
# 7     2 2001  25    0
# 8     2 2001  95   70
# 9     2 2001 150  125
# 10    2 2002  35    0
# 11    2 2002 110   75
# 12    2 2002 120   85
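
For comparison, the ave() route Jeff pointed to could be sketched as 
follows. Because ave() returns its result in the original row order, the 
order of the grouping variables should not matter here. The column names 
diff_ave and diff_ave2 are illustrative only:

```r
d2 <- data.frame(city = rep(1:2, each = 6),
                 year = c(rep(2001, 3), rep(2002, 3), rep(2001, 3), rep(2002, 3)),
                 num  = c(25, 75, 150, 35, 65, 120, 25, 95, 150, 35, 110, 120))

# Subtract the first value within each city/year group;
# FUN must be named, since unnamed arguments are taken as grouping variables
d2$diff_ave  <- ave(d2$num, d2$city, d2$year, FUN = function(x) x - x[1])

# Same call with the grouping variables swapped
d2$diff_ave2 <- ave(d2$num, d2$year, d2$city, FUN = function(x) x - x[1])

d2$diff_ave
# [1]   0  50 125   0  30  85   0  70 125   0  75  85
identical(d2$diff_ave, d2$diff_ave2)
# [1] TRUE
```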


Philip

On 29/10/2016 3:15 PM, Jeff Newmiller wrote:
> Now would be an excellent time to read the help page for ?ave. You can specify multiple grouping variables.


