[R] lagging over consecutive pairs of rows in dataframe

Evan Cooch evan.cooch at gmail.com
Fri Mar 17 15:54:27 CET 2017


Suppose I have a dataframe that looks like the following:

n=2
mydata <- data.frame(exp = rep(1:5, each = n), rslt = c(12,15,7,8,24,28,33,15,22,11))
mydata
    exp rslt
1    1   12
2    1   15
3    2    7
4    2    8
5    3   24
6    3   28
7    4   33
8    4   15
9    5   22
10   5   11

The variable 'exp' (for 'experiment') occurs in pairs over consecutive
rows -- 1,1, then 2,2, then 3,3, and so on. The first row in a pair is
the 'control', and the second is the 'treatment'. The rslt column is the
result.

What I'm trying to do is create a subset of this dataframe that consists
of the exp number and the lagged difference (treatment minus control)
between the 'control' and 'treatment' results. So, for exp=1, the
difference is (15-12)=3. For exp=2, the difference is (8-7)=1, and so
on. What I'm hoping to do is take mydata (above) and turn it into

  exp diff
1   1    3
2   2    1
3   3    4
4   4  -18
5   5  -11
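For reference, a compact base-R sketch that produces exactly this table,
assuming every exp has exactly two rows ordered control then treatment.
Within each group, diff() returns the second value minus the first, i.e.
treatment minus control. The object name 'result' is just illustrative.

```r
# Recreate the example data from above
n <- 2
mydata <- data.frame(exp = rep(1:5, each = n),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Apply diff() to rslt within each level of exp; with two rows per group
# this gives (second row - first row) = treatment - control
result <- aggregate(rslt ~ exp, data = mydata, FUN = diff)
names(result)[2] <- "diff"   # rename the summarised column
result
```

If some exp had more or fewer than two rows, diff() would return a vector
of a different length for that group, so the pairing assumption matters.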

The basic 'trick' I can't figure out is how to create a lagged variable
between the second row (record) for a given level of exp and the first
row for that exp. This is easy to do in SAS (which I'm more familiar
with), but I'm struggling with the equivalent in R. The brute-force
approach I thought of is to simply split the dataframe into two (one
with the even rows, one with the odd rows), merge by exp, and then
calculate a difference. But this seems to require renaming the rslt
column in the two new dataframes so they are different in the merge
(say, rslt_cont in the odd dataframe and rslt_trt in the even
dataframe), allowing me to calculate a difference between the two.
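A sketch of that split/merge approach, assuming strict control/treatment
ordering in odd/even rows. Instead of renaming the columns by hand, it
uses the 'suffixes' argument of base R's merge(), which appends a suffix
to same-named non-key columns; the suffix strings "_cont" and "_trt" are
my choice, matching the names above.

```r
n <- 2
mydata <- data.frame(exp = rep(1:5, each = n),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Odd rows are controls, even rows are treatments (assumed ordering)
odd  <- seq(1, nrow(mydata), by = 2)
even <- seq(2, nrow(mydata), by = 2)

# Merge on exp; the shared 'rslt' column becomes rslt_cont and rslt_trt
merged <- merge(mydata[odd, ], mydata[even, ], by = "exp",
                suffixes = c("_cont", "_trt"))
merged$diff <- merged$rslt_trt - merged$rslt_cont
merged[, c("exp", "diff")]
```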

While I suppose this would work, I'm wondering if I'm missing a more
elegant 'in place' approach that doesn't require me to split the data
frame and do everything via a merge.
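One 'in place' possibility, sketched under the same assumption that rows
strictly alternate control, treatment: index with a logical vector that R
recycles along the rows, so c(TRUE, FALSE) picks every control row and
its negation picks every treatment row. The check on exp guards the
pairing assumption; 'is_ctrl' is just an illustrative name.

```r
n <- 2
mydata <- data.frame(exp = rep(1:5, each = n),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

is_ctrl <- c(TRUE, FALSE)   # recycled: TRUE on odd rows, FALSE on even rows

# Sanity check: each control row and the following treatment row share exp
stopifnot(all(mydata$exp[is_ctrl] == mydata$exp[!is_ctrl]))

result <- data.frame(exp  = mydata$exp[is_ctrl],
                     diff = mydata$rslt[!is_ctrl] - mydata$rslt[is_ctrl])
result
```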

Suggestions/pointers to the obvious welcome. I've tried playing with
lag, and with some approaches using lag in the zoo package, but haven't
found the magic trick. The problem (meaning, what I can't figure out)
seems to be conditioning the lag on the level of exp.
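A sketch of conditioning the lag on exp, using base R's ave(), which
applies a function within each group and returns a vector aligned with
the original rows. Here the function is c(NA, diff(x)), so the first row
of each exp gets NA and later rows get x[i] - x[i-1]; this is roughly a
lag that resets at each new BY group. The column name 'lagdiff' is
hypothetical.

```r
n <- 2
mydata <- data.frame(exp = rep(1:5, each = n),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Within-group lagged difference; NA on the first row of each exp
mydata$lagdiff <- ave(mydata$rslt, mydata$exp,
                      FUN = function(x) c(NA, diff(x)))

# Keep only rows that have a within-group lag (here, the treatment rows)
result <- mydata[!is.na(mydata$lagdiff), c("exp", "lagdiff")]
names(result)[2] <- "diff"
rownames(result) <- NULL
result
```

Unlike the odd/even indexing, this does not depend on global row
positions, only on rows within each exp appearing in control-then-
treatment order.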

Many thanks...


mydata <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y = c(6,17,26,37,44))



More information about the R-help mailing list