[R] lagging over consecutive pairs of rows in dataframe

Bert Gunter bgunter.4567 at gmail.com
Fri Mar 17 18:19:47 CET 2017


Evan:

You misunderstand the concept of a lagged variable.
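
A minimal sketch of the distinction, using the rslt values from the
original post: a lagged difference runs along the entire series, so it
also differences across the boundary between one experiment and the next.
The variable names below are illustrative only.

```r
# rslt values from the original post
rslt <- c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11)

# diff() computes x[i + 1] - x[i] along the whole vector, so it also
# takes differences between the last row of one pair and the first row
# of the next pair
all_diffs <- diff(rslt)
all_diffs
# 3 -8 1 16 4 5 -18 7 -11

# only every other element is a within-pair (treatment - control) difference
within_pair <- all_diffs[seq(1, length(all_diffs), by = 2)]
within_pair
# 3 1 4 -18 -11
```

So what is wanted here is a within-pair difference, not a lagged series.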

Ulrik:

Well, yes, that is certainly a general solution that works. However,
given the *specific* structure described by the OP, an even more
direct (maybe more efficient?) way to do it just uses (logical)
subscripting:

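# TRUE for odd-numbered rows (the 'control' row of each pair)
odds <- (seq_len(nrow(mydata)) %% 2) == 1
# exp from the control rows; treatment (even rows) minus control (odd rows)
newdat <- data.frame(mydata[odds, 1], mydata[!odds, 2] - mydata[odds, 2])
names(newdat) <- names(mydata)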
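
A self-contained sketch of the same idea, assuming the data frame from the
original post. The explicit column names exp and diff and the stopifnot()
check are illustrative additions, not part of the code above; the check
only guards the assumption that rows strictly alternate control/treatment
within each exp.

```r
# data from the original post
mydata <- data.frame(exp  = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# TRUE for odd-numbered rows (controls), FALSE for even rows (treatments)
odds <- seq_len(nrow(mydata)) %% 2 == 1

# the subscripting relies on complete, consecutive pairs; stop otherwise
stopifnot(nrow(mydata) %% 2 == 0,
          all(mydata$exp[odds] == mydata$exp[!odds]))

# treatment minus control, one row per experiment
newdat <- data.frame(exp  = mydata$exp[odds],
                     diff = mydata$rslt[!odds] - mydata$rslt[odds])
newdat
```

For the posted data this gives differences of 3, 1, 4, -18 and -11 for
exp 1 through 5, matching the table the OP asked for.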

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Mar 17, 2017 at 9:58 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
> Hi Evan
>
> you can easily do this by applying diff() to each exp group.
>
> Either using dplyr:
> library(dplyr)
> mydata %>%
>   group_by(exp) %>%
>   summarise(difference = diff(rslt))
>
> Or with base R
> aggregate(rslt ~ exp, data = mydata, FUN = diff)
>
> HTH
> Ulrik
>
>
> On Fri, 17 Mar 2017 at 17:34 Evan Cooch <evan.cooch at gmail.com> wrote:
>
>> Suppose I have a dataframe that looks like the following:
>>
>> n=2
>> mydata <- data.frame(exp = rep(1:5,each=n), rslt =
>> c(12,15,7,8,24,28,33,15,22,11))
>> mydata
>>     exp rslt
>> 1    1   12
>> 2    1   15
>> 3    2    7
>> 4    2    8
>> 5    3   24
>> 6    3   28
>> 7    4   33
>> 8    4   15
>> 9    5   22
>> 10   5   11
>>
>> The variable 'exp' (for 'experiment') occurs in pairs over consecutive
>> rows -- 1,1, then 2,2, then 3,3, and so on. The first row in a pair is
>> the 'control', and the second is a 'treatment'. The rslt column is the
>> result.
>>
>> What I'm trying to do is create a subset of this dataframe that consists
>> of the exp number, and the lagged difference between the 'control' and
>> 'treatment' result.  So, for exp=1, the difference is (15-12)=3. For
>> exp=2,  the difference is (8-7)=1, and so on. What I'm hoping to do is
>> take mydata (above), and turn it into
>>
>>   exp diff
>> 1   1    3
>> 2   2    1
>> 3   3    4
>> 4   4  -18
>> 5   5  -11
>>
>> The basic 'trick' I can't figure out is how to create a lagged variable
>> between the second row (record) for a given level of exp, and the first
>> row for that exp.  This is easy to do in SAS (which I'm more familiar
>> with), but I'm struggling with the equivalent in R. The brute force
>> approach I thought of is to simply split the dataframe into two (one
>> even rows, one odd rows), merge by exp, and then calculate a difference.
>> But this seems to require renaming the rslt column in the two new
>> dataframes so they are different in the merge (say, rslt_cont in the odd
>> dataframe, and rslt_trt in the even dataframe), allowing me to calculate
>> a difference between the two.
>>
>> While I suppose this would work, I'm wondering if I'm missing a more
>> elegant 'in place' approach that doesn't require me to split the data
>> frame and do everything via a merge.
>>
>> Suggestions/pointers to the obvious welcome. I've tried playing with
>> lag, and some approaches using lag in the zoo package,  but haven't
>> found the magic trick. The problem (meaning, what I can't figure out)
>> seems to be conditioning the lag on the level of exp.
>>
>> Many thanks...
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


