[R] Chronological data manipulation question

jim holtman jholtman at gmail.com
Tue Oct 16 13:23:16 CEST 2007


I don't know whether this is any faster, but it avoids explicit 'for'
loops by using lapply.  Further improvements are possible if it is
still too slow.  Try it on your data:

> x <- data.frame(id=c("001","001","001","001","002","002","002","002","002"),
+           year=c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
+           variable=c(0,0,1,0,0,0,1,0,0))
> # assumes the years are contiguous within each id; exercise for the reader if not
> # partition by 'id', find where 'variable' is 1, and mark that row plus the next four
> x.new <- lapply(split(x, x$id), function(person){
+     change <- which(person$variable == 1)
+     mark <- unique(unlist(lapply(change, seq, length.out=5)))
+     # drop indices that run past the last row for this person
+     mark <- mark[mark <= nrow(person)]
+     person$v2 <- 0  # initialize to zero
+     person$v2[mark] <- 1  # set to 1 for the event year and the four rows after it
+     person  # return new data
+ })
> do.call('rbind', x.new)
       id year variable v2
001.1 001 2000        0  0
001.2 001 2001        0  0
001.3 001 2002        1  1
001.4 001 2003        0  1
002.5 002 1996        0  0
002.6 002 1997        0  0
002.7 002 1998        1  1
002.8 002 1999        0  1
002.9 002 2000        0  1
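
If the years are not contiguous, row positions no longer line up with
calendar years. Here is a minimal sketch of one way to handle that case
by comparing year values directly. It assumes the window is meant to
cover the event year and the four years after it, which is what the
code above marks. The helper name flag_window and its 'window' argument
are made up for illustration.

```r
x <- data.frame(id = c("001","001","001","001","002","002","002","002","002"),
                year = c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
                variable = c(0,0,1,0,0,0,1,0,0))

# Hypothetical helper: flag each row whose year falls within 'window'
# calendar years of an event (the event year itself counts as the first).
flag_window <- function(person, window = 5) {
    event_years <- person$year[person$variable == 1]
    hit <- vapply(person$year, function(y) {
        # TRUE if some event occurred in [y - (window - 1), y]
        any(event_years <= y & event_years >= y - (window - 1))
    }, logical(1))
    person$v2 <- as.integer(hit)
    person
}

x.new <- do.call("rbind", lapply(split(x, x$id), flag_window))
```

Because the comparison is on 'year' rather than on row index, a gap in
the observations does not stretch the window.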

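If speed is still a concern, one possible direction is to avoid
building a list of data frames and compute the flag with ave() and
cumulative sums instead. This is only a sketch under the same
assumptions as the code above (rows sorted by year within each id,
years contiguous); the helper name roll_any is made up for
illustration, and whether it is actually faster on your data would
need to be timed.

```r
x <- data.frame(id = c("001","001","001","001","002","002","002","002","002"),
                year = c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
                variable = c(0,0,1,0,0,0,1,0,0))

# Hypothetical helper: 1 if any of the last 'window' rows
# (including the current one) has an event, else 0.
roll_any <- function(v, window = 5) {
    cs <- cumsum(v)  # number of events up to and including each row
    # number of events up to the row 'window' positions earlier (0 near the start)
    lagged <- c(rep(0, min(window, length(v))), head(cs, -window))
    as.integer(cs - lagged > 0)
}

# apply the helper within each id, keeping the original row order
x$v2 <- ave(x$variable, x$id, FUN = roll_any)
```

The difference of cumulative sums counts the events in the current row
and the four rows before it, so it is positive exactly where the
lapply version sets v2 to 1.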

On 10/16/07, Julien Barnier <jbarnier at ens-lsh.fr> wrote:
> Hi all,
>
> I am currently working on a survey which contains biographical data
> stored chronologically, i.e. something like:
>
> id      year     variable
> 001     2000     0
> 001     2001     0
> 001     2002     1
> 001     2003     0
> 002     1996     0
> 002     1997     0
> 002     1998     1
> 002     1999     0
> 002     2000     0
>
> where id is a person identifier, year the year of observation and
> variable the variable value at given year. In this case, the variable
> says if a particular event happened during the given year or not.
>
> What I want to do is generate a new variable which says whether the
> event happened at least once during the five years preceding the
> current one. So if I call this new variable v2, I'd like to obtain:
>
> id      year     variable      v2
> 001     2000     0             0
> 001     2001     0             0
> 001     2002     1             1
> 001     2003     0             1
> 002     1996     0             0
> 002     1997     0             0
> 002     1998     1             1
> 002     1999     0             1
> 002     2000     0             1
>
> Currently I manage to achieve this with two nested for loops, but it
> is *very* slow and inefficient. So I wondered if there is a better way
> to do this.
>
> Thanks in advance for any help.
>
> PS: here is the code to reproduce the first sample data:
>
> data.frame(id=c("001","001","001","001","002","002","002","002","002"),
>           year=c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
>           variable=c(0,0,1,0,0,0,1,0,0))
>
> --
> Julien Barnier
> Groupe de recherche sur la socialisation
> ENS-LSH - Lyon, France
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


