[R] within group sequential subtraction

Joshua Wiley jwiley.psych at gmail.com
Thu Mar 10 19:19:01 CET 2011


Dear Natalie,

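There are surely other ways, but one is to apply diff() to each group
using tapply() or by().  Because those return a list with one element
per group, you can wrap the whole call in unlist() to get a plain
vector that can be added back to your data frame.  Note that the result
comes back ordered by group, so this assumes the rows are already
sorted by group, as they are in your data.  Here is an example: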

dat <- structure(list(
  group = c("IND1", "IND1", "IND2", "IND2", "IND2",
            "IND3", "IND4", "IND5", "IND6", "IND6"),
  date_obs = structure(c(6468, 7063, 9981, 14186, 14372,
                         5129, 9767, 11168, 10243, 10647),
                       class = "Date")),
  .Names = c("group", "date_obs"),
  row.names = c(NA, 10L), class = "data.frame")

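## calculate differences using diff() within each group
## prepending NA makes the first difference in each group NA, so there
## is one value per row; c(NA, x) also drops the Date class, so the
## differences are plain numbers of days
dat$time <- unlist(tapply(dat$date_obs, dat$group,
  function(x) diff(c(NA, x))))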

dat ## updated data frame
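
If the rows might not be sorted by group, a variant using ave() may be
safer, because ave() returns its result in the original row order
rather than in group order.  What follows is only a minimal sketch under
that assumption, not the only way to do it; dat2 is an illustrative name
used so that the example runs on its own, and the order() step is
optional (it puts each group's dates in chronological order first).

```r
## same data as above, rebuilt so the sketch is self-contained
dat2 <- data.frame(
  group = c("IND1", "IND1", "IND2", "IND2", "IND2",
            "IND3", "IND4", "IND5", "IND6", "IND6"),
  date_obs = as.Date(c(6468, 7063, 9981, 14186, 14372,
                       5129, 9767, 11168, 10243, 10647),
                     origin = "1970-01-01"),
  stringsAsFactors = FALSE)

## optional: sort by group, then by date within each group
dat2 <- dat2[order(dat2$group, dat2$date_obs), ]

## ave() applies FUN within each group and keeps the original row order;
## as.numeric() turns the dates into days since 1970-01-01, so the
## differences are plain numbers of days
dat2$time <- ave(as.numeric(dat2$date_obs), dat2$group,
  FUN = function(x) c(NA, diff(x)))

dat2 ## time column: NA 595 NA 4205 186 NA NA NA NA 404
```

Groups with a single observation get NA, since diff() of one value is
empty and only the prepended NA remains.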

HTH,

Josh

On Thu, Mar 10, 2011 at 6:56 AM, natalie.vanzuydam <nvanzuydam at gmail.com> wrote:
> Hi Everyone,
>
> I would like to do sequential subtractions within a group so that I know the
> time between separate observations for a group of individuals.
>
> My data:
>
> data <- structure(list(group = c("IND1", "IND1", "IND2",
> "IND2", "IND2", "IND3", "IND4", "IND5",
> "IND6", "IND6"), date_obs = structure(c(6468,
> 7063, 9981, 14186, 14372, 5129, 9767, 11168, 10243, 10647), class =
> "Date")), .Names = c("group",
> "date_obs"), row.names = c(NA, 10L), class = "data.frame")
>
> So I start with:
>
>  group   date_obs
> 1   IND1 1987-09-17
> 2   IND1 1989-05-04
> 3   IND2 1997-04-30
> 4   IND2 2008-11-03
> 5   IND2 2009-05-08
> 6   IND3 1984-01-17
> 7   IND4 1996-09-28
> 8   IND5 2000-07-30
> 9   IND6 1998-01-17
> 10  IND6 1999-02-25
>
> what I would like:
>
>  group   date_obs     time
> 1   IND1 1987-09-17 NA
> 2   IND1 1989-05-04 595
> 3   IND2 1997-04-30 NA
> 4   IND2 2008-11-03 4205
> 5   IND2 2009-05-08 186
> 6   IND3 1984-01-17 NA
> 7   IND4 1996-09-28 NA
> 8   IND5 2000-07-30 NA
> 9   IND6 1998-01-17 NA
> 10  IND6 1999-02-25 404
>
> So that if there is one entry/individual a 0/NA would be acceptable and if
> there is more than one entry/individual the sequential difference would be
> calculated.
>
> I started with some code but I cannot edit it appropriately.
>
> x <- do.call(rbind, lapply(split(data, data$group),
>        function(dat) {
>                        dat <- dat[order(dat$date_obs), ]
>                        d<-diff(dat$date_obs)
>                         dat <- rbind(dat,d)
>                        }))
>
> I get this error: "Error in as.Date.numeric(value) : 'origin' must be
> supplied" so I'm not sure if it does what I need it to do.  In addition to
> this the vector lengths won't match up as the first date in the sequence
> won't be subtracted from itself.
>
> I'm not sure if anyone knows an easier way to achieve this.
>
> Thanks for the help,
> Natalie
>
>
>
>
> -----
> Natalie Van Zuydam
>
> PhD Student
> University of Dundee
> nvanzuydam at dundee.ac.uk



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


