[R] counting events by subject with "black out" windows

Dennis Murphy djmuser at gmail.com
Sat Nov 19 05:25:59 CET 2011


Hi:

Here's a quick-and-dirty solution that could be improved. It uses the
plyr package. Starting from your data1 data frame:

library('plyr')
dseq <- seq(as.Date('2010-01-01'), as.Date('2011-06-05'), by = '30 days')
# Use the cut() function to create a factor whose levels are demarcated
# by the dates in dseq:
# See ?cut for labeling options
data1[['tf']] <- cut(data1$Date, dseq)
# Keep the event rows only, then take the earliest event date in each bin
ddply(subset(data1, event == 1L), .(tf), summarise, Date.min = min(Date))

          tf   Date.min
1 2010-01-01 2010-01-01
2 2010-01-31 2010-02-12
3 2010-05-01 2010-05-03
4 2011-03-27 2011-04-21

The value of tf is the left endpoint of the time interval.
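
To see how cut() assigns dates to these bins, here is a small check on a
few hand-picked dates. The vector x is made up for illustration; dseq is
the same sequence as above. Each date should be labelled with the left
endpoint of the bin it falls in, and because the bins are closed on the
left, 2010-03-01 still belongs to the bin that starts on 2010-01-31:

```r
dseq <- seq(as.Date('2010-01-01'), as.Date('2011-06-05'), by = '30 days')
# Hypothetical dates chosen to sit on either side of a bin boundary
x <- as.Date(c('2010-01-01', '2010-02-12', '2010-03-01', '2010-03-02'))
# cut() on Date objects uses left-closed intervals by default (right = FALSE)
tf <- cut(x, dseq)
data.frame(x, tf)
```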

This differs from your desired output in two respects: (1) summarise()
returns only the grouping variables and the summaries it computes, and
since I grouped by tf alone, ID gets dropped; (2) you have 2010-03-01
as the first date of a 30-day period, but with the 30-day intervals as
I defined them, Mar. 1 is the last day of the interval that starts on
2010-01-31, so it is not included [2010-02-12 precedes it within that
interval]. You can always change the definitions. If you group by
calendar month instead, you get the dates you expected.
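
As a sketch of both adjustments, the following base R code keeps ID by
grouping on it and bins by calendar month rather than by 30-day blocks.
It is only an illustration under assumptions: the data are the ones from
your post rebuilt with data.frame(), the names events, month and
first_events are made up here, and base R stands in for plyr so that no
extra package is needed.

```r
# Example data from the original post, rebuilt with data.frame() for brevity
data1 <- data.frame(
  ID = rep(c("auto1", "auto2", "auto3"), times = c(4, 5, 5)),
  Date = as.Date(c(14610, 14610, 14627, 14680, 14652, 14660, 14725,
                   15085, 15086, 14642, 14669, 14732, 14747, 14749),
                 origin = "1970-01-01"),
  event = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Keep event rows only and label each with its calendar month, e.g. "2010-02"
events <- subset(data1, event == 1L)
events$month <- format(events$Date, "%Y-%m")

# Sort so the first row within each ID/month pair is the earliest date,
# then keep that first row; ID survives because whole rows are kept
events <- events[order(events$ID, events$Date), ]
first_events <- events[!duplicated(events[c("ID", "month")]), c("ID", "Date")]
rownames(first_events) <- NULL
first_events
```

On your sample data this should return the five ID/date pairs you listed.
Calendar months are still fixed bins, though, so two events fewer than 30
days apart on either side of a month boundary would both be kept.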

Hope this is enough to get you started.
Dennis



On Fri, Nov 18, 2011 at 3:22 PM, Chris Conner <connerpharmd at yahoo.com> wrote:
> I have a large dataset that includes subjects (ID), dates and events that need to be counted.  Not every date includes an event, and I need to count only one event per 30 days, per subject.  So in essence, I need to create a 30-day "black out" period during which an event cannot be "counted" for each subject.  The reason is that a rule has been set up whereby a subject can only be "counted" once per 30-day period (the 30-day window includes the day the event of interest is counted).
>
> The solution should count only the following events per subject (per the 30-day blackout rule):
>
> ID Date
> auto1 1/1/2010
> auto2 2/12/2010
> auto2 4/21/2011
> auto3 3/1/2010
> auto3 5/3/2010
>
> I have created a multistep process to do this, but it is extremely clumsy (detailed below).  I have to believe that one of you has a much more elegant solution.  Thank you all in advance for any help!!!!
>
> ##     example data
> data1 <- structure(list(
>   ID = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
>                  .Label = c("", "auto1", "auto2", "auto3"), class = "factor"),
>   Date = structure(c(14610, 14610, 14627, 14680, 14652, 14660, 14725,
>                      15085, 15086, 14642, 14669, 14732, 14747, 14749),
>                    class = "Date"),
>   event = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L)),
>   .Names = c("ID", "Date", "event"), class = "data.frame",
>   row.names = c(NA, 14L))
> ##     remove non events
> data2 <- data1[data1$event==1,]
> library(doBy)
> ##     create a table of first events
> step1 <- summaryBy(Date~ID, data = data2, FUN=min)
> step1$Date30 <- step1$Date.min+30
> step2 <- merge(data2, step1, by.x="ID", by.y="ID")
> ##     use an ifelse to essentially remove any events that shouldn't be counted
> step2$event <- ifelse(as.numeric(step2$Date) >= step2$Date.min & as.numeric(step2$Date) <= step2$Date30, 0, step2$event)
> ##     basically repeat steps above until I get an error (no more events)
> data3 <- step2[step2$event==1,]
> data3<- data3[,1:3]
> step3 <- summaryBy(Date~ID, data = data3, FUN=min)
> step3$Date30 <- step3$Date.min+30
> step4 <- merge(data3, step3, by.x="ID", by.y="ID")
> step4$event <- ifelse(as.numeric(step4$Date) >= step4$Date.min & as.numeric(step4$Date) <= step4$Date30, 0, step4$event)
> ##     then I rbind the "keepers"
> ##     in this case steps 1 and 3 above
> final <- rbind(step1,step3)
> ##     then reformat
> final <- final[,1:2]
> final$Date.min <- as.Date(final$Date.min,origin="1970-01-01")
> ##     again, extremely clumsy, but it works...  HELP! :)


