[R] Count number of change in a specified time interval

Bert Gunter gunter.berton at gene.com
Mon Aug 4 17:39:12 CEST 2014


Or ?rle

Bert



Sent from my iPhone -- please excuse typos.

> On Aug 4, 2014, at 8:28 AM, jim holtman <jholtman at gmail.com> wrote:
> 
> Try this, but I only get 2 changes for CB27A instead of you indicated 3:
> 
>> require(data.table)
>> x <- read.table(text = "CASE_ID YEAR_MTH ATT_1
> + CB26A    201302         1
> + CB26A    201302         0
> + CB26A    201302         0
> + CB26A    201303         1
> + CB26A    201303         1
> + CB26A    201304         0
> + CB26A    201305         1
> + CB26A    201305         0
> + CB26A    201306         1
> + CB27A    201304         0
> + CB27A    201304         0
> + CB27A    201305         1
> + CB27A    201306         1
> + CB27A    201306         0
> + CB27A    201307         0
> + CB27A    201308         1", header = TRUE, as.is = TRUE)
>> setDT(x)
>> # convert to a Date object for comparison
>> x[, MYD := as.Date(paste0(YEAR_MTH, '01'), format = "%Y%m%d")]
>> # separate by CASE_ID and only keep the first 3 months
>> x[
> +     , {
> +         # determine the end date as 3 months from the first date
> +         endDate <- seq(MYD[1L], by = '3 months', length = 2)[2L]
> +         # extract what is changing
> +         changes <- ATT_1[(MYD >= MYD[1L]) & (MYD <= endDate)]
> +         # now count the changes
> +         list(nChanges = sum(head(changes, -1L) != tail(changes, -1L)))
> +       }
> +     , by = CASE_ID
> +     ]
>   CASE_ID nChanges
> 1:   CB26A        5
> 2:   CB27A        2
> 
> Jim Holtman
> Data Munger Guru
> 
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
> 
> 
>> On Wed, Jul 30, 2014 at 3:08 AM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
>> Dear R-helpers,
>> 
>> I want to count the number of times ATT_1 has changed in a period of 3
>> months(can be 4months) from the first YEAR_MTH entry for a CASE_ID. So if
>> for a CASE_ID we have data only for two distinct YEAR_MTH, then all the
>> entries should be considered, otherwise only the relevant entries will be
>> considered for calculation.
>> E.g. if the first YEAR_MTH entry is 201304 then get the number of changes
>> till 201307(inclusive), similarly if the first YEAR_MTH entry is 201302
>> then get the number of changes till 201305.
>> 
>> Dataset
>> CASE_ID YEAR_MTH ATT_1
>> CB26A    201302         1
>> CB26A    201302         0
>> CB26A    201302         0
>> CB26A    201303         1
>> CB26A    201303         1
>> CB26A    201304         0
>> CB26A    201305         1
>> CB26A    201305         0
>> CB26A    201306         1
>> CB27A    201304         0
>> CB27A    201304         0
>> CB27A    201305         1
>> CB27A    201306         1
>> CB27A    201306         0
>> CB27A    201307         0
>> CB27A    201308         1
>> 
>> The final dataset should look like
>> 
>> ID_CASE    No.of changes
>> CB26A        5
>> CB27A        3
>> 
>> where 'No.of changes' refer to the change in 3 months (201302-201305 for
>> CB26A and 201304-201307 for CB27A).
>> 
>> How can this be done in R?
>> 
>> Regards,
>> Abhinaba Roy
>> 
>>        [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list