[R] Creating a new by variable in a dataframe

William Dunlap wdunlap at tibco.com
Fri Oct 19 20:51:17 CEST 2012


Suppose your data frame is
d <- data.frame(
     stringsAsFactors = FALSE,
     transaction = c("T01", "T02", "T03", "T04", "T05", "T06", 
        "T07", "T08", "T09", "T10"),
     date = c("2012-10-19", "2012-10-19", "2012-10-19", 
        "2012-10-19", "2012-10-22", "2012-10-23", 
        "2012-10-23", "2012-10-23", "2012-10-23", 
        "2012-10-23"),
     time = c("08:00", "09:00", "10:00", "11:00", "12:00", 
        "13:00", "14:00", "15:00", "16:00", "17:00"
        ))
(Convert the date and time to your favorite classes; it doesn't matter here.)

A general way to flag whether an item is the last of its group (in row order) is:
  isLastInGroup <- function(...) {
    ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))
  }
  is_last_of_dayA <- with(d, isLastInGroup(date))
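
As a small sketch of why the grouped version does not need sorted input, here it is on a made-up vector whose groups are interleaved (the vector g is hypothetical, not part of the data above). It also compares the result with the base R idiom !duplicated(x, fromLast = TRUE), which for a single grouping vector should mark the same rows:

```r
# Same definition as above, repeated so the example runs on its own.
isLastInGroup <- function(...) {
  ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))
}

# Hypothetical group labels, deliberately not sorted.
g <- c("a", "b", "a", "b", "a")

# Within each group, only the final element (in row order) is TRUE.
viaAve <- isLastInGroup(g)
viaAve
# [1] FALSE FALSE FALSE  TRUE  TRUE

# Alternative for one grouping vector: an element is the last of its group
# when no equal value follows it.
viaDuplicated <- !duplicated(g, fromLast = TRUE)
identical(viaAve, viaDuplicated)
# [1] TRUE
```

Note that "last" here means last in row order within the group, so the rows still need to be in time order within each date.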
If you know your data is sorted by date, you can save a little time on large
datasets by using
  isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
  is_last_of_dayB <- isLastInRun(d$date)
The d above is sorted by date, so both give the same result:
  > cbind(d, is_last_of_dayA, is_last_of_dayB)
     transaction       date  time is_last_of_dayA is_last_of_dayB
  1          T01 2012-10-19 08:00           FALSE           FALSE
  2          T02 2012-10-19 09:00           FALSE           FALSE
  3          T03 2012-10-19 10:00           FALSE           FALSE
  4          T04 2012-10-19 11:00            TRUE            TRUE
  5          T05 2012-10-22 12:00            TRUE            TRUE
  6          T06 2012-10-23 13:00           FALSE           FALSE
  7          T07 2012-10-23 14:00           FALSE           FALSE
  8          T08 2012-10-23 15:00           FALSE           FALSE
  9          T09 2012-10-23 16:00           FALSE           FALSE
  10         T10 2012-10-23 17:00            TRUE            TRUE
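
If the rows are not already in order, a sketch of the equivalent of the SAS 'proc sort' step followed by the run-based flag might look like this. The data frame d2 and its shuffled rows are made up for illustration; the column name last_trans is taken from the original question. It assumes dates and times are stored as "YYYY-MM-DD" and zero-padded "HH:MM" strings, which sort correctly as text:

```r
# Same definition as above, repeated so the example runs on its own.
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Hypothetical, deliberately shuffled subset of the example data.
d2 <- data.frame(
  stringsAsFactors = FALSE,
  transaction = c("T03", "T01", "T05", "T02", "T04"),
  date = c("2012-10-19", "2012-10-19", "2012-10-22", "2012-10-19", "2012-10-19"),
  time = c("10:00", "08:00", "12:00", "09:00", "11:00")
)

# Sort by date, then by time within date (like 'by tdate event_tim' in SAS).
d2 <- d2[order(d2$date, d2$time), ]

# After sorting, the last row of each run of equal dates is that day's
# last transaction.
d2$last_trans <- isLastInRun(d2$date)
d2$transaction[d2$last_trans]
# [1] "T04" "T05"
```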


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of ramoss
> Sent: Friday, October 19, 2012 10:52 AM
> To: r-help at r-project.org
> Subject: [R] Creating a new by variable in a dataframe
> 
> Hello,
> 
> I have a dataframe w/ 3 variables of interest: transaction,date(tdate) &
> time(event_tim).
> How could I create a 4th variable (last_trans) that would flag the last
> transaction of the day for each day?
> In SAS I use:
> proc sort data=all6;
> by tdate event_tim;
> run;
>          /*Create last transaction flag per day*/
> data all6;
>   set all6;
>   by tdate event_tim;
>   last_trans=last.tdate;
> 
> Thanks ahead for any suggestions.
> 
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782.html
> Sent from the R help mailing list archive at Nabble.com.
> 



