[R] run a calculation function over time fields, ordered and grouped by variables

gavinr g.rudge at bham.ac.uk
Mon May 25 21:28:55 CEST 2015

I’ve got some transit data relating to bus stops for a GIS data set.  Each
row represents one stop on a route.  For each record I have the start time
of the route, the sequence in which the bus stops, the time the bus arrives
at the first stop and the time taken to get to each stop from the previous
one in the sequence.  Not all sequences of stops start with the number 1;
some may start with a higher number.
I need to make a new variable holding the time the bus arrives at each
stop, using the start time from the stop with the lowest sequence number
in each route to populate the arrival times for every stop in that route.
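
As a rough sketch of that calculation, assuming the times have already been
converted to plain seconds: the column names, the toy values and the choice
to ignore the interval on the lowest-sequence stop are all assumptions made
for illustration, not taken from the real data.

```r
# Hypothetical toy data (made-up names and values), deliberately unsorted.
stops <- data.frame(
  route    = c(2, 1, 1, 2, 1, 2),
  sequence = c(5, 2, 1, 4, 3, 6),
  interval = c(60, 120, 0, 0, 180, 240),  # seconds from the previous stop
  start    = c(32400, 28800, 28800, 32400, 28800, 32400)  # seconds after midnight
)

# Put the stops in travel order within each route.
stops <- stops[order(stops$route, stops$sequence), ]

# Running total of intervals within each route. The interval on the
# lowest-sequence stop is subtracted, on the assumption that the bus is
# already at that stop at the start time.
elapsed <- ave(stops$interval, stops$route,
               FUN = function(x) cumsum(x) - x[1])

# Start time taken from the lowest-sequence stop of each route.
route_start <- ave(stops$start, stops$route, FUN = function(x) x[1])

stops$arrival <- route_start + elapsed
print(stops)
```

ave() works group by group and returns a vector the same length as its
input, so the result drops straight back into the data frame.  On a few
million rows a grouped cumsum in data.table or dplyr may well be faster,
but the logic would be the same.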

I have a very simple example below with just three routes and a few stops in
each.  My actual data set has a few million rows.  I've also created a
version of the data set I'm aiming to get.

There are two problems here.  Firstly, getting the data into the correct
format to do the calculations with durations, and secondly, running a
function over the data set to obtain the arrival times.
It is the durations that are critical, not the date, so using the POSIX
methods doesn’t really seem appropriate here.  Ultimately the times are
going to be used in a route solver in an ArcSDE geodatabase.  I tried to use
strptime to format my times, but could not get them into a data.frame as
presumably they are a list.  In this example I’ve left them as strings. 
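
One possible way round the formatting problem, sketched here under the
assumption that every time is a well-formed "HH:MM:SS" string, is to skip
dates altogether and hold the times as seconds.  The helper names
hms_to_sec and sec_to_hms are hypothetical, made up for this example.

```r
# Convert "HH:MM:SS" strings to seconds (hypothetical helper).
hms_to_sec <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) sum(as.numeric(p) * c(3600, 60, 1)),
         numeric(1))
}

# Convert whole seconds back to "HH:MM:SS" (hypothetical helper).
# Hours are not wrapped at 24, so a duration past midnight stays a duration.
sec_to_hms <- function(s) {
  sprintf("%02d:%02d:%02d",
          as.integer(s %/% 3600),
          as.integer((s %% 3600) %/% 60),
          as.integer(s %% 60))
}

start_sec    <- hms_to_sec("08:00:00")  # 28800
interval_sec <- hms_to_sec("00:02:30")  # 150
sec_to_hms(start_sec + interval_sec)    # "08:02:30"
```

Plain numeric seconds sit in a data.frame column without any trouble and
add up with ordinary arithmetic.  strptime returns a POSIXlt object, which
is list-based; wrapping the result in as.POSIXct() is another option if
date-times are wanted after all.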

Any help is much appreciated.

#create four columns with route id, stop sequence, interval time and route
#start time

#correct data set should look like this
