# [R] Bus stop sequence matching problem

David McPearson dmcp at webmail.co.za
Sat Aug 30 09:54:02 CEST 2014

```Homework? The list has a no-homework policy, but perhaps I'll be forgiven for
posting hints.
In general terms, this is how I approached the problem:
* Loop through the rows of stop_onoff - for (idx in ...something...) {...
* For each row, find the first of "ref" in a suitably filtered subset of
stop_sequence, and keep track of these row numbers
* Update columns "on" and "off"
* Use cumsum to calculate the number of passengers on the bus

Note the loop. Someone cleverer than I might be able to vectorise that step,
but I couldn't see how.

By the way, if this is homework...

Are you sure your desired_output is correct? I would expect something like
1  10   A  5   0    5
2  20   B  0   0    5
3  30   C  0   0    5
4  40   D  0   2    3
5  50   B 10   2   11
6  60   A  0   6    5

Are you aware that your "ref" columns are factors and not characters? If
you use "stringsAsFactors = FALSE" or
stop_onoff <- data.frame(ref = factor(c('A','D','B','A'),
                                      levels = levels(stop_sequence$ref)),
                         on = c(5,0,10,0), off = c(0,2,2,6))
it will simplify your analysis (or at least reduce some typing).

Type the following in an R console
?data.frame
?factor

Now, if this ain't homework, or you just want someone to do it for you, e-mail
me offline and I'll send you my approach. If it is homework, let me know - I'm
happy to help anyway, but I will be trying to help you solve this for
yourself.

Cheers,
DMcP

On Sat, 30 Aug 2014 12:46:17 +1200 Adam Lawrence <alaw005 at gmail.com> wrote

> I am hoping someone can help me with a bus stop sequencing problem in R,
> where I need to match counts of people getting on and off a bus to the
> correct stop in the bus route stop sequence. I have tried looking
> online/forums for sequence matching but seems to refer to numeric sequences
> or DNA matching and over my head. I am after a simple example if anyone can
>
> I have two data series as per below (from database), that I want to
> combine. In this example "stop_sequence" includes the sequence (seq) of bus
> stops and "stop_onoff" is a count of people getting on and off at certain
> stops (there is no entry if no one gets on or off).
>
> stop_sequence <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'))
> ##   seq ref
> ## 1  10   A
> ## 2  20   B
> ## 3  30   C
> ## 4  40   D
> ## 5  50   B
> ## 6  60   A
> stop_onoff <-
> data.frame(ref=c('A','D','B','A'),on=c(5,0,10,0),off=c(0,2,2,6))
> ##   ref on off
> ## 1   A  5   0
> ## 2   D  0   2
> ## 3   B 10   2
> ## 4   A  0   6
>
> I need to match the stop_onoff numbers in the right stop sequence, with the
> correctly matched output as follows (load is a cumulative count of on and
> off)
>
> desired_output <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'),
> ##   seq ref on off load
> ## 1  10   A  5   0    5
> ## 2  20   B  -   -    0
> ## 3  30   C  -   -    0
> ## 4  40   D  0   2    3
> ## 5  50   B 10   2   11
> ## 6  60   A  0   6    5
>
> In this example the stop "B" is matched to the second stop "B" in the stop
> sequence and not the first because the onoff data is after stop "D".
>
> Any guidance much appreciated.
>
> Regards
>

```
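
The hints in the reply (loop over stop_onoff, find the first matching "ref" in the not-yet-used part of stop_sequence, update "on" and "off", then cumsum) can be sketched roughly as below. This is only one possible reading of those steps, not the code the poster offered to send. It assumes each count record belongs to the first matching stop strictly after the previously matched one, and that stops with no record have zero boardings and alightings. The names `result`, `last_row` and `candidates` are made up for illustration.

```r
# Data from the original question; stringsAsFactors = FALSE keeps "ref" as character
stop_sequence <- data.frame(seq = c(10, 20, 30, 40, 50, 60),
                            ref = c("A", "B", "C", "D", "B", "A"),
                            stringsAsFactors = FALSE)
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"),
                         on  = c(5, 0, 10, 0),
                         off = c(0, 2, 2, 6),
                         stringsAsFactors = FALSE)

# Assumption: stops without a count record have nobody getting on or off
result <- stop_sequence
result$on  <- 0
result$off <- 0

last_row <- 0  # row of stop_sequence matched by the previous count record
for (idx in seq_len(nrow(stop_onoff))) {
  # Candidate rows: same ref, and strictly after the previous match
  candidates <- which(seq_len(nrow(stop_sequence)) > last_row &
                      stop_sequence$ref == stop_onoff$ref[idx])
  if (length(candidates) == 0) {
    stop("no remaining stop matches ref '", stop_onoff$ref[idx], "'")
  }
  last_row <- candidates[1]  # take the first such stop
  result$on[last_row]  <- stop_onoff$on[idx]
  result$off[last_row] <- stop_onoff$off[idx]
}

# Passengers on board after each stop
result$load <- cumsum(result$on - result$off)
print(result)
```

With the example data this matches the second "B" (seq 50) rather than the first, because the "B" record comes after the "D" record. If a real route can have two consecutive records for the same stop, or records that arrive out of order, the "strictly after" rule would need rethinking.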