[R] Count matches of a sequence in a vector?

David Winsemius dwinsemius at comcast.net
Wed Apr 21 23:14:31 CEST 2010


On Apr 21, 2010, at 11:07 AM, Jeff Brown wrote:

> At April 21, 2010 10:16:10 AM EDT mieke posted to Nabble:

>> Hey there,
>>
>> I need to count the matches of a sequence seq=c(2,3,4) in a long vector
>> v=c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5,....).
>> With sum(v %in% seq) I only get the sum of sum(v %in% 2), sum(v %in% 3) and
>> sum(v %in% 4), but that's not what I need :(
>>
>
> This sort of calculation can't be vectorized; you'll have to iterate through
> the sequence, e.g. with a "for" loop.  I don't know if a routine has already
> been written.
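
To see why sum(v %in% seq) does not answer the question: %in% tests each element of v for membership in the set {2, 3, 4} on its own, ignoring order and adjacency, so it counts elements rather than occurrences of the ordered run. A minimal illustration using the example vector from the question (the pattern is named vseq here, as in the reply):

```r
v    <- c(4, 2, 5, 8, 9, 2, 3, 5, 6, 1, 7, 2, 3, 4, 5)
vseq <- c(2, 3, 4)

# %in% checks membership element by element; position in v is irrelevant
in_set <- v %in% vseq
sum(in_set)   # 7: every 2, 3 or 4 in v is counted, although 2,3,4 occurs once
```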

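A vectorized solution (written for a pattern of length 3): compare three shifted copies of v with the corresponding elements of vseq and count the positions where all three comparisons are TRUE: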

  vseq <- c(2, 3, 4)
  v <- c(4, 2, 5, 8, 9, 2, 3, 5, 6, 1, 7, 2, 3, 4, 5)
  sum( v[1:(length(v) - 2)] == vseq[1] &
       v[2:(length(v) - 1)] == vseq[2] &
       v[3:length(v)]       == vseq[3] )
# [1] 1
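
The expression above hard-codes a pattern of length 3. A minimal sketch of the same idea for a pattern of any length, assuming overlapping matches should all be counted: loop over the (short) pattern rather than the (long) vector, so each step is still one vectorized comparison over v. The name count_matches is hypothetical, not a function from any package.

```r
# Hypothetical helper: count (possibly overlapping) occurrences of vseq in v.
count_matches <- function(v, vseq) {
  n <- length(v)
  k <- length(vseq)
  if (k == 0 || k > n) return(0L)
  # hits[i] stays TRUE only if v[i + j - 1] == vseq[j] for every j in 1..k
  hits <- rep(TRUE, n - k + 1)
  for (j in seq_len(k)) {
    hits <- hits & (v[j:(n - k + j)] == vseq[j])
  }
  sum(hits)
}

v <- c(4, 2, 5, 8, 9, 2, 3, 5, 6, 1, 7, 2, 3, 4, 5)
count_matches(v, c(2, 3, 4))         # 1
count_matches(v, c(2, 3))            # 2
count_matches(c(1, 1, 1), c(1, 1))   # 2 (overlapping matches are counted)
```

The loop runs length(vseq) times, so for short patterns the cost stays close to that of logsum().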

And a check on relative speed, which was also a concern you expressed:

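require(rbenchmark)
require(zoo)

# Three shifted comparisons ANDed together (pattern length fixed at 3)
logsum <- function(v, vseq) sum( v[1:(length(v) - 2)] == vseq[1] &
                                 v[2:(length(v) - 1)] == vseq[2] &
                                 v[3:length(v)]       == vseq[3] )

# zoo::rollapply over every window of length(vseq)
sumroll <- function(v, vseq) sum( rollapply(zoo(v), length(vseq),
                                            function(x) all(x == vseq)) )

# sapply over all start positions; lengths computed inside, not taken from globals
summatches <- function(v, vseq) {
  lseq <- length(vseq)
  lv <- length(v)
  sum( sapply(1:(lv - lseq + 1), function(i) all(v[i:(i + lseq - 1)] == vseq)) )
}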


 > benchmark(
+    logsum(v, vseq),
+    summatches(v,vseq),
+    sumroll(v,vseq),
+    order=c('replications', 'elapsed'))
                 test replications elapsed relative user.self sys.self user.child sys.child
1     logsum(v, vseq)          100   0.002      1.0     0.003    0.001          0         0
2 summatches(v, vseq)          100   0.016      8.0     0.016    0.000          0         0
3    sumroll(v, vseq)          100   0.087     43.5     0.087    0.001          0         0
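
For comparison, a base-R version of the rolling-window idea that avoids the zoo dependency can be sketched with embed(), which returns a matrix whose rows are the length-k windows of v in reversed order (hence rev(vseq)). This sketch was not part of the timings above; sumembed is a hypothetical name, and it builds a length(v) by length(vseq) matrix, so memory grows with both.

```r
# Hypothetical base-R alternative to sumroll(): no zoo needed.
sumembed <- function(v, vseq) {
  k <- length(vseq)
  if (k == 0 || k > length(v)) return(0L)
  # embed(v, k)[i, ] is v[i + k - 1], ..., v[i]; transposing makes each column
  # one window, and rev(vseq) is recycled down each column for the comparison
  w <- t(embed(v, k))
  sum(colSums(w == rev(vseq)) == k)
}

v <- c(4, 2, 5, 8, 9, 2, 3, 5, 6, 1, 7, 2, 3, 4, 5)
sumembed(v, c(2, 3, 4))   # 1
```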

-- 

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list