[R] matching a sequence in a vector?

Petr Savicky savicky at cs.cas.cz
Wed Feb 15 09:15:31 CET 2012


On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
> Hi All,
> 
> 
> I've been trawling through the documentation and listserv archives on this topic -- but
> as yet have not found a solution.  I'm sure this is pretty simple with R, but I cannot work out how without
> resorting to ugly nested loops.
> 
> As far as I can tell, grep, match, and %in% are not the correct tools.
> 
> Question:
> given these vectors --
> patrn <- c(1,2,3,4)
> exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
> 
> how do I get the desired answer by finding each occurrence of the pattern and returning the starting indices:
> 6, 13, 23

Hi.

Here is a more efficient version of the solution suggested
earlier in this thread.

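  m <- length(patrn)
  n <- length(exmpl)
  # every position where a full copy of patrn could start
  # (max() avoids an error when exmpl is shorter than patrn)
  candidate <- seq_len(max(0, n - m + 1))
  for (i in seq_len(m)) {
      # keep only the starts whose i-th element equals patrn[i]
      candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
  }
  candidate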

  [1]  6 13 23
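As a cross-check, the result can be compared with a plain sliding-window search that tests every window of length m against patrn in one go. This is only a sketch for verification: the helper name `match_naive` is hypothetical, and it is not necessarily the earlier suggestion mentioned above.

```r
# Hypothetical helper: brute-force search for every start index of patrn in exmpl
match_naive <- function(patrn, exmpl) {
  m <- length(patrn)
  n <- length(exmpl)
  # no match is possible for an empty pattern or a too-short vector
  if (m == 0 || n < m) return(integer(0))
  starts <- seq_len(n - m + 1)
  # keep start s when the whole window exmpl[s:(s+m-1)] equals patrn
  hits <- vapply(starts,
                 function(s) all(exmpl[s:(s + m - 1)] == patrn),
                 logical(1))
  starts[hits]
}

patrn <- c(1,2,3,4)
exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
match_naive(patrn, exmpl)
# [1]  6 13 23
```

This version always compares all m elements of every window, which is why the candidate-filtering loop can be faster when early mismatches are common.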

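In this solution, the set of candidate start indices shrinks at
each iteration: step i keeps only the starts whose i-th element
equals patrn[i]. If the short prefixes of the pattern are rare in
the vector, most candidates are discarded within the first few
iterations, and the remaining iterations become correspondingly
faster.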
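To see the candidate set shrink on the example data, the loop can be wrapped in a function that also records how many candidates survive each step. A minimal sketch: the name `match_seq` and its `trace` argument are hypothetical, not part of base R, and the early `break` is an added assumption that stops once no candidates remain.

```r
# Hypothetical wrapper around the candidate-filtering loop
match_seq <- function(patrn, exmpl, trace = FALSE) {
  m <- length(patrn)
  n <- length(exmpl)
  candidate <- seq_len(max(0, n - m + 1))
  sizes <- integer(m)  # candidates left after each step (0 if stopped early)
  for (i in seq_len(m)) {
    candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
    sizes[i] <- length(candidate)
    if (length(candidate) == 0) break  # nothing left to check
  }
  if (trace) print(sizes)
  candidate
}

patrn <- c(1,2,3,4)
exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
match_seq(patrn, exmpl, trace = TRUE)
# [1] 4 3 3 3
# [1]  6 13 23
```

Here the 23 possible starts drop to 4 after the first comparison, so the remaining three steps each index only a handful of elements.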

Hope this helps.

Petr Savicky.


