[R] Faster way to implement this search?

William Dunlap wdunlap at tibco.com
Sun Mar 18 21:48:12 CET 2012


> My current question: is there a way to perform the same count, but with
> an arbitrary size pattern?  In other words, instead of a fixed pattern
> size of 3, could I have a pattern size of 4, 5, 6, ..., 30, any of which
> could be run without changing the script?

Of course you cannot do this without changing your script at least once.
However, if you wrap the search in a function, you can make the function
definition more flexible without having to change any of the calls to it.

Change your function from
  f <- function(x, test.pattern) {
      indx <- seq_len(length(x)-3) # 3 should be 2
      sum((x[indx] == test.pattern[1]) & (x[indx+1] == test.pattern[2]) & (x[indx+2] == test.pattern[3]))
  }
to
  f <- function(x, test.pattern) {
      if (length(x) < length(test.pattern)) {
          0 # degenerate case: x is shorter than the pattern
      } else {
          indx <- seq_len(length(x) - length(test.pattern) + 1)
          match <- x[indx] == test.pattern[1]
          for (i in seq_len(length(test.pattern) - 1)) {
              match <- match & x[indx + i] == test.pattern[1 + i]
          }
          sum(match)
      }
  }
Give the function a name that is meaningful and memorable to you
and use it instead of copying the idiom in it when you need to do a search.
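
As a rough illustration of calling one function with patterns of different
lengths, here is a minimal sketch.  The name count_pattern and the small
test vector are made up for this example, and the accumulator is called
hits rather than match only to avoid masking base::match(); the logic is
otherwise meant to be the same as in the generalized function.

```r
# count_pattern is a hypothetical name chosen for this illustration.
count_pattern <- function(x, test.pattern) {
    if (length(x) < length(test.pattern)) {
        return(0) # degenerate case: x is shorter than the pattern
    }
    # Every position at which a full-length pattern could start.
    indx <- seq_len(length(x) - length(test.pattern) + 1)
    hits <- x[indx] == test.pattern[1]
    # Narrow the candidate starts one pattern element at a time.
    for (i in seq_len(length(test.pattern) - 1)) {
        hits <- hits & x[indx + i] == test.pattern[1 + i]
    }
    sum(hits)
}

# Small made-up vector so the counts can be checked by eye.
x <- c("T", "T", "O", "T", "T", "T", "O", "O")

n3 <- count_pattern(x, c("T", "T", "O"))            # starts at 1 and 5
n4 <- count_pattern(x, c("T", "T", "T", "O"))       # starts at 4
n2 <- count_pattern(x, c("O", "T"))                 # starts at 3
n0 <- count_pattern(c("T", "O"), c("T", "T", "O"))  # x too short

# The same call should also work on a factor, such as the output of cut().
nf <- count_pattern(factor(x), c("T", "T", "O"))
```

Only the test.pattern argument changes between the calls; nothing in the
function body depends on the pattern length.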

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Walter Anderson
> Sent: Saturday, March 17, 2012 5:56 AM
> To: Jeff Newmiller
> Cc: R Help
> Subject: Re: [R] Faster way to implement this search?
> 
> On 03/17/2012 12:53 AM, Jeff Newmiller wrote:
> >      for(indx in 1:(length(bin.05)-3))
> > >>>         if ((bin.05[indx] == test.pattern[1])&&   (bin.05[indx+1] ==
> > >>>  test.pattern[2])&&   (bin.05[indx+2] == test.pattern[3]))
> > >>>           return.values$count.match.pattern[1] =
> > >>>  return.values$count.match.pattern[1] + 1
> Ok, sorry for not understanding the first time, here is my example with
> the type of data I am working with in this simulation
> 
>       test.pattern <- c("T", "T", "O")
>       bin.05 <- cut(runif(10000000), breaks=c(-0.01,0.05,1), labels=c("T",
> "O"))
>       for(indx in 1:(length(bin.05)-3))
>          if (
>              (bin.05[indx] == test.pattern[1]) &&
>              (bin.05[indx+1] == test.pattern[2]) &&
>              (bin.05[indx+2] == test.pattern[3]))
>                  count <- count + 1
> 
> Now the approach provided by William Dunlap sped up my simulation
> tremendously:
> 
> indx <- seq_len(length(bin.05)-3)
> count <- sum((bin.05[indx] == test.pattern[1]) &
>                         (bin.05[indx+1] == test.pattern[2]) &
>                         (bin.05[indx+2] == test.pattern[3]))
> 
> My current question: is there a way to perform the same count, but with
> an arbitrary size pattern?  In other words, instead of a fixed pattern
> size of 3, could I have a pattern size of 4, 5, 6, ..., 30, any of which
> could be run without changing the script?
> 


