[R] Faster way to implement this search?

William Dunlap wdunlap at tibco.com
Fri Mar 16 18:31:04 CET 2012


You didn't show your complete code but the following may help you speed things up.
Compare a function, f0, structured like your code and one, f1, that calls sum once
instead of counting length(x)-3 times.

f0 <- function(x, test.pattern) {
    count <- 0
    for(indx in seq_len(length(x)-3)) {
       if ((x[indx] == test.pattern[1]) && (x[indx+1] == test.pattern[2]) && (x[indx+2] == test.pattern[3])) {
           count <- count + 1
       }
    }
    count
}

f1 <- function(x, test.pattern) {
    indx <- seq_len(length(x)-3)
    sum((x[indx] == test.pattern[1]) & (x[indx+1] == test.pattern[2]) & (x[indx+2] == test.pattern[3]))
}


> bin.05 <- round((log10(1:10000000)%%1e-3 - log10(1:10000000)%%1e-4) * 1e4) # quasi-random sample of 10^7 from {0,...,9}
> system.time(print(f0(bin.05, c(2,3,3))))
[1] 3194
   user  system elapsed 
  14.35    0.00   14.35 
> system.time(print(f1(bin.05, c(2,3,3))))
[1] 3194
   user  system elapsed 
   0.70    0.21    0.90

You are probably also slowing things down by doing
    yourList$yourCounts[1] <- yourList$yourCounts[1] + 1
many times instead of
   count <- yourList$yourCounts[1]
once and
   count <- count + 1
many times.  The former evaluates $, [, $<-, and [<- many
times and the $<- and [<- in particular may use a fair bit of time.
   

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Walter Anderson
> Sent: Friday, March 16, 2012 10:00 AM
> To: R Help
> Subject: [R] Faster way to implement this search?
> 
> I am working on a simulation where I need to count the number of matches
> for an arbitrary pattern in a large sequence of binomial factors.  My
> current code is
> 
>      for(indx in 1:(length(bin.05)-3))
>        if ((bin.05[indx] == test.pattern[1]) && (bin.05[indx+1] ==
> test.pattern[2]) && (bin.05[indx+2] == test.pattern[3]))
>          return.values$count.match.pattern[1] =
> return.values$count.match.pattern[1] + 1
> 
> Since I am running the above code for each simulation multiple times on
> sequences of 10,000,000 factors the code is taking longer than I would
> like.   Is there a better (more "R" way of achieving the same answer?
> 
> Walter Anderson
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list