[R] Faster way to implement this search?

Walter Anderson wandrson01 at gmail.com
Fri Mar 16 23:41:56 CET 2012


On 03/16/2012 12:31 PM, William Dunlap wrote:
> You didn't show your complete code but the following may help you speed things up.
> Compare a function, f0, structured like your code and one, f1, that calls sum once
> instead of counting length(x)-3 times.
>
> f0 <- function(x, test.pattern) {
>      count <- 0
>      for (indx in seq_len(length(x) - 3)) {
>         if ((x[indx] == test.pattern[1]) && (x[indx+1] == test.pattern[2]) && (x[indx+2] == test.pattern[3])) {
>             count <- count + 1
>         }
>      }
>      count
> }
>
> f1 <- function(x, test.pattern) {
>      indx <- seq_len(length(x) - 3)
>      sum((x[indx] == test.pattern[1]) & (x[indx+1] == test.pattern[2]) & (x[indx+2] == test.pattern[3]))
> }
>
>
>> bin.05 <- round((log10(1:10000000) %% 1e-3 - log10(1:10000000) %% 1e-4) * 1e4) # quasi-random sample of 10^7 from {0,...,9}
>> system.time(print(f0(bin.05, c(2,3,3))))
> [1] 3194
>     user  system elapsed
>    14.35    0.00   14.35
>> system.time(print(f1(bin.05, c(2,3,3))))
> [1] 3194
>     user  system elapsed
>     0.70    0.21    0.90
>
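
The vectorized idea in f1 can be extended to a pattern of any length. The following is only a minimal sketch under my own assumptions: `count_pattern` is a hypothetical helper name, not something from Bill's message, and it counts overlapping matches. Note that it uses `n - k + 1` window starts, whereas the `length(x) - 3` bound in the code above stops one window short of the end for a three-element pattern.

```r
# Hypothetical helper (not from the thread): count occurrences of `pattern`
# in `x`, overlaps allowed, for a pattern of any length.
count_pattern <- function(x, pattern) {
  k <- length(pattern)
  n <- length(x)
  if (k == 0L || n < k) return(0L)

  starts <- seq_len(n - k + 1L)        # every valid window start, including the last
  hit <- rep(TRUE, length(starts))     # TRUE while the window still matches

  # One vectorized comparison per pattern element, not one per position in x.
  for (j in seq_len(k)) {
    hit <- hit & (x[starts + j - 1L] == pattern[j])
  }
  sum(hit)
}

x <- c(2, 3, 3, 1, 2, 3, 3)
count_pattern(x, c(2, 3, 3))   # 2 (windows starting at positions 1 and 5)
```

The loop runs only `length(pattern)` times, so the cost stays dominated by a handful of whole-vector comparisons. If `x` may contain `NA`, the sum will be `NA` unless the missing values are handled first.
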
> You are probably also slowing things down by doing
>      yourList$yourCounts[1] <- yourList$yourCounts[1] + 1
> many times instead of
>     count <- yourList$yourCounts[1]
> once and
>     count <- count + 1
> many times.  The former evaluates $, [, $<-, and [<- many
> times and the $<- and [<- in particular may use a fair bit of time.
>
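
For a loop that has to stay a loop, the advice above might look like the following sketch. The list here is a small stand-in that I made up to mirror the shape of `return.values` in the original post; the point is only that the list element is read once before the loop and written back once after it.

```r
# Hypothetical stand-in for the list used in the original post.
return.values <- list(count.match.pattern = c(0, 0))
x <- c(2, 3, 3, 1, 2, 3, 3)

count <- return.values$count.match.pattern[1]   # read the element once
for (i in seq_len(length(x) - 2)) {
  if (x[i] == 2 && x[i + 1] == 3 && x[i + 2] == 3) {
    count <- count + 1                           # update a plain local scalar
  }
}
return.values$count.match.pattern[1] <- count   # write back once
```

Inside the loop only a local variable is modified, so `$<-` and `[<-` are each evaluated once instead of once per match.
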
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
>> Of Walter Anderson
>> Sent: Friday, March 16, 2012 10:00 AM
>> To: R Help
>> Subject: [R] Faster way to implement this search?
>>
>> I am working on a simulation where I need to count the number of matches
>> for an arbitrary pattern in a large sequence of binomial factors.  My
>> current code is
>>
>>       for(indx in 1:(length(bin.05)-3))
>>         if ((bin.05[indx] == test.pattern[1])&&  (bin.05[indx+1] ==
>> test.pattern[2])&&  (bin.05[indx+2] == test.pattern[3]))
>>           return.values$count.match.pattern[1] =
>> return.values$count.match.pattern[1] + 1
>>
>> Since I am running the above code multiple times for each simulation on
>> sequences of 10,000,000 factors, the code is taking longer than I would
>> like.  Is there a better (more "R") way of achieving the same answer?
>>
>> Walter Anderson
>>
Thank you for this response.  It made a huge improvement in my
simulation speed!


