[R] gregexpr slow and increases exponentially with string length --> how to speed it up?

Emmanuel Levy emmanuel.levy at gmail.com
Fri Oct 31 02:01:04 CET 2008


Dear All,

I have a long string and need to search it for matches to a regular
expression. However, the search becomes horribly slow as the string
length increases.

Below is an example: when "i" increases by a factor of 5, the time
spent increases by much more than a factor of 5! (My actual string is
11,000,000 characters long!)

I also noticed that
- the search time increases dramatically with the number of matches found.
- the perl=T option slows down the search
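
The perl=T observation can be checked with a small side-by-side timing. This is only a sketch: the seed, the string length, and the variable names are arbitrary choices for illustration, and the timings will differ between machines and R versions, so no expected output is given.

```r
set.seed(1)   # arbitrary seed, only so the run is repeatable
n <- 50000    # one of the sizes used in the benchmark
aa <- paste(sample(1:9, n, replace = TRUE), collapse = "")
pat <- "[367]2[1-9][129]"

# Time the same search with the default engine and with perl = TRUE
t_default <- system.time(m_default <- gregexpr(pat, aa))
t_perl    <- system.time(m_perl    <- gregexpr(pat, aa, perl = TRUE))

# Show user, system and elapsed times side by side
print(rbind(default = t_default, perl = t_perl)[, 1:3])

# Both calls should report the same match positions
pos_default <- as.integer(m_default[[1]])
pos_perl    <- as.integer(m_perl[[1]])
print(length(pos_default))
```

Comparing the match positions as well as the times guards against timing two searches that do not actually find the same thing.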

Any idea to speed this up would be greatly appreciated!

Best,

Emmanuel


> for (i in c(10000, 50000, 100000, 500000)){
+   aa = as.character(sample(1:9, i, replace=T))
+   aa = paste(aa, collapse='')
+   print(i)
+   print(system.time(gregexpr("[367]2[1-9][129]",aa)))
+ }
[1] 10000
   user  system elapsed
  0.004   0.000   0.003
[1] 50000
   user  system elapsed
  0.060   0.000   0.061
[1] 1e+05
   user  system elapsed
  0.240   0.000   0.238
[1] 5e+05
   user  system elapsed
  5.733   0.000   5.732
>
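
As a rough check of the growth rate, the user times printed above can be turned into slopes on a log-log scale (a slope of 1 would mean linear growth, 2 quadratic). This is only a back-of-the-envelope sketch based on four single timings; the variable names are made up for illustration.

```r
# String lengths and user times copied from the transcript above
n    <- c(10000, 50000, 100000, 500000)
user <- c(0.004, 0.060, 0.240, 5.733)

# Slope between consecutive points on a log-log scale:
# time ~ n^slope over that interval
slope <- diff(log(user)) / diff(log(n))
print(round(slope, 2))
```

The slopes come out between roughly 1.7 and 2, which suggests that the time grows about quadratically with string length for these sizes, rather than exponentially.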


