[Rd] Regular expressions & large strings (PR#6617)

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Feb 28 12:31:13 MET 2004


I was able to confirm the error on RH8.0 Linux and the segfault on 
Windows.

Note that PCRE is not being used here: [g]sub only uses it when perl=TRUE 
is given.  If you add perl=TRUE to your [g]sub calls you get correct 
results extremely fast.

The segfault is occurring in regexec, that is, in the GNU regex code 
included in R.  I am not sure it is worth spending any time trying to 
find the problem in that code, as

- you can use perl=TRUE as an alternative
- we will be replacing the GNU regex code in due course to cope with 
internationalization issues.
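
A minimal sketch of the perl=TRUE alternative, based on the snippets in 
the report quoted below.  The shorter string length (1e5), the start 
index of 1 in substr, and the function name bench are assumptions made 
for illustration and are not from the original report; timings will 
vary by platform and R version.

```r
# Smaller stand-in for the reporter's string (the length 1e5 is arbitrary)
t5 <- paste(c("# === TEST", rep(" ", 1e5)), collapse = "")

# perl = TRUE routes the match through PCRE instead of the bundled GNU regex
long_res  <- sub("^.*TEST", "xyz", t5, perl = TRUE)
short_res <- sub("^.*TEST", "xyz", substr(t5, 1, 200), perl = TRUE)

# Both results should begin with the replacement text
str(long_res)
str(short_res)

# Hypothetical helper: time the gsub from the report with PCRE
bench <- function(n) {
  line <- paste(as.character(trunc(runif(n) * 100)), collapse = " ")
  system.time(gsub("[[:space:]]", "-", line, perl = TRUE))
}
print(bench(2e4))
```

If the long and short results still differ with perl=TRUE, that would 
point to a problem outside the GNU regex code.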

On Fri, 27 Feb 2004 mjw at celos.net wrote:

> A possible regex bug when working with large strings.  The
> following code snippet
> 
>   t5 <- paste( c( "# === TEST", rep(' ', 2452294) ), collapse='')
>   str( sub("^.*TEST", "xyz", t5) )
>   str( sub("^.*TEST", "xyz", substr(t5,0,200)) )
> 
> doesn't behave right; on one machine, the second and third
> lines print different results [the second line, on the long
> string, doesn't do the substitution], while on another, the
> second line causes a segfault.  Both are running R 1.8.1
> with PCRE, under NetBSD (1.6.1 and 1.6 respectively).
> 
> Possibly related (although perhaps not a bug):
> 
>   function(n) {
>     line <- paste(as.character(trunc(runif(n)*100)),collapse=" ")
>     system.time( rep  <- gsub("[[:space:]]", "-", line) )
>   }
> 
> gives rather long times rising very sharply for big strings (e.g.
> 2.2s at n=2e4, 360s at n=2e5 on AMD 1.2GHz).  Other languages 
> aren't so slow on this task (e.g. n=2e5: 0.4s ruby 1.8.1, and
> 5.2s python 2).  Doubtless my extremely-quick-hack benchmarks
> aren't fair, but the difference still seems rather big.
> 
> Mark <><

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


