[Rd] Regular expressions & large strings (PR#6617)

Mark White mjw at celos.net
Sat Feb 28 16:14:18 MET 2004


Prof Brian Ripley writes:
> I was able to confirm the error on RH8.0 Linux and the segfault on 
> Windows.
> 
> Note that PCRE is not being used, and if you add perl=TRUE to your [g]sub 
> calls you get correct results extremely fast.

Thanks for clarifying that; I hadn't realised.
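
For anyone following along, here is a minimal sketch of that workaround.
The string and its size are made up for illustration and are not the
ones from the original report:

```r
# Build a large string; contents and size are illustrative assumptions.
x <- paste(rep("abc ", 50000), collapse = "")

# Same kind of substitution, but routed through PCRE via perl = TRUE
# rather than the default regex code.
y <- gsub(" +", "_", x, perl = TRUE)

# Each single space becomes one underscore, so the length is unchanged.
nchar(x) == nchar(y)
```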

> The segfault is occurring in regexec, that is in the GNU regex code 
> included in R.  I am not clear it is worth spending any time on trying to 
> find the problem in that code as
> 
> - you can use perl=TRUE as an alternative
> - we will be replacing the GNU regex code in due course to cope with 
> internationalization issues.

Sounds fine.  Do you think either of the following is worth
doing in the meantime?

  - Add an strsplit() variant with PCRE (perhaps this
    problem is related to PR#6601; and the speed might be
    nice anyway).

  - Add options(pcre) so the potentially bad code can be
    avoided without explicitly setting perl=TRUE every time.
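
As a rough user-level sketch of the second idea (not a proposal for how
it would be done inside R itself): the option name "pcre" and the
wrapper name gsub2 are both hypothetical, chosen only to illustrate the
suggestion above.

```r
# Hypothetical wrapper: "pcre" is not an option that R defines, and
# gsub2 is an invented name. perl defaults to the option's value, or
# FALSE when the option is unset.
gsub2 <- function(pattern, replacement, x, ...,
                  perl = isTRUE(getOption("pcre", FALSE))) {
  gsub(pattern, replacement, x, ..., perl = perl)
}

options(pcre = TRUE)                 # opt in once per session
res <- gsub2("\\d+", "N", "a1b22c")  # runs with perl = TRUE
print(res)
```

An explicit perl= argument in a call still overrides the option, so
existing code that sets it keeps its behaviour.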

Mark <><


