[Rd] Versions of PCRE, documenting what grep etc do.

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Oct 25 10:57:39 MEST 2003

I have added a preliminary help page for regex to R-patched which should
help for now, and added a configure test for PCRE >= 4.0 to R-devel.
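
For illustration only, the comparison behind a "PCRE >= 4.0" check can be
sketched as below. This is not the actual configure test in R-devel. The
function name pcre_version_at_least is hypothetical, and the sketch assumes
the version string starts with "major.minor", optionally followed by other
text such as a release date.

```python
import re


def pcre_version_at_least(version_string, minimum=(4, 0)):
    """Return True if a PCRE-style version string reports at least `minimum`.

    Hypothetical helper: it assumes the string starts with "major.minor"
    and ignores anything that follows (for example a release date).
    """
    match = re.match(r"\s*(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_string)
    found = (int(match.group(1)), int(match.group(2)))
    # Compare as integer tuples so that, e.g., 4.10 sorts after 4.4.
    return found >= minimum


if __name__ == "__main__":
    # Versions mentioned in this thread.
    for version in ("4.4", "3.9", "3.4"):
        print(version, pcre_version_at_least(version))
```

Comparing (major, minor) as integers rather than as strings or floats avoids
mis-ordering two-digit minor versions.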

I will return to this later in 2003.


On Fri, 24 Oct 2003, Kurt Hornik wrote:

> >>>>> Prof Brian Ripley writes:
> > A couple of weeks back there was some discussion about documenting the
> > regular expressions as used in R.  Several years ago the problem was
> > that this was OS-dependent, and to plug that problem we incorporated
> > regexp code from a version of GNU grep, later updated to grep-2.4.2 in
> > R 1.2.0.
> > I have been looking at documenting what grep(perl=TRUE) does, and we
> > have a similar problem in that the current PCRE, 4.4, implements
> > rather more of Perl's regexps than 3.9 (which is the version in R 1.8.0
> > if the OS does not supply it).  RH8.0 also has PCRE 3.9, and whichever
> > version of Debian is on franz has PCRE 3.4.
> > I could add a configure check for PCRE >= 4.0, and I think probably
> > should do that.  However, my inclination is to always use the version
> > of PCRE in the R sources and thereby ensure that all builds of R have
> > the same version, the one I will document.  Comments, please.
> I think we should in any case allow maintainers of binary packages on
> platforms with advanced package management systems to force the use of
> shared libraries the system can provide.  (So the binary maintainers
> would need to verify that the system package provides the right libs and
> headers.)
> Not sure about the default: we typically try to use available system
> resources, unless this is bound to cause problems, and regex was of the
> latter type, afaicr.  
> > For PCRE 4.4 there is a long man page that I will use as a basis for
> > the documentation.  I am inclined just to include either a text or PDF
> > version of the man page -- any preferences for which form?
> Depends on where you would put the docs, I think.  Btw, where can 4.4 be
> found?
> > For the non-Perl regexps it is harder, as I am unsure exactly what
> > patterns the GNU regex we have accepts.  (From a problem which
> > occurred with some Sweave regexps, I think it accepts more than it is
> > intended to.)  One fairly good documentation source is the GNU grep man page:
> > does anyone know a better one?  I had thought of writing a regexp.Rd
> > help page to which grep.Rd could refer.
> That would be great.  Linux has a regex(7) purported to be "taken from
> Henry Spencer's regex package", which might be used as a start.  The old
> GNU regex .tar.gz has a texinfo file, but does not help for what we
> need, I think.
> [I recently looked for available regexp docs, but was not too
> successful.]
> > None of this is imminent (I am too busy) but is intended for the next 
> > minor release (which may be called 1.9.0 or 2.0.0, I gather).
> Too bad :-(
> Best
> -k

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
