[R] Regular Expressions

Gabor Grothendieck ggrothendieck at gmail.com
Fri Nov 5 19:04:48 CET 2010


2010/11/5 Brian Diggs <diggsb at ohsu.edu>:
> Is there a standard, built in way to get both (all) backreferences at the
> same time with just one call to sub (or the appropriate function)? I can
> cobble something together specifically for 2 backreferences (not extensively
> tested):
>
> both_backrefs <- function(pattern, x) {
>        s <- sub(pattern, "\\1\034\\2", x)
>        matrix(unlist(strsplit(s,"\034")), ncol=2, byrow=TRUE)
> }
>
> both_backrefs(regex, x)
>
> However, putting the parts back together into a string (with a delimiter
> that hopefully won't be in the string otherwise) just to use strsplit to
> pull them apart seems inelegant (as does making multiple calls to sub()).
>  sub() (and siblings) surely already have the backreferences as strings at
> some point in the processing, but I don't see a way to return them as a
> vector or matrix, only to substitute using backreferences (sub) or return
> indices pointing to where the matches start (regexpr) or return the whole
> string matches (grep with value=TRUE).
>

The gsubfn package has gsubfn, which is like gsub except that it can take a
function in place of the replacement string.  The function's arguments
are the match or the back references, and the function's output replaces
the match.  It also has strapply, which does the same thing except that,
instead of inserting the function's output into the string, it returns
the function's output.  See the home page at http://gsubfn.googlecode.com
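
As a rough sketch of how strapply could return both back references in
one call: the sample strings and the pattern below are made up for
illustration and are not from the original question.  The fallback
branch uses base R's regexec() and regmatches(), which were added to
base R after this thread, so treat that part as an assumption about
your R version rather than as part of gsubfn.

```r
# Hypothetical sample data and pattern (not from the original thread).
x <- c("alpha-123", "beta-45")
pattern <- "([a-z]+)-([0-9]+)"

if (requireNamespace("gsubfn", quietly = TRUE)) {
  # strapply() hands the back references to the function (here c) and
  # returns the function's output instead of substituting it back in.
  # simplify = rbind stacks the per-string results into a matrix.
  refs <- gsubfn::strapply(x, pattern, c, simplify = rbind)
} else {
  # Base R fallback (assumes R >= 2.14.0): regexec() records the whole
  # match plus each parenthesized group; regmatches() extracts them.
  m <- regmatches(x, regexec(pattern, x))
  # Drop the first element (the whole match), keep only the groups.
  refs <- do.call(rbind, lapply(m, function(v) v[-1]))
}

print(refs)
```

Either branch should give one row per input string and one column per
back reference, with no delimiter trick and no repeated calls to sub().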

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


