[R] Regular Expressions

Gabor Grothendieck ggrothendieck at gmail.com
Tue May 13 15:23:38 CEST 2008


On Tue, May 13, 2008 at 5:02 AM, Shubha Vishwanath Karanth
<shubhak at ambaresearch.com> wrote:
> Suppose,
>
> S=c("World_is_beautiful", "one_two_three_four","My_book")
>
> I need to extract the last but one element of the strings. So, my output should look like:
>
> Ans=c("is","three","My")
>
> gsub() can do this...but wondering how do I give the regular expression....
>

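As others have mentioned, strsplit is probably easier in this case, but it can
be done with a regular expression as shown below, where [^_]+ matches any
non-empty string of characters not containing _ . The second parenthesized
group captures the last-but-one field, and \\2 in the replacement refers to it: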

> re <- "^([^_]+_)*([^_]+)_([^_]+)$"
> gsub(re, "\\2", S)
[1] "is"    "three" "My"
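
For comparison, the strsplit approach mentioned above could be sketched
roughly as follows. This is only an illustration using base R; the variable
names parts and ans are mine, and it assumes every string contains at least
one underscore (otherwise there is no last-but-one field to pick).

```r
S <- c("World_is_beautiful", "one_two_three_four", "My_book")

# Split each string on the literal "_" to get a list of character vectors.
parts <- strsplit(S, "_", fixed = TRUE)

# Pick the second-to-last piece of each vector.
ans <- sapply(parts, function(x) x[length(x) - 1])
ans
# [1] "is"    "three" "My"
```

Indexing with length(x) - 1 avoids any regular expression at all, which is
why it is usually the easier route here.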

The strapply function in the gsubfn package can also be used. Here out
has the same value as strsplit(S, "_"), and tail(x, 2)[1] picks the
last-but-one element of each vector:

library(gsubfn)
out <- strapply(S, "[^_]+")
sapply(out, function(x) tail(x, 2)[1])
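
If installing gsubfn is not an option, the same "extract every run of
non-underscore characters" idea can probably be sketched in base R with
gregexpr and regmatches. This is my own variation, not something the
gsubfn package requires, and ans is just an illustrative name:

```r
S <- c("World_is_beautiful", "one_two_three_four", "My_book")

# gregexpr finds every match of [^_]+ in each string;
# regmatches pulls the matched substrings out as a list of vectors.
out <- regmatches(S, gregexpr("[^_]+", S))

# tail(x, 2) keeps the last two pieces; [1] takes the first of those,
# i.e. the last-but-one piece. vapply enforces a character result.
ans <- vapply(out, function(x) tail(x, 2)[1], character(1))
ans
# [1] "is"    "three" "My"
```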


