[R] regexp inside and outside brackets

Marc Schwartz marc_schwartz at me.com
Fri Dec 11 15:39:09 CET 2015


> On Dec 11, 2015, at 7:50 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
> 
> For the regexp aficionados, out there:
> 
> I need a regular expression to extract either everything within some
> brackets, or everything outside the brackets, in a string.
> 
> This would be the test string:
> "A1{0}~B0{1} CO{a2}NN{12}"
> 
> Everything outside the brackets would be:
> 
> "A1 ~B0 CO NN"
> 
> and everything inside the brackets would be:
> 
> "0 1 a2 12"
> 
> I have a working solution involving strsplit(), but I wonder if there is a
> more direct way.
> Thanks in advance for any hint,
> Adrian


x <- "A1{0}~B0{1} CO{a2}NN{12}"

The first, everything outside the braces, is a bit easier. Replace each braced group with a space:

> gsub("\\{[[:alnum:]]*\\}", " ", x)
[1] "A1 ~B0  CO NN "
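
The result above still has a doubled space (the original string already had a space after "{1}") and a trailing space. If the exact string "A1 ~B0 CO NN" is wanted, one possible cleanup is sketched below. The names outside and outside_clean are only illustrative, and trimws() assumes R 3.2.0 or later:

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Replace each braced group with a single space, as above
outside <- gsub("\\{[[:alnum:]]*\\}", " ", x)

# Collapse runs of whitespace to one space, then drop
# leading and trailing whitespace (trimws() needs R >= 3.2.0)
outside_clean <- trimws(gsub("[[:space:]]+", " ", outside))

outside_clean
```

On older versions of R, gsub("^ +| +$", "", ...) should do the same job as trimws() here.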


The second, everything inside the braces, is a bit more cumbersome, at least using base R functions, because there are several matches to extract:

> gsub("\\{|\\}", "", regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))[[1]])
[1] "0"  "1"  "a2" "12"

Note that this returns a character vector with one element per match, rather than a single string.
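
If a single string such as "0 1 a2 12" is preferred, the elements can be pasted back together. A minimal sketch, where inside and inside_string are just illustrative names:

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Extract the braced groups, then strip the braces from each one
inside <- gsub("\\{|\\}", "",
               regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))[[1]])

# Join the elements into one space-separated string
inside_string <- paste(inside, collapse = " ")

inside_string
```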

In the above:

> gregexpr("\\{[[:alnum:]]+\\}", x)
[[1]]
[1]  3  9 15 21
attr(,"match.length")
[1] 3 3 4 4
attr(,"useBytes")
[1] TRUE

returns the starting positions and lengths of the matches, which regmatches() then uses to extract the matched substrings:

> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))
[[1]]
[1] "{0}"  "{1}"  "{a2}" "{12}"

The outer gsub() then strips the braces from each match.
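
As a possible alternative that avoids the outer gsub(), a Perl-style lookbehind and lookahead can match only the text between the braces. This is a sketch rather than a recommendation. It relies on perl = TRUE in gregexpr(), and inside_perl is just an illustrative name:

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# (?<=\\{) requires a "{" immediately before the match and
# (?=\\}) requires a "}" immediately after it; neither brace
# is included in the match itself
inside_perl <- regmatches(
  x,
  gregexpr("(?<=\\{)[[:alnum:]]+(?=\\})", x, perl = TRUE)
)[[1]]

inside_perl
```

The same assumptions apply as before: only alphanumeric content and no nested braces.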

You could also use invert = TRUE in regmatches() to get the pieces outside the braces instead:

> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x), invert = TRUE)[[1]]
[1] "A1"  "~B0" " CO" "NN"  ""   
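
The inverted result includes a leading space on " CO" and an empty string after the final match. If a clean vector or a single string is wanted, something along these lines might work. The names pieces and outside_string are illustrative, and trimws() again assumes R 3.2.0 or later:

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Pieces of the string that fall between the braced groups
pieces <- regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x),
                     invert = TRUE)[[1]]

# Trim surrounding whitespace and drop empty pieces
pieces <- trimws(pieces)
pieces <- pieces[nzchar(pieces)]

# Optionally join into one space-separated string
outside_string <- paste(pieces, collapse = " ")

outside_string
```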


Of course, this presumes that the braces are not nested and that they contain only alphanumeric characters.
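
If the content inside the braces might include other characters, one option is to match anything up to the next closing brace with [^}]* instead of [[:alnum:]]. The sketch below uses a made-up test string y, which is not from the original question, and it still does not handle nested braces:

```r
# Hypothetical test string with punctuation, an empty group and a space
y <- "A{x-1}B{}C{a b}"

# Match "{", then any run of characters other than "}", then "}"
braced <- regmatches(y, gregexpr("\\{[^}]*\\}", y))[[1]]

# Strip the opening and closing brace from each match
inside_any <- gsub("^\\{|\\}$", "", braced)

# Everything outside the braces, with each group replaced by a space
outside_any <- gsub("\\{[^}]*\\}", " ", y)

inside_any
outside_any
```

Because * also matches an empty group, "{}" yields an empty string here, whereas the + used in the extraction above would skip it.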

Regards,

Marc Schwartz


