[R] gsub regexp question
skiadas at hanover.edu
Sat Jan 27 22:34:39 CET 2007
On Jan 27, 2007, at 3:41 PM, Phillimore, Albert wrote:
> Dear R Users,
> I am trying to use gsub to remove multiple cases of square
> brackets and their different contents in a character string. A
> sample of such a string is shown below. However, I am having great
> difficulty understanding regexp syntax. Any help is greatly appreciated.
> "tree STATE_286000 [&lnP=-12708.453945423369] = [&R] ((((((15
Is this what you want? I tend to prefer perl regular expressions:
> str <- "tree STATE_286000 [&lnP=-12708.453945423369] = [&R] ((((((15"
> gsub("\\[[^\\]]+\\]", "", str, perl = TRUE)
[1] "tree STATE_286000  =  ((((((15"
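As an explanation, \\[ and \\] match the two literal square brackets you
want. We need to escape the brackets with backslashes because
they have a special meaning in perl regular expressions, and each
backslash is doubled because R itself treats a backslash as an escape
character inside a string literal.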
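To see the double escaping at work, here is a minimal sketch (the variable name pat is only for illustration). It shows that the R string "\\[" holds just two characters, a backslash and a bracket, and that this is what the regex engine receives:

```r
# "\\[" in R source is a two-character string: one backslash, one bracket.
pat <- "\\["
n_chars <- nchar(pat)   # 2
writeLines(pat)         # prints \[

# The single backslash that reaches the regex engine makes the bracket literal.
has_bracket <- grepl(pat, "a[b", perl = TRUE)   # TRUE
no_bracket  <- grepl(pat, "ab",  perl = TRUE)   # FALSE
```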
In perl regexps, "[....]" stands for "match a single character that
is one of those listed in the ....". For instance, [ab] will match an a or
a b, and [a-z] will match any single lowercase letter. A ^ as the first
character in there means "match anything but what follows". For instance,
[^a-z] means match any character except a lowercase letter. So [^\\]] means
match any character but a closing bracket.
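A small sketch of these character classes, using a made-up test string x (not from the original question); each call deletes whatever the class matches:

```r
x <- "Ab]c9"

# [ab]: a single "a" or "b" (case-sensitive, so "A" stays)
r1 <- gsub("[ab]", "", x, perl = TRUE)     # "A]c9"

# [a-z]: any single lowercase letter
r2 <- gsub("[a-z]", "", x, perl = TRUE)    # "A]9"

# [^a-z]: any single character that is NOT a lowercase letter
r3 <- gsub("[^a-z]", "", x, perl = TRUE)   # "bc"

# [^\\]]: any single character that is NOT a closing bracket
r4 <- gsub("[^\\]]", "", x, perl = TRUE)   # "]"
```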
Finally, the plus sign afterwards means: match at least one of these. So
[^\\]]+ means "match any non-empty sequence of characters that does not
contain a closing bracket". So the whole thing now matches an opening bracket,
followed by all characters up to the next closing bracket, and then that
closing bracket itself.
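As a rough illustration on a made-up string s, the sketch below shows two consequences of the pattern: an empty "[]" is left alone because + needs at least one character, and the negated class stops each match at the first closing bracket. The greedy ".+" variant is included only for contrast and is not part of the suggestion above.

```r
s <- "x [one] y [] z [two]"

# Each bracket pair with something inside is removed; "[]" survives
# because + requires one or more characters between the brackets.
per_pair <- gsub("\\[[^\\]]+\\]", "", s, perl = TRUE)
# "x  y [] z "

# For contrast: a greedy .+ runs from the first "[" to the LAST "]",
# swallowing everything in between.
greedy <- gsub("\\[.+\\]", "", s, perl = TRUE)
# "x "
```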
This will not work if you have nested pairs of brackets, [like [so]].
That is a tad more delicate, and we can discuss it if you really need
to deal with it.
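For what it is worth, here is a sketch of how the nested case goes wrong, followed by one possible workaround. The helper strip_brackets is hypothetical, not something from base R: it repeatedly removes the innermost bracket pairs until nothing changes. Note that it uses * rather than +, so empty pairs are removed as well.

```r
nested <- "This [like [so]] fails"

# The simple pattern stops at the first "]", leaving a stray bracket behind.
simple <- gsub("\\[[^\\]]+\\]", "", nested, perl = TRUE)
# "This ] fails"

# Hypothetical helper: strip innermost pairs (no brackets inside) repeatedly.
strip_brackets <- function(x) {
  repeat {
    y <- gsub("\\[[^\\[\\]]*\\]", "", x, perl = TRUE)
    if (identical(y, x)) return(x)   # nothing left to remove
    x <- y
  }
}

cleaned <- strip_brackets(nested)
# "This  fails"
```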