[R] Regular expression to define contents between parentheses

Alexander Shenkin ashenkin at ufl.edu
Tue Aug 25 22:48:25 CEST 2009


Hi Judith,

This probably isn't the only way to do it, but:

    gsub("\\(.*?\\)", "", myvector, perl=TRUE)

seems to do the trick.
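That said, it leaves behind the spaces that surrounded each parenthesised group (the first element comes out as "something  & more ", with a double space and a trailing space), so it isn't quite your subvector yet. Here is a minimal sketch that also tidies the spacing. It assumes you just want runs of spaces collapsed to one and leading/trailing spaces dropped, and strip_parens is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: drop every "( ... )" group, then tidy the spacing
strip_parens <- function(x) {
  # Lazy ".*?" stops at the first ")" after each "(" (used with perl = TRUE)
  out <- gsub("\\(.*?\\)", "", x, perl = TRUE)
  # Collapse the runs of spaces left where the groups used to be
  out <- gsub(" +", " ", out)
  # Remove any leading or trailing spaces
  gsub("^ +| +$", "", out)
}

myvector <- c("something (80 km/h, sd) & more (6 kg/L,sd)",
              "somethingelse (48 m/s, sd) & moretoo (50g/L , sd)")
strip_parens(myvector)
# returns c("something & more", "somethingelse & moretoo")
```

If your real data can contain tabs or other whitespace, "[[:space:]]+" in place of " +" would be the safer choice.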

The problem is that regex quantifiers like '*' are greedy by default, so
your pattern matched everything between the first opening paren and the
last closing paren, as you noticed.  Putting a question mark after the
'*' makes it a "minimal" (non-greedy) match, so each match stops at the
first closing paren it reaches.  Apparently this is only implemented for
Perl-style regexes, or at least in that syntax.  Hence the 'perl=TRUE'.
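For comparison, here is a small sketch contrasting your greedy pattern with the minimal one, plus a different technique that sidesteps greediness altogether: match "(", then any run of characters that are not ")", then ")". That negated-character-class form is a swapped-in alternative, not the approach above; it works with the default regex engine (no perl=TRUE), assuming the parentheses are never nested.

```r
x <- "something (80 km/h, sd) & more (6 kg/L,sd)"

# Greedy: ".*" runs from the first "(" to the LAST ")", swallowing " & more "
greedy <- gsub("\\(.*\\)", "", x)

# Minimal (non-greedy): ".*?" stops at the FIRST ")" after each "("
minimal <- gsub("\\(.*?\\)", "", x, perl = TRUE)

# Negated class: "[^)]*" can never cross a ")", so no lazy quantifier is needed
negated <- gsub("\\([^)]*\\)", "", x)

greedy   # "something "
minimal  # "something  & more "
negated  # same as minimal
```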

hth,
allie

On 8/25/2009 3:17 PM, Judith Flores wrote:
> Hello dear R-helpers,
>
>    I haven't been able to figure out or find a solution in the R-help archives for how to delete all the characters contained in groups of parentheses. I have a vector that looks more or less like this:
>
> myvector<-c("something (80 km/h, sd) & more (6 kg/L,sd)", "somethingelse (48 m/s, sd) & moretoo (50g/L , sd)")
>
> I want to extract all the strings that are not contained in parentheses; the goal would be to obtain the following new vector:
>
> subvector<-c("something & more", "somethingelse & moretoo")
>
> I tried the following, but this pattern seems to enclose everything between the first opening parenthesis and the last closing parenthesis, which makes sense, but it's not what I need:
>
> subvector<-gsub("\\((.*)\\)","",myvector)
>
>
> Your help would be much appreciated.
>
> Thank you,
>
> Judith
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



