[R] Extract Element of String with R's Regex

Stephen Tucker brown_emu at yahoo.com
Fri Aug 1 13:31:54 CEST 2008

For the example below, applying strsplit() directly is probably the simplest solution. In the more general case, where you need to match a pattern, combining sub() or gsub() with strsplit() might do the trick:

> x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35"
> patt <- "Best-K Gene \\d+ (\\w+) (\\w+) Noc= \\d - (\\d) LL= (.*)"

> unlist(strsplit(gsub(patt,"\\1,\\2,\\3",x,perl=TRUE),","))
[1] "211952_at" "RANBP5"    "2"  
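
If you would rather not build a delimited string and split it again, base R's regexec() and regmatches() can return the capture groups directly. This is only a minimal sketch, assuming an R version that provides both functions (they were added after this thread was written) and reusing x and patt from above. The name "needed" is just illustrative, borrowed from the Perl snippet. The default regex engine accepts \\d and \\w, so perl=TRUE is left out here.

```r
# Same input and pattern as above (single spaces between fields).
x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35"
patt <- "Best-K Gene \\d+ (\\w+) (\\w+) Noc= \\d - (\\d) LL= (.*)"

# regexec() records the positions of the full match and of each group;
# regmatches() turns those positions into strings.
m <- regmatches(x, regexec(patt, x))[[1]]

# m[1] is the whole match; m[2:5] are the four capture groups.
# Keep only the first three groups, as in the Perl version.
needed <- m[2:4]
print(needed)
```

Dropping the first element is the one step that is easy to forget, since it always holds the whole match rather than a group.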

Alternatively, you may want to take a look at the gsubfn package - it is quite useful. Still learning to use it myself...

> library(gsubfn)
> unlist(strapply(x,patt,function(x1,x2,x3) c(x1,x2,x3),backref=-3,perl=TRUE))
[1] "211952_at" "RANBP5"    "2"  
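
One caveat: the x I used above has single spaces throughout, while the string in the original post has two spaces after "RANBP5" and after the "2". The single-space pattern does not match that string, so sub() hands it back unchanged. Here is a rough sketch of a more tolerant pattern. The names x_orig and patt_ws are my own, and " +" (one or more spaces) is just one way to do it; "\\s+" would also allow tabs.

```r
# The string as originally posted (two spaces after "RANBP5" and after "2").
x_orig <- "Best-K Gene 11340 211952_at RANBP5  Noc= 3 - 2  LL= -963.669 -965.35"

# The single-space pattern does not match x_orig,
# so sub() returns the input unchanged.
patt <- "Best-K Gene \\d+ (\\w+) (\\w+) Noc= \\d - (\\d) LL= (.*)"
unchanged <- sub(patt, "\\1,\\2,\\3", x_orig, perl = TRUE)

# Illustrative variant: " +" accepts one or more spaces between fields.
patt_ws <- "Best-K +Gene +\\d+ +(\\w+) +(\\w+) +Noc= +\\d +- +(\\d) +LL= +(.*)"
res <- unlist(strsplit(sub(patt_ws, "\\1,\\2,\\3", x_orig, perl = TRUE), ","))
print(res)
```

Another option would be to collapse runs of spaces first with gsub(" +", " ", x_orig) and then apply the original pattern.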

----- Original Message ----
From: Simon Blomberg <s.blomberg1 at uq.edu.au>
To: Edward Wijaya <ewijaya at gmail.com>
Cc: r-help at r-project.org
Sent: Thursday, July 31, 2008 11:48:23 PM
Subject: Re: [R] Extract Element of String with R's Regex

How about:

unlist(strsplit(x, split=" "))[c(4:5,10)]

That perl script looks like a good reason to avoid perl.


On Fri, 2008-08-01 at 15:13 +0900, Edward Wijaya wrote:
> Hi,
> I have this string, from which I want to extract some of its elements:
> > x <- "Best-K Gene 11340 211952_at RANBP5  Noc= 3 - 2  LL= -963.669 -965.35"
> yielding this array
> [1] "211952_at"  "RANBP5" "2"
> In Perl we would do it this way:
> __BEGIN__
> my @needed =();
> my $str = "Best-K Gene 11340 211952_at RANBP5  Noc= 3 - 2  LL= -963.669 -965.35";
> $str =~ /Best-K Gene \d+ (\w+) (\w+) Noc= \d - (\d) LL= (.*)/;
> push @needed, ($1,$2,$3);
> __END__
> How can we achieve this with R?
>  - E.W.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Simon Blomberg, BSc (Hons), PhD, MAppStat. 
Lecturer and Consultant Statistician 
Faculty of Biological and Chemical Sciences 
The University of Queensland 
St. Lucia Queensland 4072 
Room 320 Goddard Building (8)
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au

1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

The combination of some data and an aching desire for 
an answer does not ensure that a reasonable answer can 
be extracted from a given body of data. - John Tukey.
