[R] how to implement string pattern extraction in R

William Dunlap wdunlap at tibco.com
Mon Aug 23 02:40:05 CEST 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Waverley @ 
> Palo Alto
> Sent: Sunday, August 22, 2010 3:51 PM
> To: r-help
> Subject: Re: [R] how to implement string pattern extraction in R
> 
> Thanks for the reply pointing me to the grep functions.
> 
> I have checked the readme page
> http://pbil.univ-lyon1.fr/library/base/html/grep.html before I sent
> the help request.
> 
> I just don't know how to extract a substring matching a pattern out of a
> string.  Can someone give me example code, similar to that in Perl,
> to extract the prefix out of the string?

The S language pattern matching functions are vectorized, so
let's compare the S way to the vectorized version of your Perl code.
I think the following is idiomatic Perl:
  @x=qw(AAAA.txt BBBB.qaz CCCC.txt);
  @prefixes=map { if($_ =~ /(.*?)\.txt/) { $1 ; } else { "<not txt file>"; } } @x ;
  print( join(", ", @prefixes), "\n") ;
  ^Z     # or ^D on Unix
  AAAA, <not txt file>, CCCC
The S equivalent to the @x=qw(...) would be
  > x <- c("AAAA.txt", "BBBB.qaz", "CCCC.txt")
and, to get the part before the ".txt" when the string ends in
".txt", you could do one of
  > ifelse(grepl("\\.txt$", x),
             sub(pattern="\\.txt$",replacement="",x),
             "<not txt file>")
  [1] "AAAA"           "<not txt file>" "CCCC"
or
  > ifelse((r <- regexpr("\\.txt$", x))>0,
           substring(x, 1, r - 1),   # r is where ".txt" starts
           "<not txt file>")
  [1] "AAAA"           "<not txt file>" "CCCC"
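
If you want something closer to Perl's $1, one option is a
parenthesized group in the pattern and a "\\1" backreference in the
replacement.  This is only a sketch: the names `pattern` and
`prefixes` are made up for illustration, and the pattern is anchored
at both ends so that sub() replaces the whole string with the
captured part.

```r
x <- c("AAAA.txt", "BBBB.qaz", "CCCC.txt")

# One capture group for everything before a trailing ".txt"
pattern <- "^(.*)\\.txt$"

# sub() leaves non-matching strings unchanged, so test with grepl() first
prefixes <- ifelse(grepl(pattern, x),
                   sub(pattern, "\\1", x),
                   "<not txt file>")
print(prefixes)
```

Because sub() returns its input untouched when nothing matches, the
grepl() test is what separates "no match" from "matched".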

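Perl's =~ has a return value that says if there was a match
or not and it stores the details of the match in the magic
variables $1, $2, ... (and $', $`, and $&).  S language
functions don't use magic variables but can store the
extra stuff as attributes of the return value: regexpr(),
for example, returns the start of the match (-1 if there is none)
and attaches the length of the match as the "match.length" attribute.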
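
As a rough illustration of that point, the start position and the
"match.length" attribute returned by regexpr() are enough to rebuild
approximate analogues of Perl's $& (the matched text) and $` (the
text before the match).  The variable names below (`start`, `len`,
`matched`, `prematch`) are hypothetical, chosen just for this sketch.

```r
x <- c("AAAA.txt", "BBBB.qaz", "CCCC.txt")

r <- regexpr("\\.txt$", x)
start <- as.integer(r)            # start of each match, -1 where none
len <- attr(r, "match.length")    # length of each match, -1 where none

# Roughly Perl's $& : the text that matched
matched <- ifelse(start > 0,
                  substring(x, start, start + len - 1),
                  NA_character_)

# Roughly Perl's $` : the text before the match
prematch <- ifelse(start > 0,
                   substring(x, 1, start - 1),
                   NA_character_)

print(matched)
print(prematch)
```

Elements with no match get NA here instead of a placeholder string;
which of the two is more convenient depends on what you do next.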

(The above uses only core R or S+ functions.  The gsubfn package
offers more possibilities.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com         
        
> 
> Thanks much.
> 
> On Sun, Aug 22, 2010 at 3:05 PM, Waverley @ Palo Alto
> <waverley.paloalto at gmail.com> wrote:
> > Hi,
> >
> > In perl, to get a substring matching a particular  pattern can be
> > implemented like the following example:
> >
> > $x = "AAAA.txt";
> > if ($x=~ /(.*?)\.txt/){
> >  $prefix = $1;
> > }
> >
> > So how to do the same thing in R?
> >
> > Can someone provide me the code sample?
> >
> > Thanks much in advance.
> >
> > --
> > Waverley @ Palo Alto
> >
> 
> 
> 
> -- 
> Waverley @ Palo Alto
> 