[R] Extracting text from a character string

Marc Schwartz marc_schwartz at comcast.net
Fri Mar 9 21:44:47 CET 2007


On Fri, 2007-03-09 at 15:23 -0500, Shawn Way wrote:
>  I have a set of character strings like below:
>  
> > data3[1]
> [1] "CB01_0171_03-27-2002-(Sample 26609)-(126)"
> > 
>  
> I am trying to extract the text 03-27-2002 and convert this into a date 
> for the same record.  I keep looking at the grep function, however I 
> cannot quite get it to work.
>  
> grep("\d\d-\d\d-\d\d\d\d",data3[1],perl=TRUE,value=TRUE)
>  
> Any hints?


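grep() only tells you which elements match (or, with value=TRUE,
returns those whole elements); it does not extract the matching part.
To pull out the substring, there are at least two different ways: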

Vec <- "CB01_0171_03-27-2002-(Sample 26609)-(126)"


1. Using substr(), if your source vector has a fixed format:

# Get the 11th through the 20th characters
> substr(Vec, 11, 20)
[1] "03-27-2002"
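
Since substr() is vectorized, the same call should work over a whole
vector, and the result can then be passed to as.Date() to get the date
asked for. A minimal sketch, assuming every element follows the same
fixed layout and that the dates are month-day-year; Vec2 and its second
element are made up for illustration:

```r
# Hypothetical two-element vector; the second string is invented
Vec2 <- c("CB01_0171_03-27-2002-(Sample 26609)-(126)",
          "CB01_0172_04-15-2002-(Sample 26610)-(127)")

# substr() is vectorized over its first argument
DateStr <- substr(Vec2, 11, 20)

# Convert to Date class, assuming month-day-year order
Dates <- as.Date(DateStr, format = "%m-%d-%Y")
```

If the prefix before the date can vary in length, the fixed positions
no longer hold and a pattern-based approach is the safer choice.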


2. Using sub() for a more generalized approach:

# Use a back reference ("\\1"), which returns the part of the string
# matched by the pattern within the parens

> sub(".+([0-9]{2}-[0-9]{2}-[0-9]{4}).+", "\\1", Vec)
[1] "03-27-2002"
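
One thing to watch with sub(): if the pattern does not match, the input
comes back unchanged instead of an empty or missing value. The sketch
below shows that, along with an alternative using regexpr() and
regmatches() (the latter exists only in more recent versions of R),
followed by the conversion to a date. The names Pattern, NoDate,
DateStr and DateVal are just for illustration, and the "%m-%d-%Y"
format assumes month-day-year:

```r
Vec <- "CB01_0171_03-27-2002-(Sample 26609)-(126)"
Pattern <- "[0-9]{2}-[0-9]{2}-[0-9]{4}"

# sub() returns the input unchanged when there is no match
NoDate <- sub(paste0(".+(", Pattern, ").+"), "\\1", "no date here")

# regexpr() locates the first match; regmatches() extracts it.
# Elements without a match are dropped from the result.
DateStr <- regmatches(Vec, regexpr(Pattern, Vec))

# Convert to Date class, assuming month-day-year order
DateVal <- as.Date(DateStr, format = "%m-%d-%Y")
```

Because regmatches() drops non-matching elements, the result can be
shorter than the input vector, so check lengths before assigning it
back to the same records.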


See ?substr, ?sub and ?regex

HTH,

Marc Schwartz


