[R] regular expression question

Dirk Eddelbuettel edd at debian.org
Sun Jun 11 21:47:47 CEST 2006


On 11 June 2006 at 14:25, markleeds at verizon.net wrote:
| i have variables that are of type character but
| they have number characters at the end. for example :
| 
| "AAL123"
| "XELB245"
| "A247"
| 
| I want a command that gives me just the letter characters
| for each one. 
| the letter characters always start first and then the number characters come second and it never flips back to letter characters
| once the number characters start. i am using R-2.20 on
| windows Xp. Thanks. substring doesn't work because the
| length of the letter characters can vary.


> gsub("(\\d*)$","",c("AAL123", "XELB245", "A247", "FOO123BAR"), perl=TRUE)
[1] "AAL"       "XELB"      "A"         "FOO123BAR"
> 

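gsub finds whatever matches its first argument, the regexp [ here (\\d*)$,
i.e. any sequence of digits immediately before the end of the string ], and
replaces each match with its second argument, the replacement string [ here
an empty string as we simply delete ]. It does this for every element of
its third argument, the character vector.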
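
To make the three arguments explicit, here is a minimal sketch with the
argument names spelled out. The variable names 'ids' and 'letters_only' are
made up for illustration, and it uses sub with a POSIX-style [0-9] class
instead of \\d, so perl=TRUE is not needed; that is an alternative, not what
the one-liner above does.

```r
# 'ids' is a hypothetical name; the values come from the question.
ids <- c("AAL123", "XELB245", "A247")

# pattern     = what to look for (one or more digits at the end of the string)
# replacement = what to put in its place (nothing, so the digits are deleted)
# x           = the character vector to work on
letters_only <- sub(pattern = "[0-9]+$", replacement = "", x = ids)

print(letters_only)
```

sub replaces only the first match per element, which is all that is needed
here because the pattern is anchored at the end of the string.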

Note 
 - how the $ symbol (end of string) prevents it from eating the non-final
   digits in the counter example FOO123BAR
 - how the \d for digits needs escaped backslashes \\d
 - how the * char denotes '0 or more of the preceding thingie' (use + if
   you want '1 or more')
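
The points in the list above can be checked at the prompt. This is only a
sketch; the variable names (x, no_anchor, anchored, star_matches,
plus_matches) are made up for illustration.

```r
x <- c("AAL123", "XELB245", "A247", "FOO123BAR")

# Without the $ anchor, every run of digits is removed, including the
# non-final "123" in FOO123BAR.
no_anchor <- gsub("\\d+", "", x, perl = TRUE)

# With the $ anchor, only digits at the very end of the string are removed.
anchored <- gsub("\\d+$", "", x, perl = TRUE)

# '*' means zero or more, so it also matches an empty string;
# '+' means one or more, so it does not.
star_matches <- grepl("^\\d*$", "", perl = TRUE)
plus_matches <- grepl("^\\d+$", "", perl = TRUE)

print(no_anchor)
print(anchored)
print(c(star = star_matches, plus = plus_matches))
```

For the task in the question either quantifier gives the same result, since
replacing an empty match with an empty string changes nothing.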

Hth, Dirk

-- 
Hell, there are no rules here - we're trying to accomplish something. 
                                                  -- Thomas A. Edison


