[R] Regex exercise

Bert Gunter gunter.berton at gene.com
Fri Aug 20 22:55:27 CEST 2010


For regular expression aficionados, I'd like a cleverer solution to
the following problem (my solution works just fine for my needs; I'm
just trying to improve my regex skills):

Given the string (entered, say, at a readline prompt):

 "1   2 -5, 3- 6 4  8 5-7 10"   ## only integers will be entered

parse it to produce the numeric vector:

c(1, 2, 3, 4, 5, 3, 4, 5, 6, 4, 8, 5, 6, 7, 10)

Note that "-" in the input indicates a range of values, in place of
R's ":" operator.

Here's my UNclever solution:

First collapse runs of spaces to a single space, then replace
"<any spaces>-<any spaces>" with ":" (the input string is held in resp):

>  x1 <- gsub(" *- *",":",gsub(" +"," ",resp))  #giving
> x1
[1] "1 2:5, 3:6 4 8 5:7 10"    ## Note that the comma remains

Next convert the single string into a character vector via strsplit by
splitting on anything but ":" or a digit:

> x2 <- strsplit(x1,split="[^:[:digit:]]+")[[1]]   #giving
> x2
[1] "1"    "2:5"  "3:6" "4"    "8"    "5:7"  "10"

Finally, parse() the vector, eval() each element, and unlist() the
resulting list of numeric vectors:

>  unlist(lapply(parse(text=x2),eval)) #giving, as desired,
 [1]  1  2  3  4  5  3  4  5  6  4  8  5  6  7 10
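
Put together, the three steps above can be sketched as one self-contained
function. The name parse_ranges is made up here for illustration, and the
nzchar() filter is an added safeguard that is not part of the steps above:

```r
# Sketch only: parse_ranges is a hypothetical helper name.
parse_ranges <- function(resp) {
  # Step 1: collapse runs of spaces, then turn "<spaces>-<spaces>" into ":"
  x1 <- gsub(" *- *", ":", gsub(" +", " ", resp))
  # Step 2: split on anything that is neither ":" nor a digit
  x2 <- strsplit(x1, split = "[^:[:digit:]]+")[[1]]
  # Added safeguard: drop empty pieces (e.g. from a leading separator)
  x2 <- x2[nzchar(x2)]
  # Step 3: evaluate each piece ("4" or "2:5") and flatten the results
  unlist(lapply(parse(text = x2), eval))
}

resp <- "1   2 -5, 3- 6 4  8 5-7 10"
parse_ranges(resp)
```

As the input comment says, this assumes only integers, spaces, commas and
"-" are entered; anything else reaching eval() would be a problem.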


This seems far too clumsy and circumlocutory not to have a more
elegant solution from a true regex expert.

(Special note to Thomas Lumley: This seems one of the few instances
where eval(parse(...)) may actually be appropriate.)
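
For comparison, here is a minimal sketch of one way to get the same vector
without eval(parse(...)), by extracting tokens with gregexpr()/regmatches()
and expanding each one with seq(). The name expand_ranges is hypothetical,
and this is only one possible direction, not necessarily the clever regex
solution being asked for:

```r
# Sketch only: expand_ranges is a hypothetical helper name.
expand_ranges <- function(resp) {
  # Normalise "<spaces>-<spaces>" to "-" so each range is a single token
  x <- gsub(" *- *", "-", resp)
  # Extract every "a-b" range or lone integer
  tokens <- regmatches(x, gregexpr("[0-9]+(-[0-9]+)?", x))[[1]]
  # Expand each token; a lone integer has identical start and end
  unlist(lapply(strsplit(tokens, "-", fixed = TRUE), function(p) {
    p <- as.numeric(p)
    seq(p[1], p[length(p)])
  }))
}

expand_ranges("1   2 -5, 3- 6 4  8 5-7 10")
```

Because nothing is evaluated as R code, stray non-digit input is simply
ignored rather than executed.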

Cheers to all,

Bert

-- 
Bert Gunter
Genentech Nonclinical Biostatistics


