[R] Regex exercise

Greg Snow Greg.Snow at imail.org
Mon Aug 23 22:21:36 CEST 2010


How about:

x <- "1  2 -5, 3- 6 4  8 5-7 10"; x

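library(gsubfn)  # provides strapply()

# The pattern matches either a range "a - b" (groups 1 to 3) or a lone
# integer (group 4). With the default backref setting, strapply() passes
# the four captured groups to the function for each match.
strapply( x, '(([0-9]+) *- *([0-9]+))|([0-9]+)',
    function(one, two, three, four) {
        # group 4 is non-empty only when a lone integer was matched
        if( nchar(four) > 0 ) return( as.numeric(four) )
        # otherwise expand the range two..three
        return( seq( from=as.numeric(two), to=as.numeric(three) ) )
    }
)[[1]]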



If x is a vector of strings and you remove the [[1]], you will get a list with one element per string in x (calling unlist() on it will give a single vector).
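
For readers without gsubfn, the same list-per-string result can be sketched in base R. This is an illustrative alternative, not the strapply() call above; the helper name expand_ranges is hypothetical, and it assumes non-negative integers only.

```r
# Hypothetical helper (base R only): expand "a-b" ranges in each string of x.
expand_ranges <- function(x) {
  # a range "a - b" or a lone integer
  pattern <- "[0-9]+ *- *[0-9]+|[0-9]+"
  # one character vector of tokens per input string
  tokens <- regmatches(x, gregexpr(pattern, x))
  lapply(tokens, function(tok) {
    # split each token on the dash: two pieces for a range, one otherwise
    unlist(lapply(strsplit(tok, " *- *"), function(p) {
      p <- as.numeric(p)
      if (length(p) == 2) seq(from = p[1], to = p[2]) else p
    }))
  })
}

xs  <- c("1  2 -5, 3- 6 4  8 5-7 10", "7 9-11")
res <- expand_ranges(xs)   # list of length 2, one numeric vector per string
res[[2]]                   # 7 9 10 11
all_values <- unlist(res)  # a single numeric vector
```

The list keeps track of which values came from which input string; unlist() discards that grouping.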

This could easily be extended to handle floating point numbers instead of just integers, and even negative numbers (as long as you have a clear rule to distinguish a negative sign from the dash that separates the ends of a range).
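
One possible sketch of such an extension, again in base R rather than gsubfn. The rule assumed here is my own choice, not something from the original question: a "-" that follows a number is always a range separator, so a negative list item has to come first or after a comma. The names num, token, range_re and expand_numeric_ranges are hypothetical, and ranges step by 1 from their starting value.

```r
# Assumed rule: a "-" right after a number (spaces allowed) separates a range;
# any other "-" directly before a number is a negative sign.
num      <- "-?[0-9]*\\.?[0-9]+"                      # optional sign, optional decimals
token    <- paste0(num, " *- *", num, "|", num)       # a range or a lone number
range_re <- paste0("^(", num, ") *- *(", num, ")$")   # captures both ends of a range

expand_numeric_ranges <- function(s) {
  # all tokens found in the single string s
  tokens <- regmatches(s, gregexpr(token, s))[[1]]
  # for each token: c(match, from, to) if it is a range, character(0) otherwise
  parts  <- regmatches(tokens, regexec(range_re, tokens))
  unlist(Map(function(tok, p) {
    if (length(p) == 3) seq(from = as.numeric(p[2]), to = as.numeric(p[3]))
    else as.numeric(tok)
  }, tokens, parts), use.names = FALSE)
}

y <- expand_numeric_ranges("-3 - -1, 1.5-3.5 7, -2")
y   # -3 -2 -1 1.5 2.5 3.5 7 -2
```

Under this rule "1 -3" is read as the range 1 to 3, while "1, -3" gives 1 and -3. The original test string still expands as before.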

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Bert Gunter
> Sent: Friday, August 20, 2010 2:55 PM
> To: r-help at r-project.org
> Subject: [R] Regex exercise
> 
> For regular expression aficionados, I'd like a cleverer solution to
> the following problem (my solution works just fine for my needs; I'm
> just trying to improve my regex skills):
> 
> Given the string (entered, say, at a readline prompt):
> 
>  "1   2 -5, 3- 6 4  8 5-7 10"   ## only integers will be entered
> 
> parse it to produce the numeric vector:
> 
> c(1, 2, 3, 4, 5, 3, 4, 5, 6, 4, 8, 5, 6, 7, 10)
> 
> Note that "-" in the expression is used to indicate a range of values
> instead of ":"
> 
> Here's my UNclever solution:
> 
> First convert more than one space to a single space and then replace
> "<any spaces>-<any spaces>" by ":" by:
> 
> >  x1 <- gsub(" *- *",":",gsub(" +"," ",resp))  #giving
> > x1
> [1] "1 2:5, 3:6 4 8 5:7 10"    ## Note that the comma remains
> 
> Next convert the single string into a character vector via strsplit by
> splitting on anything but ":" or a digit:
> 
> > x2 <- strsplit(x1,split="[^:[:digit:]]+")[[1]]   #giving
> > x2
> [1] "1"    "2:5"  "3:6" "4"    "8"    "5:7"  "10"
> 
> Finally, parse() the vector, eval() each element, and unlist() the
> resulting list of numeric vectors:
> 
> >  unlist(lapply(parse(text=x2),eval)) #giving, as desired,
>  [1]  1  2  3  4  5  3  4  5  6  4  8  5  6  7 10
> 
> 
> This seems far too clumsy and circumlocutory not to have a more
> elegant solution from a true regex expert.
> 
> (Special note to Thomas Lumley: This seems one of the few instances
> where eval(parse(...)) may actually be appropriate.)
> 
> Cheers to all,
> 
> Bert
> 
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 


