[R] Generalizing a regex for retrieving numbers with and without scientific notation

Marc Schwartz marc_schwartz at me.com
Wed Feb 19 19:38:33 CET 2014

On Feb 19, 2014, at 12:26 PM, Morway, Eric <emorway at usgs.gov> wrote:

> I'm trying to extract all of the values from edm in the example below.
> However, the first attempt only retrieves the final number in the sequence
> since it is recorded using scientific notation.  The second attempt
> retrieves all of the numbers, but omits the scientific notation component
> of the final number.  How can I make the regular expression more general
> such that I get every value AND its corresponding "E"-value (i.e.,
> "...E-06"), where pertinent?   I've spent time reading through ?regex, but
> my attempts to use the "*" character, where the preceding item will be
> matched zero or more times, have so far proven syntactically incorrect or
> generally unsuccessful.  Appreciate the help, Eric
> library(gsubfn)  # strapply() comes from the gsubfn package
> edm <- c("", "param_value", "6.301343", "6.366305", "6.431268", "6.496230",
>          "6.561192", "6.626155", "9.091117E-06")
> param_values <- strapply(edm, "\\d+\\.\\d+E[-+]?\\d+", as.numeric,
>                          simplify = cbind)
> param_values
> #[1,] 9.091117e-06
> param_values <- strapply(edm,"\\d+\\.\\d+", as.numeric, simplify=cbind)
> param_values
> #[1,] 6.301343 6.366305 6.431268 6.49623 6.561192 6.626155 9.091117

If the individual elements of the vector are either numeric or non-numeric, why not just use:

> as.numeric(edm)
[1]           NA           NA 6.301343e+00 6.366305e+00 6.431268e+00
[6] 6.496230e+00 6.561192e+00 6.626155e+00 9.091117e-06
Warning message:
NAs introduced by coercion 

The non-numeric elements are returned as NAs, which you can remove with na.omit() (see ?na.omit).
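
A minimal sketch of that approach, reusing the edm vector from the quoted message. The suppressWarnings() and as.vector() calls are additions for illustration: the first silences the expected coercion warning, the second drops the bookkeeping attributes that na.omit() attaches.

```r
edm <- c("", "param_value", "6.301343", "6.366305", "6.431268", "6.496230",
         "6.561192", "6.626155", "9.091117E-06")

# Coerce everything; non-numeric strings become NA (warning suppressed on purpose)
vals <- suppressWarnings(as.numeric(edm))

# Drop the NAs; as.vector() strips the "na.action" attribute left by na.omit()
vals <- as.vector(na.omit(vals))

length(vals)   # seven values remain, including the one given in E-notation
```

If the attributes do not matter, vals[!is.na(vals)] should give the same numbers.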

The only reason to use a regex would be if the individual elements themselves contained both numeric and non-numeric characters. If you then want to explicitly format numeric output (which would yield a character vector), you can use ?sprintf or ?format. Keep in mind the difference between how R *PRINTS* a numeric value and how R *STORES* a numeric value internally.
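
For the mixed case, one possible sketch in base R follows. The mixed vector and the pattern are hypothetical, made up for illustration rather than taken from the original post. The pattern makes the exponent an optional group, so plain decimals and E-notation values are both matched in full. The last two lines illustrate the print-versus-store point: nums stays numeric, while sprintf() and format() return character vectors.

```r
# Hypothetical input: numbers embedded in other text
mixed <- c("a=6.301343;", "b=9.091117E-06 units", "no number here")

# Optional sign, digits, ".", digits, then an optional exponent part
pat <- "[-+]?[0-9]+\\.[0-9]+([eE][-+]?[0-9]+)?"

# Pull out every full match, then convert the matched strings to numeric
hits <- unlist(regmatches(mixed, gregexpr(pat, mixed)))
nums <- as.numeric(hits)

# Formatting produces character output; the stored values in nums are unchanged
as_sci   <- sprintf("%.6e", nums)
as_fixed <- format(nums, scientific = FALSE)
```

This assumes every number has a decimal point, as in the original data; integers such as "42" would need the fractional part made optional as well.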


Marc Schwartz
