[R] parsing numeric values

baptiste auguie baptiste.auguie at googlemail.com
Wed Nov 18 19:54:28 CET 2009


Hi,

Thanks for the alternative approach. However, I should have made my
example more complete: other lines may also contain numeric values,
which I'm not interested in. Below is an updated problem, with my
current solution:

tc <- textConnection(
"some text
 <ax> =    1.3770E-03     <bx> =    3.4644E-07
 <ay> =    1.9412E-04     <by> =    4.8840E-08

other text
 <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
 <aay>  =    1.9412E-04     <bby> =    4.8840E-08

lots of other material,  including numeric values
 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
 12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
etc...")

input <- readLines(tc)
close(tc)

## I want to retrieve the values for
## <ax>, <ay>, <aax> and <aay> only

library(gsubfn)  # provides strapply()

results <- c(
  strapply(input, "<ax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
           simplify = rbind),
  strapply(input, "<ay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
           simplify = rbind),
  strapply(input, "<aax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
           simplify = rbind),
  strapply(input, "<aay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
           simplify = rbind))

results
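
A side note on the number pattern used in the calls above: it requires
both a decimal point and an uppercase E, which suits the <ax>-style
lines but would skip values such as 123E5. The check below is only a
sketch in base R; the names strict, loose and samples are illustrative,
and the last two sample strings are made up rather than taken from the
log file.

```r
# Number pattern from the strapply() calls above
strict_pat <- "\\d+\\.\\d+E[-+]?\\d+"
# A more permissive variant (an assumption, not part of the original code):
# optional sign, optional decimal part, optional exponent in either case
loose_pat <- "[-+]?(\\d+\\.?\\d*|\\.\\d+)([eE][-+]?\\d+)?"

samples <- c("1.3770E-03", "3.4644E-07", "123E5", "1.5e-3", "0.25")

# Anchor the patterns so that the whole string has to match
strict <- grepl(paste0("^", strict_pat, "$"), samples, perl = TRUE)
loose  <- grepl(paste0("^", loose_pat, "$"), samples, perl = TRUE)

print(data.frame(samples, strict, loose))
```

For this log file the strict pattern is enough, and it has the advantage
of not matching stray integers elsewhere in the text.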

Using the suggested base R solution, I've come up with this variation,

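## keep only the lines of interest, then blank out every character
## that cannot be part of a number
z <- gsub("[^[:digit:]E.+-]", " ",
          grep("<ax>|<ay>|<aax>|<aay>", input, value = TRUE))

## scan(text = ) avoids leaving an unclosed textConnection behind
test <- scan(text = z, what = 0)
## each selected line holds two numbers; keep the first of each pair
test[seq(1, length(test), by = 2)]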
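
The variation above relies on each selected line holding exactly two
numbers. A possible alternative, still in base R, is to match each tag
together with the value that follows it, so the result does not depend
on the position of the number in the line. This is only a sketch:
extract_tagged() is a hypothetical helper name, not a function from any
package, and the tags are pasted into the pattern unescaped, so it
assumes they contain no regex metacharacters.

```r
# Log lines copied from the example above (one of the two "noise" lines kept)
input <- c(
  "some text",
  " <ax> =    1.3770E-03     <bx> =    3.4644E-07",
  " <ay> =    1.9412E-04     <by> =    4.8840E-08",
  "",
  "other text",
  " <aax>  =    1.3770E-03     <bbx> =    3.4644E-07",
  " <aay>  =    1.9412E-04     <bby> =    4.8840E-08",
  "",
  "lots of other material,  including numeric values",
  " 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5"
)

# Hypothetical helper: for each tag, return the first number that follows
# "<tag> =" anywhere in the lines, or NA if the tag is not found
extract_tagged <- function(lines, tags) {
  num <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"
  vapply(tags, function(tag) {
    pattern <- paste0("<", tag, "> *= *(", num, ")")
    m <- regmatches(lines, regexec(pattern, lines))
    hits <- Filter(length, m)          # drop lines without a match
    if (length(hits) == 0) return(NA_real_)
    as.numeric(hits[[1]][2])           # element 2 is the captured number
  }, numeric(1))
}

results <- extract_tagged(input, c("ax", "ay", "aax", "aay"))
print(results)
```

The result is a numeric vector named by tag, so a missing tag shows up
as NA instead of silently shifting the remaining values.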


Thanks again,

baptiste

2009/11/18 Bert Gunter <gunter.berton at gene.com>:
> The previous elegant solutions required the use of the gsubfn package.
> Nothing wrong with that, of course, but I'm always curious whether still
> relatively simple base R solutions can be found, as they are often (but not
> always!) much faster. And anyway, it seems to be in the spirit of your query
> to try such a solution. So here is one base R approach that I believe works.
> I'll break it up into 2 lines so you can see what's going on.
>
> ## Using your example...
> ## First replace everything but the number with spaces
>
>> z <- gsub("[^[:digit:]E.+-]"," ",input)
>> z
> [1] "         "
> [2] "            1.3770E-03               3.4644E-07"
> [3] "            1.9412E-04               4.8840E-08"
> [4] ""
> [5] "          "
> [6] "              1.3770E-03                3.4644E-07"
> [7] "              1.9412E-04                4.8840E-08"
>
> ## Now it can be scanned to a numeric via
>
>> z<-scan(textConnection(z),what=0)
> Read 8 items
>> z
> [1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
> 1.9412e-04 4.8840e-08
>
> ########
> I believe this strategy is reasonably general, but I haven't checked it
> carefully and would appreciate folks pointing out where it trips up (e.g.
> perhaps with NA's).
>
> Best,
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>  -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of baptiste auguie
> Sent: Wednesday, November 18, 2009 3:57 AM
> To: r-help
> Subject: [R] parsing numeric values
>
> Dear list,
>
> I'm seeking advice to extract some numeric values from a log file
> created by an external program. Consider the following example,
>
> input <-
> readLines(textConnection(
> "some text
>  <ax> =    1.3770E-03     <bx> =    3.4644E-07
>  <ay> =    1.9412E-04     <by> =    4.8840E-08
>
> other text
>  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
>  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"))
>
> ## this is what I want
> results <- c(as.numeric(strsplit(grep("<ax>", input,val=T), " ")[[1]][8]),
>             as.numeric(strsplit(grep("<ay>", input,val=T), " ")[[1]][8]),
>             as.numeric(strsplit(grep("<aax>", input,val=T), " ")[[1]][9]),
>             as.numeric(strsplit(grep("<aay>", input,val=T), " ")[[1]][9])
>             )
>
> ## [1] 0.00137700 0.00019412 0.00137700 0.00019412
>
> The use of strsplit is not ideal here as there is a different number
> of space characters in the lines containing <ax> and <aax> for
> instance (hence the indices 8 and 9 respectively).
>
> I tried to use gsubfn for a cleaner construct,
>
> strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
>
> but I can't seem to find the correct regular expression to deal with
> the exponent.
>
>
> Any tips are welcome!
>
>
> Best regards,
>
> baptiste
>



