[R] parsing numeric values

baptiste auguie baptiste.auguie at googlemail.com
Wed Nov 18 21:21:14 CET 2009


Another useful trick that could come in handy, thanks!

baptiste
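
For anyone reading this later, here is a minimal sketch of why the variation quoted below works, as far as I understand it. grep() keeps only the lines carrying an <ax>, <ay>, <aax> or <aay> label, and an unnamed colClasses vector is recycled across the six whitespace-separated fields, so "NULL", "NULL", "numeric" drops each label and "=" sign and keeps fields 3 and 6. The names `keep`, `tab` and `a_values` are my own, and I use the `text =` argument of read.table() (available in more recent R versions) in place of textConnection().

```r
# Sample log lines, copied from the example further down the thread
input <- c(
  "some text",
  " <ax> =    1.3770E-03     <bx> =    3.4644E-07",
  " <ay> =    1.9412E-04     <by> =    4.8840E-08",
  "",
  "other text",
  " <aax>  =    1.3770E-03     <bbx> =    3.4644E-07",
  " <aay>  =    1.9412E-04     <bby> =    4.8840E-08",
  "",
  "lots of other material,  including numeric values",
  " 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5"
)

# Keep only the lines that contain one of the four labels of interest
keep <- grep("<aa?[xy]>", input, value = TRUE)

# Each kept line has six fields: label, "=", value, label, "=", value.
# The unnamed colClasses vector is recycled, so fields 3 and 6 are read
# as numeric and the other four are skipped.
tab <- read.table(text = keep, colClasses = c("NULL", "NULL", "numeric"))

# Column V3 holds the <a...> values, column V6 the <b...> values
a_values <- tab$V3
```

This relies on every kept line having the same six-field layout; a line with a different number of fields would need separate handling.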

2009/11/18 Gabor Grothendieck <ggrothendieck at gmail.com>:
> Here is a slight variation:
>
>> read.table(textConnection(grep("<aa?[xy]>", input, value = TRUE)),
> +    colClasses = c("NULL", "NULL", "numeric"))
>          V3         V6
> 1 0.00137700 3.4644e-07
> 2 0.00019412 4.8840e-08
> 3 0.00137700 3.4644e-07
> 4 0.00019412 4.8840e-08
>
>
>
> On Wed, Nov 18, 2009 at 1:54 PM, baptiste auguie
> <baptiste.auguie at googlemail.com> wrote:
>> Hi,
>>
>> Thanks for the alternative approach. However, I should have made my
>> example more complete in that other lines may also have numeric
>> values, which I'm not interested in. Below is an updated problem, with
>> my current solution,
>>
>> tc <- textConnection(
>> "some text
>>  <ax> =    1.3770E-03     <bx> =    3.4644E-07
>>  <ay> =    1.9412E-04     <by> =    4.8840E-08
>>
>> other text
>>  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
>>  <aay>  =    1.9412E-04     <bby> =    4.8840E-08
>>
>> lots of other material,  including numeric values
>>  1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
>>  12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
>> etc...")
>>
>> input <-
>> readLines(tc)
>> close(tc)
>>
>> ## I want to retrieve the values for
>> ## <ax>, <ay>, <aax> and <aay> only
>>
>> results <- c(
>> strapply(input, "<ax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind),
>> strapply(input, "<ay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind),
>> strapply(input, "<aax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind),
>> strapply(input, "<aay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind))
>>
>> results
>>
>> Using the suggested base R solution, I've come up with this variation,
>>
>> z <- gsub("[^[:digit:]E.+-]", " ", grep("<ax>|<ay>|<aax>|<aay>", input,
>> value=TRUE))
>>
>> test <- scan(textConnection(z),what=0)
>> test[seq(1, length(test), by=2)]
>>
>>
>> Thanks again,
>>
>> baptiste
>>
>> 2009/11/18 Bert Gunter <gunter.berton at gene.com>:
>>> The previous elegant solutions required the use of the gsubfn package.
>>> Nothing wrong with that, of course, but I'm always curious whether still
>>> relatively simple base R solutions can be found, as they are often (but not
>>> always!) much faster. And anyway, it seems to be in the spirit of your query
>>> to try such a solution. So here is one base R approach that I believe works.
>>> I'll break it up into 2 lines so you can see what's going on.
>>>
>>> ## Using your example...
>>> ## First replace everything but the number with spaces
>>>
>>>> z <- gsub("[^[:digit:]E.+-]"," ",input)
>>>> z
>>> [1] "         "
>>> [2] "            1.3770E-03               3.4644E-07"
>>> [3] "            1.9412E-04               4.8840E-08"
>>> [4] ""
>>> [5] "          "
>>> [6] "              1.3770E-03                3.4644E-07"
>>> [7] "              1.9412E-04                4.8840E-08"
>>>
>>> ## Now it can be scanned to a numeric via
>>>
>>>> z<-scan(textConnection(z),what=0)
>>> Read 8 items
>>>> z
>>> [1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
>>> 1.9412e-04 4.8840e-08
>>>
>>> ########
>>> I believe this strategy is reasonably general, but I haven't checked it
>>> carefully and would appreciate folks pointing out where it trips up (e.g.
>>> perhaps with NA's).
>>>
>>> Best,
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>>  -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>>> Behalf Of baptiste auguie
>>> Sent: Wednesday, November 18, 2009 3:57 AM
>>> To: r-help
>>> Subject: [R] parsing numeric values
>>>
>>> Dear list,
>>>
>>> I'm seeking advice to extract some numeric values from a log file
>>> created by an external program. Consider the following example,
>>>
>>> input <-
>>> readLines(textConnection(
>>> "some text
>>>  <ax> =    1.3770E-03     <bx> =    3.4644E-07
>>>  <ay> =    1.9412E-04     <by> =    4.8840E-08
>>>
>>> other text
>>>  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
>>>  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"))
>>>
>>> ## this is what I want
>>> results <- c(as.numeric(strsplit(grep("<ax>", input,val=T), " ")[[1]][8]),
>>>             as.numeric(strsplit(grep("<ay>", input,val=T), " ")[[1]][8]),
>>>             as.numeric(strsplit(grep("<aax>", input,val=T), " ")[[1]][9]),
>>>             as.numeric(strsplit(grep("<aay>", input,val=T), " ")[[1]][9])
>>>             )
>>>
>>> ## [1] 0.00137700 0.00019412 0.00137700 0.00019412
>>>
>>> The use of strsplit is not ideal here as there is a different number
>>> of space characters in the lines containing <ax> and <aax> for
>>> instance (hence the indices 8 and 9 respectively).
>>>
>>> I tried to use gsubfn for a cleaner construct,
>>>
>>> strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
>>>
>>> but I can't seem to find the correct regular expression to deal with
>>> the exponent.
>>>
>>>
>>> Any tips are welcome!
>>>
>>>
>>> Best regards,
>>>
>>> baptiste
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>
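
On the original question of a regular expression that copes with the exponent, the following is a rough base R sketch rather than a definitive answer. It assumes each value directly follows its label and an "=" sign, as in the sample log. The helper name `extract_value` and the number pattern `num` are hypothetical, introduced only for illustration.

```r
# Sample log lines, copied from the example in the thread
input <- c(
  "some text",
  " <ax> =    1.3770E-03     <bx> =    3.4644E-07",
  " <ay> =    1.9412E-04     <by> =    4.8840E-08",
  "",
  "other text",
  " <aax>  =    1.3770E-03     <bbx> =    3.4644E-07",
  " <aay>  =    1.9412E-04     <bby> =    4.8840E-08",
  "",
  "lots of other material,  including numeric values",
  " 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5"
)

# Hypothetical helper: return the numeric value(s) following "<label> ="
extract_value <- function(lines, label) {
  # Optional sign, digits with an optional decimal point, optional exponent
  num <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"
  pattern <- paste0("<", label, ">[[:space:]]*=[[:space:]]*(", num, ")")
  # regexec() reports the full match followed by each capture group
  m <- regmatches(lines, regexec(pattern, lines))
  # Lines without a match come back as character(0); drop them
  hits <- Filter(length, m)
  # Element 2 is the first capture group, i.e. the whole number
  as.numeric(vapply(hits, `[`, character(1), 2))
}

labels <- c("ax", "ay", "aax", "aay")
results <- unlist(lapply(labels, extract_value, lines = input))
```

Because the label is matched together with its angle brackets, "<ax>" does not pick up the "<aax>" line, and the unlabelled numeric lines at the end are ignored.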



