[R] R newbie: how to replace string/regular expression

Gabor Grothendieck ggrothendieck at gmail.com
Sun Nov 2 23:06:45 CET 2008


I already provided a link to that solution, but I also wanted to
show how to do it in the same way that the code in the question
was written.

On Sun, Nov 2, 2008 at 4:56 PM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
>
>
>
> Gabor,
>
> Why not just this:
>
>        library(gsubfn)
>        expos <- list( B="e9", M="e6", m="e6", k="e3" )
>        as.numeric( gsubfn("[[:alpha:]]", expos, d ) )
>
> HTH,
>
> Chuck
>
> p.s. I am not sure why B goes with e6 or K with e-02 (below), but Krishna
> can adjust the values accordingly.
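Chuck's version works because gsubfn, given a list as the replacement, should replace each matched letter with the list element of that name, turning "120.0M" into "120.0e6" and so on. as.numeric() then parses the scientific notation directly, so no explicit multiplication is needed. The strings below are written out by hand to mirror what that substitution produces for d (they are not output copied from the thread); base R parses them like this:

```r
# Strings as they look after each suffix letter has been replaced by an
# exponent (hand-written to mirror the list-based gsubfn call above)
converted <- c("120.0e6", "11.01e6", "209.7e9", "100.00e3", "50")

# as.numeric() understands scientific notation, so the exponent does the scaling
values <- as.numeric(converted)
print(values)
```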
>
>
> On Sun, 2 Nov 2008, Gabor Grothendieck wrote:
>
>> There was an error in your regexp which I did not correct. Here it is
>> again corrected to better illustrate the solution:
>>
>>> gsubfn("(.*)B", ~ as.numeric(x) * 10e6, d, ignore.case = TRUE)
>>
>> [1] "120.0M"    "11.01m"    "2.097e+09" "100.00k"   "50"
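Note that in R the literal 10e6 means 10 * 10^6, i.e. 1e7 (ten million), not one million. That is why 209.7B shows up as 2.097e+09 in the output above. A quick check:

```r
# "10e6" is 10 times 10^6, i.e. ten million
stopifnot(10e6 == 1e7)

# One million is written 1e6
stopifnot(1e6 == 1000000)

# This reproduces the value inserted for "209.7B" in the output above
x <- 209.7 * 10e6
print(x)
```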
>>
>> On Sun, Nov 2, 2008 at 7:55 AM, Gabor Grothendieck
>> <ggrothendieck at gmail.com> wrote:
>>>
>>> Your gsub example is almost exactly what gsubfn in the gsubfn package
>>> does. gsubfn is like gsub except that the replacement can be a function:
>>>
>>>> library(gsubfn)
>>>> gsubfn("(.*)B$", ~ as.numeric(x) * 10e6, d, ignore.case = TRUE)
>>>
>>> [1] "120.0M"    "11.01m"    "2.097e+09" "100.00k"   "50"
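For comparison, the same transformation can be sketched in base R without gsubfn: find the elements ending in b/B, convert only those, and write them back. This is an alternative approach, not a description of what gsubfn does internally:

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

# Which elements end in "b" or "B"?
is_b <- grepl("B$", d, ignore.case = TRUE)

# Strip the suffix from those elements, convert to numbers, and scale
scaled <- as.numeric(sub("B$", "", d[is_b], ignore.case = TRUE)) * 10e6

# Put the results back; the vector stays character, as with gsubfn
out <- d
out[is_b] <- as.character(scaled)
print(out)
```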
>>>
>>> There are also examples very similar to this
>>>
>>> 1. at the end of section 2 of
>>> vignette("gsubfn")
>>>
>>> 2. in
>>> demo("gsubfn-si")
>>>
>>> Also see the gsubfn home page:
>>> http://gsubfn.googlecode.com
>>>
>>> Also note that if you want to return the values rather than
>>> transform and reinsert them, then strapply in the same package
>>> can do that.
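strapply is the gsubfn tool for returning values. As a base-R point of comparison (regexpr() plus regmatches(), swapped in here rather than strapply), pulling out the numeric part and the suffix of each element might look like this:

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

# Leading number of each element (every element here starts with one)
num <- as.numeric(regmatches(d, regexpr("^[0-9.]+", d)))

# Trailing letter, or "" when there is none
suffix <- sub("^[0-9.]+", "", d)

print(num)
print(suffix)
```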
>>>
>>> On Sun, Nov 2, 2008 at 3:43 AM, Krishna Dagli/Krushna Dagli
>>> <krishna.dagli at gmail.com> wrote:
>>>>
>>>> Hello;
>>>>
>>>> I am an R newbie and would like to know the correct and efficient way
>>>> to do string replacement.
>>>>
>>>> I have a large data set in which I want to convert values with the
>>>> suffixes "M", "b", and "K" (currency in millions, billions, and
>>>> thousands) to millions. That is, replace 209.7B with (209.7 * 10e6),
>>>> 100.00K with (100.00 * 1/100), and so on.
>>>>
>>>> d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")
>>>>
>>>> This works, that is, it removes "b/B":
>>>>
>>>> gsub("(.*)(B$)", "\\1", d, ignore.case = TRUE, perl = TRUE)
>>>>
>>>> but
>>>>
>>>> gsub ("(.*)(B$)", as.numeric("\\1") * 10e6, d, ignore.case=T, perl=T)
>>>>
>>>> does not work. I tried sprintf and other combinations with as.numeric,
>>>> but they fail. How can I use \\1 and multiply it by 10e6?
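The gsub() attempt cannot work this way: R evaluates the arguments before gsub() runs, so as.numeric("\\1") sees the literal two-character string \1 rather than a back-reference. That gives NA (with a coercion warning), and according to ?gsub a NA replacement sets every element that matched to NA:

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

# Evaluated once, before gsub() is called: "\\1" is not a number
repl <- suppressWarnings(as.numeric("\\1")) * 10e6
print(repl)  # NA

# A NA replacement turns every matching element into NA
res <- gsub("(.*)(B$)", repl, d, ignore.case = TRUE, perl = TRUE)
print(res)
```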
>>>>
>>>> The other solution is:
>>>>
>>>> location <- grep("M", d, ignore.case = TRUE)
>>>> y <- sub("M", "", d, ignore.case = TRUE)
>>>> # y is a character vector, so convert before multiplying
>>>> y[location] <- as.numeric(y[location]) * 10e6
>>>>
>>>> Is the second solution faster, or would combining grep with the
>>>> multiplication (if that worked) be faster? More generally, what is the
>>>> most efficient way to do something like this in R?
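One vectorized way to handle all the suffixes at once is to split each string into its number and suffix and look the suffix up in a named vector of multipliers. The multipliers below are an assumption (absolute units: K = 1e3, M = 1e6, B = 1e9); divide the result by 1e6 if it should be in millions. Each step is a single vectorized call over the whole vector with no per-element loop, but whether it beats the grep/sub version on a real data set would have to be measured, e.g. with system.time().

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

# Assumed multiplier for each suffix (absolute units)
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Upper-case suffix of each element ("" if the element is a plain number)
suffix <- toupper(sub("^[0-9.]+", "", d))

# Numeric part with any trailing letter removed
num <- as.numeric(sub("[[:alpha:]]$", "", d))

# Look up each suffix; "" matches no name, so those get a multiplier of 1
multiplier <- unname(mult[suffix])
multiplier[suffix == ""] <- 1

values <- num * multiplier
print(values)
```

An unrecognised suffix leaves NA in the result, which makes bad input easy to spot.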
>>>>
>>>> Thanks and Regards
>>>> Krishna
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>>
>
> Charles C. Berry                            (858) 534-2098
>                                            Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu               UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
>


