[R] R newbie: how to replace string/regular expression

Gabor Grothendieck ggrothendieck at gmail.com
Sun Nov 2 15:52:14 CET 2008


There was an error in your regexp which I did not correct. Here it is
again corrected to better illustrate the solution:

> gsubfn("(.*)B", ~ as.numeric(x) * 10e6, d, ignore.case = TRUE)
[1] "120.0M"    "11.01m"    "2.097e+09" "100.00k"   "50"
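
If the goal is numeric values in millions rather than modified strings, a base R sketch along the following lines may also work. Note that 10e6 in R is 1e7, not 1e6. The multipliers below (K = 1e-3, M = 1, B = 1e3), the helper name to_millions, and the choice to treat a value with no suffix as already being in millions are all assumptions, not something taken from the original post:

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

# Assumed multipliers for expressing each suffix in millions
mult <- c(K = 1e-3, M = 1, B = 1e3)

# Hypothetical helper: split each string into number and suffix,
# then scale the number by the multiplier for that suffix
to_millions <- function(x) {
  suffix <- toupper(sub("^[0-9.]+", "", x))       # "" when there is no suffix
  value  <- as.numeric(sub("[A-Za-z]+$", "", x))  # numeric part only
  # no suffix: assume the value is already in millions;
  # an unknown suffix yields NA
  factor <- ifelse(suffix == "", 1, mult[suffix])
  unname(value * factor)
}

res <- to_millions(d)
```

The result is a numeric vector, so it can be used directly in arithmetic without a further as.numeric step.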

On Sun, Nov 2, 2008 at 7:55 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Your gsub example is almost exactly what gsubfn in the gsubfn package
> does.  gsubfn is like gsub except that the replacement can be a function:
>
>> library(gsubfn)
>> gsubfn("(.*)B$", ~ as.numeric(x) * 10e6, d, ignore.case = TRUE)
> [1] "120.0M"    "11.01m"    "2.097e+09" "100.00k"   "50"
>
> Also, there are examples very similar to this:
>
> 1. at the end of section 2 of
> vignette("gsubfn")
>
> 2. in
> demo("gsubfn-si")
>
> Also see the gsubfn home page:
> http://gsubfn.googlecode.com
>
> Also note that if you want to return the values rather than
> transform and reinsert them then strapply in the same package
> can do that.
>
> On Sun, Nov 2, 2008 at 3:43 AM, Krishna Dagli/Krushna Dagli
> <krishna.dagli at gmail.com> wrote:
>> Hello;
>>
>> I am an R newbie and would like to know the correct and efficient method for
>> doing string replacement.
>>
>> I have a large data set in which I want to convert values with the suffixes
>> "M", "B", and "K" (currency in millions, billions, and thousands) to millions.
>> That is, replace 209.7B with (209.7 * 10e6), 100.00K with (100.00 * 1/100),
>> and so on.
>>
>> d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")
>>
>> This works, that is, it removes "b/B":
>>
>> gsub ("(.*)(B$)", "\\1", d, ignore.case=T, perl=T)
>>
>> but
>>
>> gsub ("(.*)(B$)", as.numeric("\\1") * 10e6, d, ignore.case=T, perl=T)
>>
>> does not work. I tried sprintf and other combinations with as.numeric, but
>> they fail. How can I use \\1 and multiply it by 10e6?
>>
>> The other solution is :
>>
>> location <- grep ("M", d, ignore.case=T)
>> y <- sub("M", "", d, ignore.case=T)
>> y[location]<-y[location] * 10e6
>>
>> Which is faster: a single gsub call (if it can be made to work) or the
>> combination of grep and multiplication? Or what is the most efficient
>> method to do something like this in R?
>>
>> Thanks and Regards
>> Krishna
>>
>
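
On the grep/sub approach quoted above: y is a character vector after sub, so multiplying it directly fails with a "non-numeric argument to binary operator" error. A minimal sketch of one way to make it run follows. Anchoring the pattern with "M$" so that it only matches at the end of the string is an assumption, not part of the original code:

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

# positions of the elements that end in "M" or "m"
location <- grep("M$", d, ignore.case = TRUE)

# strip the trailing suffix; the result is still character
y <- sub("M$", "", d, ignore.case = TRUE)

# convert only the matched elements to numeric before multiplying
y[location] <- as.numeric(y[location]) * 10e6
```

Because y remains a character vector, the products are stored as strings again. Once every suffix has been handled the same way, as.numeric(y) gives numbers.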