[R] sub question

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Sun Feb 1 22:14:33 CET 2009


Wacek Kusnierczyk wrote:
> Gabor Grothendieck wrote:
>   
>> On Sat, Jan 31, 2009 at 4:46 PM, Wacek Kusnierczyk
>> <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>>   
>>     
>>>
>>> to extend the context, if you were to solve the problem in perl, the
>>> regex below would work in perl 5.10, but not in earlier versions of
>>> perl;  another approach is to replace the unwanted leading characters
>>> with equally many replacement characters at once.
>>>
>>> $string = 'aabaab';
>>>
>>> # perl 5.10
>>> $string =~ s/a|(*COMMIT)(*FAIL)/c/g;
>>> # $string is 'ccbaab'
>>>
>>> # any recent perl
>>> $string =~ s/^a*/'c' x length $&/e;
>>> # $string is 'ccbaab'
>>>
>>> i don't know how (if) the latter could be done in r.
>>>     
>>>       
>> This seems quite analogous:
>>
>> library(gsubfn)
>> s <- "aabaab"
>> gsubfn("^a*", ~ paste(rep("c", nchar(x)), collapse = ""), s)[[1]]
>>   
>>     
>
> indeed, as does the following variant:
>
> gsubfn("^a*", ~ gsub(".", "c", x), s)[[1]]
>
>   

just for the record, the two gsubfn-based versions run substantially
slower than the gsub-based one;  with 1000 strings of 100 random letters
each, the difference is two orders of magnitude (see the attached naive
test).  i guess much of it is due to gsubfn being implemented in r
rather than in c;  with a c implementation the difference would probably
shrink dramatically.
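
for readers without the attachment, here is a minimal base-r sketch of the kind of test described above.  it is not the attached prefix.r;  the helper name replace_prefix, its arguments and the seed are made up for illustration, and it assumes r >= 3.3.0 (for strrep) and a single literal character as the prefix.  it measures the leading run with regexpr and rebuilds the string, which is one way to do the perl 'x length' trick without gsubfn.

```r
# hypothetical helper (not from the thread): replace every leading `from`
# character with `to`, using base r only
replace_prefix <- function(s, from = "a", to = "c") {
  # "^a*" matches at position 1 for non-empty strings, possibly with
  # length 0, so match.length is the size of the leading run
  n <- attr(regexpr(paste0("^", from, "*"), s), "match.length")
  # n copies of the replacement, followed by the untouched rest
  paste0(strrep(to, n), substring(s, n + 1L))
}

replace_prefix("aabaab")   # "ccbaab"

# 1000 strings of 100 random lowercase letters each
set.seed(1)
strings <- replicate(1000,
                     paste(sample(letters, 100, replace = TRUE), collapse = ""))

# elapsed time for one vectorised pass;  numbers depend on the machine
system.time(result <- replace_prefix(strings))
```

the gsubfn variants quoted above can be timed the same way by wrapping them in system.time on the same strings vector.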

vQ
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: prefix.r
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090201/6c084165/attachment-0001.pl>

