[R] vectorized sub, gsub, grep, etc.

john jjthaden at flash.net
Thu Oct 9 06:38:01 CEST 2008


Hello Christos,
  To my surprise, "vectorizing" with mapply() actually hurt processing speed: it ran slower than my sapply()-based sub2().

#Example
X <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

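sub2 <- function(pattern, replacement, x) {
    len <- length(x)
    # Recycle a single pattern or replacement to match the number of targets
    if (length(pattern) == 1)
        pattern <- rep(pattern, len)
    if (length(replacement) == 1)
        replacement <- rep(replacement, len)
    # Apply the i-th pattern and replacement to the i-th target
    FUN <- function(i, ...) {
        sub(pattern[i], replacement[i], x[i], fixed = TRUE)
    }
    # seq_len() yields an empty index for empty x; 1:length(x) would give c(1, 0)
    idx <- seq_len(len)
    sapply(idx, FUN)
}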
 
system.time(  for(i in 1:10000)  sub2(patt, repl, X)  )
   user  system elapsed 
   1.18    0.07    1.26 

system.time(  for(i in 1:10000)  mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)  )
   user  system elapsed 
   1.42    0.05    1.47 
 
So much for avoiding loops.
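Both sapply() and mapply() are R-level loops that call sub() once per target, so neither moves the work into compiled code the way a single sub() call on a whole vector does (that is only possible when there is one pattern). A minimal sketch of the same operation as a plain, preallocated for loop follows. The name sub3 is hypothetical, and recycling pattern and replacement with rep_len() is an assumption that goes beyond sub2()'s length-1 handling.

```r
# Hypothetical helper: apply a different fixed pattern/replacement to each
# element of x using an explicit, preallocated for loop.
sub3 <- function(pattern, replacement, x) {
    n <- length(x)
    # Assumption: recycle pattern and replacement to length(x) with rep_len()
    pattern <- rep_len(pattern, n)
    replacement <- rep_len(replacement, n)
    out <- character(n)              # preallocate the result vector
    for (i in seq_len(n)) {          # seq_len() is safe when n == 0
        out[i] <- sub(pattern[i], replacement[i], x[i], fixed = TRUE)
    }
    out
}

X <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")
sub3(patt, repl, X)
# [1] "aB" "CD" "ef"
```

It makes the same number of sub() calls as sub2(), so it can be timed with the same system.time() loop used above rather than assumed to be faster.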
John Thaden

======= At 2008-10-07, 14:58:10 Christos wrote: =======

>John,
>Try the following:
>
> mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)
>   b   cd    a 
>"aB" "CD" "ef"  
>
>-Christos
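The anonymous wrapper in the suggestion above can be dropped by passing sub() to mapply() directly and supplying fixed = TRUE through MoreArgs; USE.NAMES = FALSE returns a plain unnamed vector instead of one named by the patterns. This is a sketch of an equivalent call, not a claim that it runs faster.

```r
X <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

# Pass sub() itself: mapply() supplies pattern, replacement and x element-wise,
# and MoreArgs adds fixed = TRUE to every call.
res <- mapply(sub, pattern = patt, replacement = repl, x = X,
              MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
res
# [1] "aB" "CD" "ef"
```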

>> -----My Original Message-----
>> R pattern-matching and replacement functions are
>> vectorized: they can operate on vectors of targets.
>> However, they can only use one pattern and replacement.
>> Here is code to apply a different pattern and replacement for 
>> every target.  My question: can it be done better?
>> 
>> sub2 <- function(pattern, replacement, x) {
>>     len <- length(x)
>>     if (length(pattern) == 1) 
>>         pattern <- rep(pattern, len)
>>     if (length(replacement) == 1) 
>>         replacement <- rep(replacement, len)
>>     FUN <- function(i, ...) {
>>         sub(pattern[i], replacement[i], x[i], fixed = TRUE)
>>     }
>>     idx <- seq_len(len)
>>     sapply(idx, FUN)    
>> }
>> 
>> #Example
>> X <- c("ab", "cd", "ef")
>> patt <- c("b", "cd", "a")
>> repl <- c("B", "CD", "A")
>> sub2(patt, repl, X)
>> 
>> -John
