[R] Strplit code

Gabor Grothendieck ggrothendieck at gmail.com
Thu Dec 4 13:55:24 CET 2008


R does not support tail recursion so not using it would
not matter.

On Thu, Dec 4, 2008 at 5:04 AM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> John Fox wrote:
>> By coincidence, I have a version of strsplit() that I've used to
>> illustrate recursion:
>>
>> Strsplit <- function(x, split){
>>     if (length(x) > 1) {
>>         return(lapply(x, Strsplit, split))  # vectorization
>>         }
>>     result <- character(0)
>>     if (nchar(x) == 0) return(result)
>>     posn <- regexpr(split, x)
>>     if (posn <= 0) return(x)
>>     c(result, substring(x, 1, posn - 1),
>>         Recall(substring(x, posn+1, nchar(x)), split))  # recursion
>>     }
>>
>>
>
> well, it is both inefficient and wrong.
>
> inefficient because of the non-tail recursion and recursive
> concatenation, which is justified for the sake the purpose of showing
> recursion, but for practical purposes you'd rather use gregexepr.
>
> wrong because of how you pick the remaining part of the string to be
> split -- it works just under the assumption the pattern is a single
> character:
>
> Strsplit("hello-dolly,--sweet", "--")
> # the pattern is *two* hyphens
> # [1] "hello-dolly" "-sweet"
>
> Strsplit("hello dolly", "")
> # the pattern is the empty string
> #  [1] "" "" "" "" "" "" "" "" "" "" ""
>
>
> here's a quick rewrite -- i haven't tested it on extreme cases, it may
> not be perfect, and there's a hidden source of inefficiency here as well:
>
> strsplit =
> function(strings, split) {
>    positions = gregexpr(split, strings)
>    lapply(1:length(strings), function(i)
>        substring(strings[[i]], c(1, positions[[i]] +
> attr(positions[[i]], "match.length")), c(positions[[i]]-1,
> nchar(strings[[i]]))))
> }
>
>
> n = 1000; m = 100
> strings = replicate(n, paste(sample(c(letters, " "), 100, replace=TRUE),
> collapse=""))
> system.time(replicate(m, strsplit(strings, " ")))
> system.time(replicate(m, Strsplit(strings, " ")))
>
>
> vQ
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list