[R] Strplit code

John Fox jfox at mcmaster.ca
Fri Dec 5 12:11:03 CET 2008


Dear Wacek,

I've thought a bit more about this problem, and recall that I originally
wrote Strsplit() [and replacements for sub() and gsub(), which were not then
in S-PLUS] for the version of the car package that I released for S-PLUS,
because other functions in the package used these. The strings involved were
small, so performance issues weren't that important, although of course it's
better to have a more efficient solution.

Although I no longer have an installed copy of S-PLUS to confirm this, I
believe that gregexpr() is still not present in S-PLUS (though I think that
strsplit() is in the latest version). If that's the case, then your function
wouldn't work at all in the context of the original posting, which asked for
a solution in S-PLUS. You could make your code work in S-PLUS, and probably
still have it more efficient than mine, by writing a replacement for
gregexpr().
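
For what it's worth, here is a rough sketch of what such a replacement might
look like. It is untested in S-PLUS and assumes only that regexpr() (with its
"match.length" attribute), substring(), and nchar() are available there, as
they are for Strsplit(). The name Gregexpr() is my own invention, it handles
a single string only, and anchored patterns such as "^a" would not behave as
in gregexpr(), because each search is run on the remainder of the string.

```r
# Hypothetical helper, not part of R or S-PLUS: emulates the core of
# gregexpr() for ONE string by calling regexpr() repeatedly on the
# not-yet-searched remainder of the string.
Gregexpr <- function(pattern, text) {
    starts <- numeric(0)    # match positions in the original string
    lengths <- numeric(0)   # corresponding match lengths
    offset <- 0             # number of characters already consumed
    rest <- text
    while (nchar(rest) > 0) {
        m <- regexpr(pattern, rest)
        pos <- as.integer(m)             # strip attributes
        len <- attr(m, "match.length")
        if (pos < 0 || len < 1) break    # no match, or zero-length match
        starts <- c(starts, offset + pos)
        lengths <- c(lengths, len)
        consumed <- pos + len - 1        # up to the end of this match
        offset <- offset + consumed
        rest <- substring(rest, consumed + 1, nchar(rest))
    }
    if (length(starts) == 0) {           # mimic gregexpr()'s "no match"
        starts <- -1
        lengths <- -1
    }
    attr(starts, "match.length") <- lengths
    starts
}

Gregexpr(",", "a,b,,c")   # matches start at positions 2, 4, and 5
```

Unlike gregexpr(), this returns the positions for the one string directly
rather than a list with one element per string, so your code would need a
small adjustment (or an lapply() wrapper) to use it.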

> -----Original Message-----
> From: Wacek Kusnierczyk [mailto:Waclaw.Marcin.Kusnierczyk at idi.ntnu.no]
> Sent: December-04-08 7:29 AM
> To: John Fox
> Cc: R help
> Subject: Re: [R] Strplit code
> 
> John Fox wrote:
> > Dear Wacek,
> >
> > "Wrong" is a bit strong, I think -- limited to single-character
> > patterns is more accurate.
> 
> nothing is ever wrong if seen from an appropriate perspective.  for
> example, there is nothing wrong in that many core functions in r deparse
> some, but not all, of the argument expressions, without any obvious
> pattern -- when you get used to it and learn each single case by heart,
> it's perfectly correct.
> 
> 
> > Moreover, it isn't hard to make the function work with
> > multiple-character matches as well:
> >
> 
> which you probably should have done before posting the flawed version.

Indeed. Had I anticipated the possibility of multiple-character splits I
would have done so.

John

> 
> > Strsplit <- function(x, split){
> >     if (length(x) > 1) {
> >         return(lapply(x, Strsplit, split))  # vectorization
> >         }
> >     result <- character(0)
> >     if (nchar(x) == 0) return(result)
> >     posn <- regexpr(split, x)
> >     if (posn <= 0) return(x)
> >     c(result, substring(x, 1, posn - 1),
> >         Recall(substring(x, posn + attr(posn, "match.length"),
> >           nchar(x)), split))  # recursion
> >     }
> >
> > On the other hand, your function is much more efficient.
> >
> 
> just one order of magnitude in my tests.  might not be completely
> foolproof, though.
> 
> vQ


