[R] strsplit, keeping delimiters

Gabor Grothendieck ggrothendieck at gmail.com
Sat Jun 14 19:06:14 CEST 2008


On Sat, Jun 14, 2008 at 11:46 AM, hadley wickham <h.wickham at gmail.com> wrote:
> On Sat, Jun 14, 2008 at 10:20 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> "hadley wickham" <h.wickham at gmail.com> writes:
>>
>>> On Sat, Jun 14, 2008 at 12:55 AM, Gabor Grothendieck
>>> <ggrothendieck at gmail.com> wrote:
>>>> Try this:
>>>>
>>>>> library(gsubfn)
>>>>> x <- "A: 123 B: 456 C: 678"
>>>>> strapply(x, "[^ :]+[ :]|[^ :]+$")
>>>> [[1]]
>>>> [1] "A:"   "123 " "B:"   "456 " "C:"   "678"
>>
>
> Either way is fine, since I'll be stripping off the spaces later anyway.
>

Note that if you intend to strip off the delimiters anyway but still
want to examine them, you might want to make use of the
other arguments of strapply too:

> x <- "AC: 123 BDEF: 456 CADSDFSDFSF: 6sdf:78"

> strapply(x, "([^ :]+)([ :]|$)", ~ c(...), b= -2)
[[1]]
 [1] "AC"          ":"           "123"         " "           "BDEF"
 [6] ":"           "456"         " "           "CADSDFSDFSF" ":"
[11] "6sdf"        ":"           "78"          ""

That returns the match followed by the delimiter as separate
strings which can be reshaped into an n x 2 matrix.
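
As a rough sketch of that reshaping step in base R, assuming the flat
result has been stored in a variable (called v here, a name chosen only
for illustration; the values are copied from the output shown above):

```r
# Flat vector of alternating match/delimiter strings, copied from the
# strapply output above (v is an illustrative name, not part of gsubfn).
v <- c("AC", ":", "123", " ", "BDEF", ":", "456", " ",
       "CADSDFSDFSF", ":", "6sdf", ":", "78", "")

# Fill row by row so that each row holds one (match, delimiter) pair.
m <- matrix(v, ncol = 2, byrow = TRUE)

# Example of examining the delimiters: keep only the matches that
# were followed by a colon.
colon_terms <- m[m[, 2] == ":", 1]
colon_terms
```

With the pairs in a matrix, the second column can be inspected or
filtered on before it is dropped.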

Or, all in one strapply:

> strapply(x, "([^ :]+)([ :]|$)", FUN = ~ c(...), b= -2, simplify = ~ matrix(x, nc = 2, byrow = TRUE))
     [,1]          [,2]
[1,] "AC"          ":"
[2,] "123"         " "
[3,] "BDEF"        ":"
[4,] "456"         " "
[5,] "CADSDFSDFSF" ":"
[6,] "6sdf"        ":"
[7,] "78"          ""

Here b is short for backref, and b = -2 says to pass only the 2 back
references to FUN (the minus sign means the back references only,
without the whole match).  strapply then applies the function whose
body is given by the formula, FUN, to each match and simplifies
the result using the function whose body is given by the formula,
simplify.  The free variables in the two formulae (... in the
first case and x in the second case) become the formal
arguments of these functions.
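
To make the formula notation concrete, here is a hedged base R sketch of
the explicit functions that the two formulas appear to stand for.  The
names FUN_explicit and simplify_explicit are made up for illustration,
and the calls are done by hand on sample values; this is not how
strapply invokes them internally.

```r
# Explicit equivalent of ~ c(...): the free variable ... becomes
# the formal argument list.
FUN_explicit <- function(...) c(...)

# Explicit equivalent of ~ matrix(x, nc = 2, byrow = TRUE): the free
# variable x becomes the single formal argument (nc is spelled out
# as ncol here).
simplify_explicit <- function(x) matrix(x, ncol = 2, byrow = TRUE)

# Simulate two matches by hand, each supplying (match, delimiter).
flat <- c(FUN_explicit("AC", ":"), FUN_explicit("123", " "))

# Reshape the flat vector into one row per match.
res <- simplify_explicit(flat)
res
```

Writing the functions out this way can be easier to debug than the
formula shorthand, at the cost of a longer call.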


