[R] splitting a string into words preserving blanks (using regex)

Gabor Grothendieck ggrothendieck at gmail.com
Mon Oct 24 16:07:02 CEST 2011


On Mon, Oct 24, 2011 at 9:46 AM, Mark Heckmann <mark.heckmann at gmx.de> wrote:
> I would like to split a string into words at its blanks but also to preserve all blanks.
>
> Example:
>        c(" some    words to split ")
> should become
>        c(" ", "some", "    ", "words", " ", "to", " ", "split", " ")
>
> I was not able to achieve this via strsplit() .
> But I am not familiar with regular expressions.
> Is there an easy way to do that using e.g. regex and strsplit?

Try this:

> library(gsubfn)
> x <- " some    words to split "
> v <- strapply(x, "(\\s*)(\\S+)(\\s*)", c)[[1]]
> v[nchar(v) > 0]
[1] " "     "some"  "    "  "words" " "     "to"    " "     "split" " "


If you don't need the trailing space it can be further simplified:

> strapply(x, "(\\s*)(\\S+)", c)[[1]]
[1] " "     "some"  "    "  "words" " "     "to"    " "     "split"

or if you don't need the leading space it can be simplified like this:

> strapply(x, "(\\S+)(\\s*)", c)[[1]]
[1] "some"  "    "  "words" " "     "to"    " "     "split" " "
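
For comparison, here is a minimal base R sketch that avoids the gsubfn dependency. It is an alternative technique, not the strapply() approach shown above: gregexpr() locates each maximal run of either whitespace or non-whitespace, and regmatches() extracts those runs in order. The variable name tokens is only illustrative.

```r
# Base R alternative (an assumption, not the gsubfn method above):
# match every maximal run of whitespace OR non-whitespace, left to right.
x <- " some    words to split "
tokens <- regmatches(x, gregexpr("\\s+|\\S+", x, perl = TRUE))[[1]]
tokens

# Sanity check: pasting the pieces back together should give the input.
identical(paste(tokens, collapse = ""), x)
```

Because the two alternatives in the pattern cover every character, no blanks are lost and no empty strings need to be filtered out afterwards.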

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
