[R] String manipulation

Gabor Grothendieck ggrothendieck at gmail.com
Sun Feb 13 18:40:13 CET 2011


On Sun, Feb 13, 2011 at 10:27 AM, Megh Dal <megh700004 at gmail.com> wrote:
> Please consider the following string:
>
> MyString <- "ABCFR34564IJVEOJC3434"
>
> Here you can see that there are 4 groups in the above string. The 1st and 3rd
> groups are English letters and the 2nd and 4th are numeric. Given a string, how
> can I separate out those 4 groups?
>

Try this. "\\D+" and "\\d+" match runs of non-digits and digits
respectively. The portions within parentheses are capture groups, and
the captured strings are passed as separate arguments to the c
function. Like R's split, strapply returns a list with one component
per element of MyString; since MyString has only one element, we
extract its contents using [[1]].

> library(gsubfn)
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "IJVEOJC" "3434"
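If gsubfn is not available, a rough base R equivalent is sketched below. It is an assumption on my part rather than part of the answer above, and it relies on regmatches and regexec, which only exist in more recent versions of R. With regexec, regmatches returns the whole match followed by each capture, so dropping the first element leaves the same four groups. The gregexpr variant uses a different technique: the alternation "\\D+|\\d+" extracts every run of non-digits or digits, so it also copes with strings that have more or fewer than four groups.

```r
MyString <- "ABCFR34564IJVEOJC3434"

# Same pattern as the strapply call: four capture groups.
# regmatches() on a regexec() result gives the full match first, then the captures.
m <- regmatches(MyString, regexec("(\\D+)(\\d+)(\\D+)(\\d+)", MyString))[[1]]
groups <- m[-1]  # drop the full match, keep the four captures
print(groups)

# Alternation variant: extracts every maximal run of non-digits or digits,
# however many alternating groups the string contains.
runs <- regmatches(MyString, gregexpr("\\D+|\\d+", MyString))[[1]]
print(runs)
```

Both vectors should match the strapply result shown above.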

Alternatively, we could convert the numeric portions to numbers at the
same time. The formula ~ list(...) is interpreted as a function whose
body is the right-hand side of the ~ and whose arguments are its free
variables, i.e. s1, s2, s3 and s4.

strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)",
   ~ list(s1, as.numeric(s2), s3, as.numeric(s4)))[[1]]
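To make the formula-to-function translation concrete, here is a sketch that writes the formula out as an ordinary function (to_typed is a hypothetical name) and applies it to the captures using base R's regexec and do.call instead of strapply. Since strapply also accepts plain functions, as the c example shows, to_typed could be passed in place of the formula; the base R part is only an illustrative assumption.

```r
MyString <- "ABCFR34564IJVEOJC3434"

# Hypothetical explicit version of ~ list(s1, as.numeric(s2), s3, as.numeric(s4)):
# one argument per free variable, body is the right-hand side of the formula.
to_typed <- function(s1, s2, s3, s4) list(s1, as.numeric(s2), s3, as.numeric(s4))

# Extract the four captures (dropping the full match), then pass them
# positionally as s1..s4.
captures <- regmatches(MyString, regexec("(\\D+)(\\d+)(\\D+)(\\d+)", MyString))[[1]][-1]
res <- do.call(to_typed, as.list(captures))
str(res)
```

The result is a list whose 1st and 3rd components are character strings and whose 2nd and 4th are numbers.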

See http://gsubfn.googlecode.com for more.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
