[R] string split problem

Marc Schwartz marc_schwartz at me.com
Fri Oct 23 21:39:50 CEST 2015


> On Oct 23, 2015, at 2:17 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> 
> Dear list,
> 
> Say I have a vector that has two different types of string
> 
> test <- c('aaa.bb.cc','aaa.dd')
> 
> I want to extract the first part of the string (aaa) as a name and save the
> rest of the string as another name.
> 
> I was thinking something like
> 
> sub('(.*)\\.(.*)','\\1',test) but doesn't give me what I want.
> 
> 
> Appreciate any comments. Thanks.
> 
> Jun


How about something like this, which presumes that the characters (besides the periods) are only letters:

> gsub("^([[:alpha:]]+)\\.(.*)$", "\\1|\\2", test) 
[1] "aaa|bb.cc" "aaa|dd"   

or, equivalently, with sub(), since the anchored pattern can match only once per string:

> sub("^([[:alpha:]]+)\\.(.*)$", "\\1|\\2", test) 
[1] "aaa|bb.cc" "aaa|dd"   


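The above captures the two components, before and after the first '.', and puts a "|" between them as a temporary separator (this presumes that "|" does not already occur in the strings), so that the result can then be split with strsplit():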


> strsplit(gsub("^([[:alpha:]]+)\\.(.*)$", "\\1|\\2", test), split = "\\|") 
[[1]]
[1] "aaa"   "bb.cc"

[[2]]
[1] "aaa" "dd" 
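
If the intermediate "|" separator is not wanted, one possible variation is to pull out each part with its own sub() call. This is only a sketch, not something tested beyond the example vector; the names 'first' and 'rest' are made up for illustration:

```r
test <- c("aaa.bb.cc", "aaa.dd")

# Drop everything from the first period onward, keeping the leading part
first <- sub("\\..*$", "", test)

# Drop everything up to and including the first period, keeping the rest
rest <- sub("^[^.]*\\.", "", test)

first
rest
```

Unlike the [[:alpha:]] version, this does not presume the characters are only letters, just that the first period is the place to split.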


See ?regex
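
As for why the original attempt did not give the desired result: as far as I can tell, it is because '.*' is greedy, so the first group grabs as much as it can and the match backtracks only to the last period rather than the first. A small sketch comparing that with a non-greedy quantifier (the names 'greedy' and 'lazy' are just for illustration, and perl = TRUE is used here as a cautious choice for the '.*?' syntax):

```r
test <- c("aaa.bb.cc", "aaa.dd")

# Greedy: the first group extends up to the LAST period
greedy <- sub("(.*)\\.(.*)", "\\1", test)

# Non-greedy: the first group stops at the FIRST period
lazy <- sub("^(.*?)\\.(.*)$", "\\1", test, perl = TRUE)

greedy
lazy
```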

Regards,

Marc Schwartz


