[R] Split a string vector with '[ ]'

arun smartpink111 at yahoo.com
Mon Jun 9 13:05:55 CEST 2014


Hi Alexsandro,
Suppose you have strings such as:

nw.str1 <- "[D][A|D]A:F[T|A:D]N[C|T]"
nw.str2 <- "[D][A|D]A[T|A:D][C|T]NA{DG]P"

you could use:
library(qdap)

as.vector(bracketXtract(nw.str1, "square", TRUE))
#[1] "[D]"     "[A|D]"   "[T|A:D]" "[C|T]"
as.vector(bracketXtract(nw.str2, "square", TRUE))
#[1] "[D]"     "[A|D]"   "[T|A:D]" "[C|T]"


#or
regmatches(nw.str1, gregexpr("(\\[).*?(\\])", nw.str1))[[1]]
#[1] "[D]"     "[A|D]"   "[T|A:D]" "[C|T]"  
regmatches(nw.str2, gregexpr("(\\[).*?(\\])", nw.str2))[[1]]
#[1] "[D]"     "[A|D]"   "[T|A:D]" "[C|T]"  
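
A variant of the regmatches() call, offered only as a sketch: a negated character class, "\\[[^]]*\\]", matches an opening bracket, any run of characters other than "]", and then the closing bracket, so it does not depend on the lazy ".*?" quantifier. The helper name extract_square is hypothetical; it is not part of base R or qdap.

```r
# Hypothetical helper (not from base R or qdap): returns every
# "[...]" group found in a single string.
extract_square <- function(x) {
  # "\\[[^]]*\\]" reads: "[", then any characters except "]", then "]"
  regmatches(x, gregexpr("\\[[^]]*\\]", x))[[1]]
}

nw.str1 <- "[D][A|D]A:F[T|A:D]N[C|T]"
nw.str2 <- "[D][A|D]A[T|A:D][C|T]NA{DG]P"

res1 <- extract_square(nw.str1)
res2 <- extract_square(nw.str2)
res1
#[1] "[D]"     "[A|D]"   "[T|A:D]" "[C|T]"
res2
#[1] "[D]"     "[A|D]"   "[T|A:D]" "[C|T]"
```

Text outside the brackets, including the unbalanced "{DG]" in the second string, is skipped because every match has to start at a "[".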

#or, modifying David's and Duncan's code, for the first case:
scan(what="",text=gsub("\\].*?\\[","] [", nw.str1))
#Read 4 items
#[1] "[D]"     "[A|D]"   "[T|A:D]" "[C|T]"  

readLines(textConnection(gsub("\\].*?\\[", "]\n[", nw.str1))) 
#[1] "[D]"     "[A|D]"   "[T|A:D]" "[C|T]"  

## I couldn't get the gsub() approach to work for the second case.
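
One possible reason, with a workaround that is only a sketch: the pattern "\\].*?\\[" replaces text that sits between a "]" and a following "[", so the trailing "NA{DG]P" in the second string is never touched and stays attached to the last item. The version below assumes it is acceptable to also consume everything after the last "]" up to the end of the string; it uses perl = TRUE and does not handle stray text before the first "[".

```r
nw.str2 <- "[D][A|D]A[T|A:D][C|T]NA{DG]P"

# After a "]", consume everything that is not "[", stopping at either
# the next "[" (kept via \\1) or the end of the string (\\1 is empty).
spaced <- gsub("\\][^[]*(\\[|$)", "] \\1", nw.str2, perl = TRUE)

# Split the space-separated groups, as in the scan() call above
res <- scan(what = "", text = spaced, quiet = TRUE)
res
#[1] "[D]"     "[A|D]"   "[T|A:D]" "[C|T]"
```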

A.K.



On Sunday, June 8, 2014 4:57 PM, David Winsemius <dwinsemius at comcast.net> wrote:

On Jun 8, 2014, at 1:46 PM, Duncan Murdoch wrote:

> On 08/06/2014, 4:30 PM, Alexsandro Cândido de Oliveira Silva wrote:
>> Hi,
>> 
>> I have a string like this:
>> 
>> nw.str <- "[D][A|D][T|A:D][C|T]"
>> 
>> And I need to split it in this way:
>> 
>> "[D]" "[A|D]" "[T|A:D]" "[C|T]"
> 
> You could probably use lookahead and lookbehind Perl regular
> expressions, but this might be easier:
> 
> readLines(textConnection(gsub("\\]\\[", "]\n[", nw.str)))
> 
> This just inserts a newline between each pair of brackets, and then
> reads the resulting string.

Same idea with scan() using space as separator:

scan(what="", text=gsub("\\]\\[", "\\] \\[", nw.str))

-- 
David Winsemius
Alameda, CA, USA






