[R] regular expression question

Berwin A Turlach berwin at maths.uwa.edu.au
Tue Mar 3 08:41:56 CET 2009


G'day Mark,

On Tue, 03 Mar 2009 00:16:34 -0600 (CST)
markleeds at verizon.net wrote:

> can someone show me how to use a regular expression to break the
> string at the bottom up into its three components :
> 
> (-0.791,-0.263]
> (-38,-1.24]
> (0.96,2.43]
> 
> I tried to use strplit because of my regexpitis ( it's not curable.
> i've been to many doctors all over NYC. they tell me there's no
> cure  )  but it doesn't work because there also dots inside  the
> brackets. Thanks.
> 
> (-0.791,-0.263].(-38,-1.24].(0.96,2.43]

Probably you will get better answers from regexp experts, but here we
go:

The problem seems to be that strsplit() throws away the part that is
matched when deciding where to split, so you cannot simply split on
"]." without losing the closing bracket.  Thus, I guess the aim would
be to replace each `.' on which you want to split by something that
occurs nowhere else in the string and then use strsplit() on that.
For example you could do:

R> str <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"
R> (uu <- gsub("(\\([^]]*\\])(\\.)", "\\1RIsGreat", str))
[1] "(-0.791,-0.263]RIsGreat(-38,-1.24]RIsGreat(0.96,2.43]"
R> strsplit(uu, "RIsGreat")
[[1]]
[1] "(-0.791,-0.263]" "(-38,-1.24]"     "(0.96,2.43]"    

Though the following, with a shorter placeholder, works too (note that
"?" has to be escaped in the split pattern):

R> (uu <- gsub("(\\([^]]*\\])(\\.)", "\\1?", str))
[1] "(-0.791,-0.263]?(-38,-1.24]?(0.96,2.43]"
R> strsplit(uu, "\\?")
[[1]]
[1] "(-0.791,-0.263]" "(-38,-1.24]"     "(0.96,2.43]"    
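
For comparison, here is a small sketch of what goes wrong with the more
obvious splits, plus one alternative that avoids the placeholder
altogether.  The alternative is not part of the approach above: it
assumes that a Perl-compatible regexp (perl = TRUE) is acceptable, so
that a lookbehind can be used.  The names naive, greedy and pieces are
only illustrative.

```r
str <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"

# Splitting on every dot also cuts at the decimal points.
naive <- strsplit(str, "\\.")[[1]]

# Splitting on "]." keeps the numbers intact, but the matched "]" is
# thrown away together with the dot (only the last piece keeps its "]").
greedy <- strsplit(str, "\\]\\.")[[1]]

# Lookbehind (assumption: perl = TRUE is acceptable): split on a dot
# only if it is preceded by "]".  The "]" is checked but not consumed,
# so it stays in the result.
pieces <- strsplit(str, "(?<=\\])\\.", perl = TRUE)[[1]]
print(pieces)
```

The lookbehind version matches only the dot itself, which is why
nothing but the dot is discarded.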

To explain the gsub() command: the pattern says look for an opening
round bracket ("\\("), followed by any number of characters that are
not a closing square bracket ("[^]]*"), followed by a closing square
bracket ("\\]"), which in turn is followed by a dot ("\\.").  Call the
first three parts together group 1 and the dot group 2 (that is what
the unescaped round brackets in the regexp do):

(\\([^]]*\\])(\\.)
^^^^^^^^^^^^^-----
group1       group2

Hopefully that explains the regexp used as the first argument.  The
second argument then says: replace each match by the first group
("\\1") followed by "RIsGreat" or, respectively, "?"; the second
group, i.e. the dot, is dropped.
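
If you want to check what the two groups capture, something along the
following lines should do.  This is only a sketch; it assumes that
regexec() and regmatches() are available in your version of R (they
are in current base R), and pat and m are just illustrative names.

```r
str <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"
pat <- "(\\([^]]*\\])(\\.)"

# regexec() reports the first match together with its groups;
# regmatches() extracts them: whole match, group 1, group 2.
m <- regmatches(str, regexec(pat, str))[[1]]
print(m)

# Keeping only group 1 in the replacement simply drops the
# separating dots.
print(gsub(pat, "\\1", str))
```

The first element of m is the whole match, the second is the interval
including its closing bracket, and the third is the dot.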

HTH.

Cheers,

	Berwin

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919       
National University of Singapore     
6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
Singapore 117546                    http://www.stat.nus.edu.sg/~statba



