[R] regular expression question

Romain Francois romain.francois at dbmail.com
Tue Mar 3 10:18:27 CET 2009


Wacek Kusnierczyk wrote:
> markleeds at verizon.net wrote:
>   
>> can someone show me how to use a regular expression to break the
>> string at the bottom up into its three components :
>>
>> (-0.791,-0.263]
>> (-38,-1.24]
>> (0.96,2.43]
>>
>> I tried to use strsplit because of my regexpitis (it's not curable.
>> I've been to many doctors all over NYC. They tell me there's no cure)
>> but it doesn't work because there are also dots inside the brackets.
>> Thanks.
>>
>> (-0.791,-0.263].(-38,-1.24].(0.96,2.43]
>>
>>     
>
> here's one way to get a matrix of numeric values:
>    
>     text = "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"
>     values = matrix(ncol=2, byrow=TRUE,
>         as.numeric(
>            grep(pattern='.', value=TRUE,
>               x=strsplit(x=text, split=']\\.\\(|\\(|]|,')[[1]])))
>
> modify any of the steps according to your needs.
>
> vQ
>   
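For readers who want to see what the quoted strsplit call does, here is a step-by-step sketch of the same idea. The names `pieces` and `values` are only illustrative, the closing bracket is escaped as "\\]" (an assumption made for readability, not part of the quoted code), and nzchar() stands in for the grep('.') filter.

```r
text <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"

# Split on "].(", "(", "]" or ","; the dots inside the numbers are untouched
# because a dot is only consumed when it sits between "]" and "(".
pieces <- strsplit(text, split = "\\]\\.\\(|\\(|\\]|,")[[1]]

# The leading "(" produces an empty first piece; drop empty strings.
pieces <- pieces[nzchar(pieces)]

# Two numbers per interval, one interval per row.
values <- matrix(as.numeric(pieces), ncol = 2, byrow = TRUE)
print(values)
```

Each row of `values` then holds the lower and upper bound of one interval.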
Here is another way with the gsubfn package:

 > require( gsubfn )
 > strapply( text, "\\(.*?,.*?]", perl = TRUE )[[1]]
[1] "(-0.791,-0.263]" "(-38,-1.24]"     "(0.96,2.43]"

Note that gregexpr would also help you here:

 > g <- gregexpr( "\\(.*?,.*?]", text, perl = TRUE )[[1]]
 > g
[1]  1 17 29
attr(,"match.length")
[1] 15 11 11

gregexpr only returns the start positions and lengths of the matches, so 
you still have to extract the matched substrings yourself:

 > substring( text, g, g + attr(g, "match.length" ) - 1 )
[1] "(-0.791,-0.263]" "(-38,-1.24]"     "(0.96,2.43]"
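As a possible shortcut, more recent versions of R (2.14.0 and later, as far as I know) ship regmatches() in base, which does this extraction step directly. The sketch below assumes such a version; the names `intervals`, `inner` and `bounds` are only illustrative.

```r
text <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"

# gregexpr() returns a list with one element per input string;
# regmatches() takes that list as is and returns the matched substrings.
g <- gregexpr("\\(.*?,.*?]", text, perl = TRUE)
intervals <- regmatches(text, g)[[1]]
print(intervals)

# Optional: strip the enclosing "(" and "]" and convert to numeric bounds.
inner <- substring(intervals, 2, nchar(intervals) - 1)
bounds <- matrix(as.numeric(unlist(strsplit(inner, ",", fixed = TRUE))),
                 ncol = 2, byrow = TRUE)
print(bounds)
```

Note that regmatches() is given the full gregexpr() result here, not its first element, which differs from the `g` used above.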

Romain

-- 
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr



