[R] regex question

markleeds at verizon.net
Tue Nov 4 06:19:24 CET 2008


Hi: Gabor's solution does do it in a single line. He just used paste to
build the pattern; see below. John's is sort of a single line too, but he
called sub twice.
I doubt that it's possible to make it shorter than those solutions.

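# Gabor's solution spelled out.

varReg <- "*  < <* this is my text > > "   # the test string from Ferry's post

patReg1 <- "(^[ <*]+)"   # leading run of spaces, "<" and "*"
patReg2 <- "([ > ]+$)"   # trailing run of spaces and ">"
temp <- paste(patReg1, patReg2, sep = "|")   # join the two with "|" (alternation)
print(temp)

gsub(temp, "", varReg)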
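
One point worth spelling out: the combined pattern needs gsub rather than sub, because sub replaces only the first match (the leading run) and leaves the trailing one in place. A small sketch of the difference, using the test string from the original post (the name `both` is only for illustration):

```r
varReg <- "*  < <* this is my text > > "
both <- "(^[ <*]+)|([ > ]+$)"   # same pattern that paste() builds above

# sub() stops after the first match, so only the leading run is removed
firstOnly <- sub(both, "", varReg)

# gsub() replaces every match, so both ends are removed
bothEnds <- gsub(both, "", varReg)

print(firstOnly)
print(bothEnds)
```

The spaces inside the text survive because each alternative is anchored, with ^ or with $, so neither can match in the middle of the string.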



On Tue, Nov 4, 2008 at 12:10 AM, Ferry wrote:

> Dear John, Gabor ...
>
> Thank you for your fast responses.
> In terms of efficiency, is my code efficient? I mean, I thought there
> might be a way to combine both patterns into a single line.
>
> Also, I tried to replace the pattern ([ <*]+) with ([[:punct:]]),
> as in the R regex docs:
> patReg1 <- "(^[[:punct:]]+)"
>
> but it doesn't work.
>
> or possibly it's just my stupidity?
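
A likely reason the [[:punct:]] version appears not to work: that class covers *, < and > but not the space, so the match stops at the first space and only the leading "*" is removed. A minimal sketch, assuming it is acceptable to treat any punctuation or whitespace at either end as unwanted (the name `ends` is only for illustration):

```r
varReg <- "*  < <* this is my text > > "

# [[:punct:]] alone does not match spaces, so only the first "*" goes
punctOnly <- sub("^[[:punct:]]+", "", varReg)

# adding [:space:] to the bracket expression lets the run continue across spaces
ends <- "^[[:punct:][:space:]]+|[[:punct:][:space:]]+$"
trimmed <- gsub(ends, "", varReg)

print(punctOnly)
print(trimmed)
```

Note that this is broader than the original character lists: it would also strip, say, a full stop at the end of the text, so the explicit [ <*] and [ > ] sets are safer when such characters must be kept.
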
>
> On Mon, Nov 3, 2008 at 5:59 PM, John Fox <jfox at mcmaster.ca> wrote:
>> Dear Ferry,
>>
>> You're almost all the way there. Just apply each substitution in 
>> turn:
>>
>> varReg <- "*  < <* this is my text > > "
>> left <- "(^[ <*]+)"
>> right <- "([ > ]+$)"
>> sub(right, "", sub(left, "", varReg))
>> [1] "this is my text"
>>
>> I hope this helps,
>>  John
>>
>> ------------------------------
>> John Fox, Professor
>> Department of Sociology
>> McMaster University
>> Hamilton, Ontario, Canada
>> web: socserv.mcmaster.ca/jfox
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Ferry
>>> Sent: November-03-08 8:38 PM
>>> To: r-help at r-project.org
>>> Subject: [R] regex question
>>>
>>> hello,
>>>
>>> i am trying to extract text using regex as follows:
>>>
>>> "*  < <* this is my text > > "
>>>
>>> into:
>>>
>>> "this is my text"
>>>
>>> below is what I did:
>>>
>>> varReg <- "*  < <* this is my text > > "
>>>
>>> ## either this pattern
>>> patReg <- "(^[ <*]+)"
>>> ## or the pattern below
>>> patReg <- "([ > ]+$)"
>>>
>>> sub(patReg, '', varReg)
>>>
>>> depending on which pattern I use, I can only remove the first portion
>>> or the last portion of the unwanted characters. how can I remove both
>>> ends and keep my text "this is my text"?
>>>
>>> I have tried with gsub, as below:
>>> patReg <- "([ >* ]+)"
>>> gsub(patReg, '', varReg)
>>>
>>> but it returned "thisismytext"
>>>
>>> any idea is appreciated.
>>>
>>> thanks,
>>>
>>> ferry
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>