[R] using regular expressions to retrieve a digit-digit-dot structure from a string

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Jun 9 13:32:08 CEST 2009


Gabor Grothendieck wrote:
> On Tue, Jun 9, 2009 at 3:04 AM, Wacek
> Kusnierczyk<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>   
>> Gabor Grothendieck wrote:
>>     
>>> On Mon, Jun 8, 2009 at 7:18 PM, Wacek
>>> Kusnierczyk<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>>>
>>>       
>>>> Gabor Grothendieck wrote:
>>>>
>>>>         
>>>>> Try this.  See ?regex for more.
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> x <- 'This happened in the 21. century." (the dot behind 21 is'
>>>>>> regexpr("(?![0-9]+)[.]", x, perl = TRUE)
>>>>>>
>>>>>>
>>>>>>             
>>>>> [1] 24
>>>>> attr(,"match.length")
>>>>> [1] 1
>>>>>
>>>>>
>>>>>           
>>>> yes, but
>>>>
>>>>    gregexpr('(?![0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>>>>    # 2 5 9
>>>>
>>>>         
>>> Yes, it should be:
>>>
>>>
>>>       
>>>> gregexpr('(?<=[0-9])[.]', 'a. 1. a1.', perl=TRUE)
>>>>
>   
>>> [[1]]
>>> [1] 5 9
>>> attr(,"match.length")
>>> [1] 1 1
>>>
>>> which displays the position of every dot that is preceded
>>> immediately by a digit.  Or just replace gregexpr with regexpr
>>> if it's intended that it match only one.
>>>
>>>       
>> i guess what was needed was something like
>>
>>    gregexpr('(?<=\\b[0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>>    # 5
>>
>> which won't work, however, because pcre does not support variable-width
>> lookbehinds.
>>     
>
> No, what I wrote was what I intended.   I don't think we are discussing
> the answer at this point but just the interpretation of what was intended.


which amounts to discussing whether the answer is appropriate ;)

> You are including the word boundary in the question and I am not.

indeed, and i think this was essential.  but irrespective of whether
it really was or not, this sort of problem shows the insufficiency of a
fixed-width lookbehind, and illustrates the use of the \K operator, so it
will hopefully be easier for the op and others to design the right pattern
in similar future cases.
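
for reference, a minimal sketch of the contrast discussed above, using the
test string from the thread.  the variable names (s, pos_lookbehind,
pos_keep, cleaned) are only illustrative, and the \K pattern assumes that
the whole-number-then-dot reading of the question is the intended one:

```r
# subject string from the thread; the dots are at positions 2, 5 and 9
s <- 'a. 1. a1.'

# fixed-width lookbehind: every dot immediately preceded by a digit,
# including the one after 'a1'
pos_lookbehind <- as.integer(gregexpr('(?<=[0-9])[.]', s, perl = TRUE)[[1]])

# \K drops everything matched so far from the reported match, so the
# variable-width prefix \b[0-9]+ may stand in front of it
pos_keep <- as.integer(gregexpr('\\b[0-9]+\\K[.]', s, perl = TRUE)[[1]])

# the same pattern in a substitution removes only the dot that follows
# a whole number
cleaned <- gsub('\\b[0-9]+\\K[.]', '', s, perl = TRUE)

print(pos_lookbehind)
print(pos_keep)
print(cleaned)
```

the prefix before \K still has to match, but it is not part of the reported
match, which is why the word boundary can be required without a
variable-width lookbehind.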

vQ



