[R] using regular expressions to retrieve a digit-digit-dot structure from a string

Gabor Grothendieck ggrothendieck at gmail.com
Tue Jun 9 12:48:17 CEST 2009


On Tue, Jun 9, 2009 at 3:04 AM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> Gabor Grothendieck wrote:
>> On Mon, Jun 8, 2009 at 7:18 PM, Wacek
>> Kusnierczyk<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>>
>>> Gabor Grothendieck wrote:
>>>
>>>> Try this.  See ?regex for more.
>>>>
>>>>
>>>>
>>>>> x <- 'This happened in the 21. century." (the dot behind 21 is'
>>>>> regexpr("(?![0-9]+)[.]", x, perl = TRUE)
>>>>>
>>>>>
>>>> [1] 24
>>>> attr(,"match.length")
>>>> [1] 1
>>>>
>>>>
>>> yes, but
>>>
>>>    gregexpr('(?![0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>>>    # 2 5 9
>>>
>>
>> Yes, it should be:
>>
>>
>>> gregexpr('(?<=[0-9])[.]', 'a. 1. a1.', perl=TRUE)
>>>
>> [[1]]
>> [1] 5 9
>> attr(,"match.length")
>> [1] 1 1
>>
>> which displays the position of every dot that is preceded
>> immediately by a digit.  Or just replace gregexpr with regexpr
>> if it's intended that it match only one.
>>
>
> i guess what was needed was something like
>
>    gregexpr('(?<=\\b[0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>    # 5
>
> which won't work, however, because pcre does not support variable-width
> lookbehinds.

No, what I wrote was what I intended.  I don't think we are discussing
the answer at this point, just the interpretation of what was intended.
You are including the word boundary in the question and I am not.  I
think it's also possible that regexpr is what is wanted, not gregexpr,
but at this point I think the poster has enough answers that he can
complete it himself by considering what he wants and using one of ours
or a suitable modification.
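
For reference, here is a minimal sketch of the two readings side by side.
The variable names are only illustrative, and the second pattern is just
one possible workaround for the fixed-width lookbehind limitation (match
the whole number plus the dot, then take the last position of each
match); it is an assumption about what might be wanted, not something
proposed earlier in the thread.

```r
x <- "a. 1. a1."

# Reading 1: every dot immediately preceded by a digit
# (fixed-width lookbehind, which PCRE accepts).
m1 <- gregexpr("(?<=[0-9])[.]", x, perl = TRUE)[[1]]
dots_after_digit <- as.integer(m1)
dots_after_digit
# [1] 5 9

# Reading 2 (assumed): only dots that end a standalone number.
# Match "<word boundary><digits><dot>" and report the position of the
# last character of each match, i.e. the dot itself.
m2 <- gregexpr("\\b[0-9]+[.]", x, perl = TRUE)[[1]]
dots_after_number <- if (m2[1] == -1L) {
  integer(0)                      # gregexpr signals "no match" with -1
} else {
  as.integer(m2) + attr(m2, "match.length") - 1L
}
dots_after_number
# [1] 5
```

Replacing gregexpr with regexpr in the first reading returns only the
first such position (5 for this string).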



