[R] using regular expressions to retrieve a digit-digit-dot structure from a string

Tue Jun 9 18:36:24 CEST 2009

You can sometimes fake variable-width lookbehinds in Perl-style regexes
using '\K', which drops everything matched so far from the reported
match:

> gregexpr('\\b[0-9]+\\K[.]', 'a. 1. a1. 11.', perl=TRUE)
[[1]]
[1]  5 13
attr(,"match.length")
[1] 1 1
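
For comparison, here is a minimal sketch that runs the fixed-width
lookbehind and the '\K' pattern on the same string, assuming base R with
perl=TRUE. The variable names (lookbehind, keep_out, stripped) are only
illustrative, and the capture-group gsub() at the end is an alternative
for replacement that does not rely on '\K' at all:

```r
x <- "a. 1. a1. 11."

# Fixed-width lookbehind: any dot directly preceded by a single digit,
# so the dot in "a1." is matched as well.
lookbehind <- gregexpr("(?<=[0-9])[.]", x, perl = TRUE)[[1]]

# \K: the run of digits after a word boundary must still match, but it
# is dropped from the reported match, so only the dot is reported.
keep_out <- gregexpr("\\b[0-9]+\\K[.]", x, perl = TRUE)[[1]]

as.integer(lookbehind)  # 5 9 13
as.integer(keep_out)    # 5 13

# When the goal is replacement, a capture group does the same job:
# put the digits back and leave out the dot.
stripped <- gsub("\\b([0-9]+)[.]", "\\1", x, perl = TRUE)
stripped                # "a. 1 a1. 11"
```

The dot in "a1." is skipped by the second pattern because there is no
word boundary between "a" and "1".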

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Wacek Kusnierczyk
> Sent: Tuesday, June 09, 2009 1:05 AM
> To: Gabor Grothendieck
> Cc: r-help at r-project.org; Mark Heckmann
> Subject: Re: [R] using regular expressions to retrieve a
> digit-digit-dot structure from a string
> 
> Gabor Grothendieck wrote:
> > On Mon, Jun 8, 2009 at 7:18 PM, Wacek
> > Kusnierczyk<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> >
> >> Gabor Grothendieck wrote:
> >>
> >>> Try this.  See ?regex for more.
> >>>
> >>>
> >>>
> >>>> x <- 'This happened in the 21. century." (the dot behind 21 is'
> >>>> regexpr("(?![0-9]+)[.]", x, perl = TRUE)
> >>>>
> >>>>
> >>> [1] 24
> >>> attr(,"match.length")
> >>> [1] 1
> >>>
> >>>
> >> yes, but
> >>
> >>    gregexpr('(?![0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
> >>    # 2 5 9
> >>
> >
> > Yes, it should be:
> >
> >
> >> gregexpr('(?<=[0-9])[.]', 'a. 1. a1.', perl=TRUE)
> >>
> > [[1]]
> > [1] 5 9
> > attr(,"match.length")
> > [1] 1 1
> >
> > which displays the position of every dot that is immediately
> > preceded by a digit.  Or just replace gregexpr with regexpr
> > if it's intended to match only the first one.
> >
> 
> i guess what was needed was something like
> 
>     gregexpr('(?<=\\b[0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>     # 5
> 
> which won't work, however, because pcre does not support variable-width
> lookbehinds.
> 
> >
> >> which, i guess, is not what you want.  if what you want is to match all
> >> and only dots that follow at least one digit preceded by a word
> >> boundary, then the following should do, as far as i can see:
> >>
> >>    gregexpr('\\b[0-9]+\\K[.]', 'a. 1. a1.', perl=TRUE)
> >>    # 5
> >>
> >> vQ
> >>
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.