[R] Minimal match to regexp?

Andrew Simmons
Thu Jan 26 01:34:45 CET 2023


grep(value = TRUE) returns each string that contains a match, in full,
not the part that matched. You have to use regexpr() or gregexpr() if you
want to know where the matches are:

```
x <- "abaca"

# extract only the first match with regexpr()
m <- regexpr("a.*?a", x)
regmatches(x, m)

# or

# extract every match with gregexpr()
m <- gregexpr("a.*?a", x)
regmatches(x, m)
```
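
If the positions themselves are needed, they are carried by the object that regexpr() returns. A minimal sketch, assuming perl = TRUE so that the lazy quantifier follows PCRE rules (that argument, and the names start, len and first_match, are only for illustration):

```r
x <- "abaca"

# regexpr() returns the start of the first match, with its length
# stored in the "match.length" attribute
m <- regexpr("a.*?a", x, perl = TRUE)
start <- as.integer(m)
len <- attr(m, "match.length")

# rebuild the matched text from the positions
first_match <- substring(x, start, start + len - 1)
first_match
```

When there is no match, regexpr() returns -1, so it is worth checking the start before using it as an index.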

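You could also use sub() to strip everything around the match:
`sub("^.*?(a.*?a).*$", "\\1", x, perl = TRUE)`
keeping only the part matched inside the parentheses. The prefix has to be
the lazy `.*?` too: a greedy `^.*` swallows as much as it can, which leaves
the last match rather than the first. perl = TRUE is used here so that the
lazy quantifiers follow PCRE rules.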
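
One thing to watch with the sub() approach is the prefix in front of the capture group. The sketch below compares a lazy and a greedy prefix, assuming perl = TRUE so the behaviour follows PCRE rules (the variable names are only for illustration):

```r
x <- "abaca"

# lazy prefix: the group captures the first minimal match
lazy_prefix <- sub("^.*?(a.*?a).*$", "\\1", x, perl = TRUE)

# greedy prefix: the prefix swallows as much as it can, so the
# group ends up on the last possible match instead
greedy_prefix <- sub("^.*(a.*?a).*$", "\\1", x, perl = TRUE)

# no match at all: sub() returns the input unchanged
no_match <- sub("^.*?(a.*?a).*$", "\\1", "xyz", perl = TRUE)

lazy_prefix    # "aba"
greedy_prefix  # "aca"
no_match       # "xyz"
```

Because sub() hands back the input untouched when nothing matches, regexpr() with regmatches() is probably the safer choice when some strings may contain no match.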


On Wed, Jan 25, 2023, 19:19 Duncan Murdoch <murdoch.duncan using gmail.com> wrote:

> The docs for ?regexp say this:  "By default repetition is greedy, so the
> maximal possible number of repeats is used. This can be changed to
> ‘minimal’ by appending ? to the quantifier. (There are further
> quantifiers that allow approximate matching: see the TRE documentation.)"
>
> I want the minimal match, but I don't seem to be getting it.  For example,
>
> x <- "abaca"
> grep("a.*?a", x, value = TRUE)
> #> [1] "abaca"
>
> Shouldn't I have gotten "aba", which is the first match to "a.*a"?  If
> not, what would be the regexp that would give me the first match to
> "a.*a", without greedy expansion of the .*?
>
> Duncan Murdoch