[R] using regular expressions to retrieve a digit-digit-dot structure from a string

Marc Schwartz marc_schwartz at me.com
Mon Jun 8 19:34:27 CEST 2009


On Jun 8, 2009, at 9:15 AM, Mark Heckmann wrote:

> Hi,
>
>
>
> I need to recognize itemization structures in strings that follow the
> format "digit-digit-dot", e.g.
>
>
>
> 1.
>
> 2.
>
> 19.
>
> 211.
>
>
>
> Given the string " This happened in the 21. century." (the dot behind
> 21 is used in German instead of 21st), I want to know where the dots
> are, but I do not want the 21.-dot to be returned as well.
>
>
>
> I am not good at regular expressions. How can I retrieve or recognize
> dots excluding the digit-digit-dot structure?
>
>
>
> TIA, Mark
>

vec <- c("1.", "2.", "19.", "211.", "This happened in the 21. century")

 > grep("^[0-9]+\\.", vec, value = TRUE)
[1] "1."   "2."   "19."  "211."


The regex "^[0-9]+\\." is interpreted as "match one or more digits
followed by a period, only at the beginning of the string". grep()
tests each element of 'vec' separately. The caret '^' anchors the
match to the beginning of the string, so a sequence of digits followed
by a period in the middle of the string, such as the "21." in the last
element, will not match.
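
If the aim is instead to find where the dots are while skipping any dot
that directly follows a digit, one possible approach is a negative
lookbehind. This is only a sketch under that assumption; the variable
names 'x', 'dot_pos' and 'all_pos' are made up for illustration, and
lookbehind requires perl = TRUE in R.

```r
# Assumption: we want the positions of periods NOT preceded by a digit.
x <- "This happened in the 21. century."

# (?<![0-9]) is a negative lookbehind: the character before the period
# must not be a digit. Lookbehind is only available with perl = TRUE.
m <- gregexpr("(?<![0-9])\\.", x, perl = TRUE)[[1]]
dot_pos <- as.integer(m)   # character positions of the remaining periods
dot_pos

# For comparison: positions of all periods, including the one after "21"
all_pos <- as.integer(gregexpr("\\.", x)[[1]])
all_pos
```

Here 'dot_pos' should contain only the position of the final period,
while 'all_pos' also includes the one after "21". Note that gregexpr()
returns -1 when there is no match at all, so that case may need to be
checked separately.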

HTH,

Marc Schwartz



