[R] Working with regular expression

David Alston david.alston at gmail.com
Fri Jan 18 23:50:38 CET 2013


Greetings!

     I hope you don't mind, Rui Barradas, but I'd like to explain the
regex.  Parsing it was a fun exercise!

     Here's the regex broken into two parts:

[[:alpha:]_]*   = match zero or more letters or underscores
(.*)            = match zero or more characters of any kind and capture
                  them as group 1, which the replacement refers to as \1
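To see what each part matches on its own, here is a small sketch using base R's regexec() and regmatches(). The input is the example string from this thread. The extra parentheses around [[:alpha:]_]* are added only for illustration, so that the first part is reported separately; they are not part of Rui's pattern.

```r
x <- "asdf May 09 2009"

# Same pattern, but with the first part also wrapped in a group (added for
# illustration only) so that regmatches() reports both pieces separately
parts <- regmatches(x, regexec("([[:alpha:]_]*)(.*)", x))[[1]]

parts[1]  # the whole match: "asdf May 09 2009"
parts[2]  # matched by [[:alpha:]_]* : "asdf"
parts[3]  # matched by (.*)          : " May 09 2009" (note the leading space)
```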


Walking through the example string "asdf May 09 2009":

"asdf" matches the first part. The match stops at the space, which is
neither a letter nor an underscore.
" May 09 2009", including its leading space, matches the second part and
is stored as \1.

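Because the captured text keeps the space that separated the prefix from the date, the result starts with a blank. The following is a minimal sketch of two ways to remove it, assuming leading blanks are never wanted. The first uses trimws() afterwards, which is available in R 3.2.0 and later. The second uses a pattern that deletes the prefix and the spaces after it, instead of capturing the rest. sub() is used here because each string has only one prefix.

```r
x <- "asdf May 09 2009"

# Rui's pattern keeps the blank that follows "asdf"
with_space <- sub("[[:alpha:]_]*(.*)", "\\1", x)   # " May 09 2009"

# Option 1: trim leading/trailing whitespace afterwards
trimmed <- trimws(with_space)                       # "May 09 2009"

# Option 2: delete the letters/underscores at the start of the string
# plus any spaces after them, instead of capturing what follows
deleted <- sub("^[[:alpha:]_]* *", "", x)           # "May 09 2009"
```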

The command  gsub("[[:alpha:]_]*(.*)", "\\1", Text)  matches each string
in its entirety. "[[:alpha:]_]*" consumes any leading run of letters
and/or underscores. That run may also be empty, because "*" allows zero
repetitions. ".*" then matches the rest of the line. Each string is
therefore replaced by the contents of \1. Note that gsub() returns the
modified strings as a new vector. Text itself is unchanged unless you
assign the result back, e.g. Text <- gsub(...).
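Here is a small illustration using a hypothetical Text vector. Only the first string comes from this thread. The third element has no letter/underscore prefix, so [[:alpha:]_]* matches the empty string and the element comes back unchanged. Also note that gsub() returns a new vector, so Text changes only when the result is assigned.

```r
# Hypothetical input; only the first element comes from this thread
Text <- c("asdf May 09 2009", "ab_cd Jun 10 2010", "09 May 2009")

result <- gsub("[[:alpha:]_]*(.*)", "\\1", Text)
result
# [1] " May 09 2009" " Jun 10 2010" "09 May 2009"

# gsub() did not touch Text; assign the result to keep it
Text[1]               # still "asdf May 09 2009"
Text <- result
```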


     If the prefix in front of the date always has the same length,
another possible solution is  gsub(".{5}(.*)", "\\1", Text)  which
strips off the first five characters (".{5}" matches exactly five
characters of any kind). For "asdf May 09 2009" those five characters
are "asdf ", including the space, so the result is "May 09 2009".
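The fixed-width approach works only if every prefix is exactly five characters long, including the space. The sketch below uses a made-up second string to show what happens when one prefix is shorter. It also shows that base R's substring() gives the same result without a regex.

```r
# Hypothetical input: the second prefix ("xy ") is only 3 characters long
Text <- c("asdf May 09 2009", "xy Jun 10 2010")

by_regex <- gsub(".{5}(.*)", "\\1", Text)
by_regex
# [1] "May 09 2009" "n 10 2010"
# (the first two characters of "Jun" are lost in the second element)

# Same effect without a regex: keep everything from character 6 onwards
by_position <- substring(Text, 6)
```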


--David Alston
"Without rules there is no game for it is by the rules the game is defined."
          --SOv

On Fri, Jan 18, 2013 at 3:05 PM, Christofer Bogaso
<bogaso.christofer at gmail.com> wrote:
> Thanks Rui for your help.
>
> Could you also explain the underlying logic please?
>
> Thanks and regards,
>
>
> On Sat, Jan 19, 2013 at 2:43 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> gsub("[[:alpha:]_]*(.*)", "\\1", Text)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


