[R] Extract word from string based on pattern match

Joe Ceradini joeceradini at gmail.com
Tue Oct 25 00:03:42 CEST 2016


Excellent - thanks David!
Regex syntax never fails to scare the crap out of me :)

David absolutely solved my problem (in record time, no less), so it
can be put to rest. However, if anyone knows how to accomplish the
same thing through non-base packages, like stringr or stringi, I'd be
interested in seeing those solutions as well.
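
For anyone arriving with the same question, here is a minimal stringr sketch. It assumes the stringr package is installed, and the pattern (a word boundary, then "BT", then everything up to the next whitespace) is one reading of the requirement, not something posted in this thread. str_extract() gives NA where nothing matches.

```r
library(stringr)

# Same strings as test$x, kept as a plain character vector
x <- c("abc", "Sample BT-1501-2E stuff", "Bt-1599-3E stuff")

# \\b   = word boundary
# BT    = literal prefix
# \\S*  = run of non-whitespace characters
# regex(..., ignore_case = TRUE) lets "Bt" match as well
bt <- str_extract(x, regex("\\bBT\\S*", ignore_case = TRUE))
bt
```

The vector x and the name bt are hypothetical; with the data frame from the original question, as.character(test$x) could be passed instead.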

Thanks,
Joe


On Mon, Oct 24, 2016 at 3:42 PM, David Wolfskill <david at catwhisker.org> wrote:
>
> On Mon, Oct 24, 2016 at 03:33:20PM -0600, Joe Ceradini wrote:
> > R Helpers,
> >
> > I would like to extract the entire word beginning with "BT" (or "BT-")
> > and not anything else in the string. Or, I would like to extract from
> > BT up until the next space.
> >
> > test <- data.frame(x = c("abc", "Sample BT-1501-2E stuff", "Bt-1599-3E stuff"))
> > test
> >
> > So, from test$x I would like to only extract "BT-1501-2E" and "Bt-1599-3E".
> >
> > I started with straight grep but of course that is not what I need.
> > grep("BT", test$x, value = TRUE, ignore.case = TRUE)
> > "Sample BT-1501-2E stuff" "Bt-1599-3E stuff"
> >
> > In a somewhat similar post, the solution involved boundaries or
> > anchors, but I haven't been able to adapt it to my needs, so I won't
> > even bother including my boundary attempts :)
> > http://stackoverflow.com/questions/7227976/using-grep-in-r-to-find-strings-as-whole-words-but-not-strings-as-part-of-words
> >
> > If possible, it would also be helpful if something was returned, like
> > NA, for rows without a "BT" match. So, conceptually, test$x would
> > return:
> > NA, "BT-1501-2E", "Bt-1599-3E".
> >
> > Thanks!
> > Joe
> > ....
>
> This is not exactly what you requested, as it returns the original
> unmodified string when there's no match; I expect you can come up with
> some code to test for that.  It does, however, meet the rest of your
> requirements -- or so I believe:
>
> > test
>                         x
> 1                     abc
> 2 Sample BT-1501-2E stuff
> 3        Bt-1599-3E stuff
> > sub("^.*(BT\\S*).*$", "\\1", test$x, ignore.case = TRUE, perl = TRUE)
> [1] "abc"        "BT-1501-2E" "Bt-1599-3E"
> >
>
> Peace,
> david
> --
> David H. Wolfskill                              david at catwhisker.org
> Those who would murder in the name of God or prophet are blasphemous cowards.
>
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.
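
The sub() call quoted above hands back the input unchanged when there is no match. A base-R sketch that returns NA instead could pair regexpr() with regmatches(). The variable names and the pattern below are illustrative assumptions, not part of David's reply.

```r
# Same strings as test$x, kept as a plain character vector
x <- c("abc", "Sample BT-1501-2E stuff", "Bt-1599-3E stuff")

# Position of the first word starting with "BT" (any case) in each element;
# regexpr() returns -1 for elements without a match
m <- regexpr("\\bBT\\S*", x, ignore.case = TRUE, perl = TRUE)

# Start from all-NA, then fill in only the elements that matched;
# regmatches() returns one string per matching element, in order
hit <- m != -1
out <- rep(NA_character_, length(x))
out[hit] <- regmatches(x, m)
out
```

Pre-filling with NA keeps the result the same length as the input, so it can be assigned straight back to a data frame column.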


