[R] Extract word from string based on pattern match

Ista Zahn istazahn at gmail.com
Tue Oct 25 01:35:05 CEST 2016


On Oct 24, 2016 6:05 PM, "Joe Ceradini" <joeceradini at gmail.com> wrote:
>
> Excellent - thanks David!
> Regex syntax never fails to scare the crap out of me :)
>
> David absolutely solved my problem (in record time, no less), so it
> can be put to rest. However, if anyone knows how to accomplish the
> same thing through non base packages, like stringr or stringi, I'd be
> interested in seeing those solutions as well.

Try it, it's easy. I would be very surprised if you can't figure it out. The
stringr vignette is a good place to start.
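
For the sample data in the original question, a minimal stringr sketch might
look like the following. It assumes "word" means everything from "BT" up to
the next whitespace; the pattern and the column name bt_id are illustrative
choices, not something taken from the thread.

```r
library(stringr)

test <- data.frame(x = c("abc", "Sample BT-1501-2E stuff", "Bt-1599-3E stuff"),
                   stringsAsFactors = FALSE)

# \\b anchors at the start of a word; \\S* then takes every non-space
# character, so the hyphens inside the ID are kept.
# Assumption: the ID itself never contains a space.
pattern <- regex("\\bBT\\S*", ignore_case = TRUE)

# str_extract() returns the first match per element, or NA when there is none.
test$bt_id <- str_extract(test$x, pattern)
test$bt_id
```

Because str_extract() already gives NA for "abc", no separate test for
non-matching rows should be needed.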

Best,
Ista

>
> Thanks,
> Joe
>
>
> On Mon, Oct 24, 2016 at 3:42 PM, David Wolfskill <david at catwhisker.org> wrote:
> >
> > On Mon, Oct 24, 2016 at 03:33:20PM -0600, Joe Ceradini wrote:
> > > R Helpers,
> > >
> > > I would like to extract the entire word beginning with "BT" (or "BT-")
> > > and not anything else in the string. Or, I would like to extract from
> > > BT up until the next space.
> > >
> > > test <- data.frame(x = c("abc", "Sample BT-1501-2E stuff",
> > >                          "Bt-1599-3E stuff"))
> > > test
> > >
> > > So, from test$x I would like to only extract "BT-1501-2E" and
> > > "Bt-1599-3E".
> > >
> > > I started with straight grep but of course that is not what I need.
> > > grep("BT", test$x, value = TRUE, ignore.case = TRUE)
> > > "Sample BT-1501-2E stuff" "Bt-1599-3E stuff"
> > >
> > > In a somewhat similar post, the solution involved boundaries or
> > > anchors, but I haven't been able to adapt it to my needs, so I won't
> > > even bother including my boundary attempts :)
> > >
> > > http://stackoverflow.com/questions/7227976/using-grep-in-r-to-find-strings-as-whole-words-but-not-strings-as-part-of-words
> > >
> > > If possible, it would also be helpful if something was returned, like
> > > NA, for rows without a "BT" match. So, conceptually, test$x would
> > > return:
> > > NA, "BT-1501-2E", "Bt-1599-3E".
> > >
> > > Thanks!
> > > Joe
> > > ....
> >
> > This is not exactly what you requested, as it returns the original
> > unmodified string when there's no match; I expect you can come up with
> > some code to test for that.  It does, however, meet the rest of your
> > requirements -- or so I believe:
> >
> > > test
> >                         x
> > 1                     abc
> > 2 Sample BT-1501-2E stuff
> > 3        Bt-1599-3E stuff
> > > sub("^.*(BT-?[\\w-]*).*$", "\\1", test$x, ignore.case = TRUE, perl = TRUE)
> > [1] "abc"        "BT-1501-2E" "Bt-1599-3E"
> > >
> >
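
For the NA-on-no-match part of the request, one possible base R sketch uses
regexpr() and regmatches() rather than sub(). The pattern and the names m and
bt_id are assumptions made for illustration; \\S* takes everything up to the
next space, so hyphens inside the ID are kept.

```r
x <- c("abc", "Sample BT-1501-2E stuff", "Bt-1599-3E stuff")

# regexpr() gives the position of the first match per element, or -1 for none.
m <- regexpr("\\bBT\\S*", x, ignore.case = TRUE, perl = TRUE)

# Start with all NA, then fill in only the elements that matched;
# regmatches() returns the matched text for the matching elements only.
bt_id <- rep(NA_character_, length(x))
bt_id[m != -1] <- regmatches(x, m)
bt_id
```

Unlike sub(), which hands back the whole input string when nothing matches,
this leaves NA in those positions.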
> > Peace,
> > david
> > --
> > David H. Wolfskill                              david at catwhisker.org
> > Those who would murder in the name of God or prophet are blasphemous
> > cowards.
> >
> > See http://www.catwhisker.org/~david/publickey.gpg for my public key.
>
