[R] strapply and characters adjacent to the matched pattern

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jul 25 13:37:31 CEST 2012


On Tue, Jul 24, 2012 at 5:06 PM, mdvaan <mathijsdevaan at gmail.com> wrote:
> Hi,
>
> In the example below, one of the searched patterns "SE" is matched in the
> word "second". I would like to ignore all matches in which the character
> following the match is one of [:alpha:]. How do I do this without removing
> the "ignore.case = T" argument of the strapply function? Thank you very
> much!
>
> # load library
> require(gsubfn)
> # read in data
> data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")
> # define the object to be searched
> text <- c("the first is Santa Fe Gold Corp",
>           "the second is Starpharma Holdings")
> # match
> strapply(text, data, ignore.case = T)
>
> The preferred outcome would be:
>
> [[1]]
> [1] "Santa Fe Gold Corp"
>
> [[2]]
> [1] "Starpharma Holdings"
>
> instead of:
>
> [[1]]
> [1] "Santa Fe Gold Corp"
>
> [[2]]
> [1] "se"                  "Starpharma Holdings"
>
>

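Try this. The second group requires that the match be followed by
either the end of the string or a non-alphabetic character: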

> strapply(c("abc", "ab", "ab def"), "(ab|d)($|[^[:alpha:]])")
[[1]]
NULL

[[2]]
[1] "ab"

[[3]]
[1] "ab"
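
Applied to the data from the question, the same idea might look like the
sketch below. The names `companies`, `pattern` and `result` are only
illustrative, and the pattern is assembled with paste0() so that the
alternatives sit inside the first group. As I read the gsubfn documentation,
when the pattern contains groups only the captured groups are passed to the
function, and the default function returns its first argument, so the
trailing character matched by the second group should not appear in the
result.

```r
library(gsubfn)

# Names to search for, taken from the question
# (`companies` is an illustrative variable name).
companies <- c("Santa Fe Gold Corp", "Starpharma Holdings", "SE")

# Strings to be searched, taken from the question.
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings")

# Group 1: any of the names. Group 2: end of string or a
# non-alphabetic character immediately after the name.
pattern <- paste0("(", paste(companies, collapse = "|"), ")($|[^[:alpha:]])")

# ignore.case = TRUE is kept, as requested in the question.
result <- strapply(text, pattern, ignore.case = TRUE)
print(result)
```

This assumes the names contain no regular expression metacharacters; a name
with characters such as "." or "(" would need to be escaped before being
pasted into the pattern.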


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


