[R] detecting any element in a vector of strings, appearing anywhere in any of several character variables in a dataframe

John Fox jfox at mcmaster.ca
Thu Jul 9 15:05:03 CEST 2015


Dear Chris,

If I understand correctly what you want, how about the following?

> rows <- apply(zz[, 2:3], 1, function(x) any(sapply(alarm.words, grepl, x=x)))
> zz[rows, ]

          v1                              v2                v3 v4
3  -1.022329                    green turtle    ronald weasley  2
6   0.336599              waffle the hamster        red sparks  1
9  -1.631874 yellow giraffe with a long neck gandalf the white  1
10  1.130622                      black bear  gandalf the grey  2
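The question also asks for a v5 column marking each matching record. Here is a minimal sketch of a vectorized alternative that avoids apply() over rows and stores the flag in zz$v5. It assumes the keywords contain no regular-expression metacharacters, and it seeds the RNG only so that v1 and v4 are reproducible. The keywords are collapsed into a single alternation pattern, and grepl() then tests each character column in one call.

```r
# Rebuild the example data from the question; set.seed() is only for
# reproducibility of v1 and v4, which play no role in the matching.
set.seed(1)
v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
        "big black dog", "waffle the hamster", "benny likes food a lot",
        "hello world", "yellow giraffe with a long neck", "black bear")
v3 <- c("harry potter", "hermione grainger", "ronald weasley",
        "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
        "white dress robes", "gandalf the white", "gandalf the grey")
zz <- data.frame(v1 = rnorm(10), v2 = v2, v3 = v3,
                 v4 = rpois(10, lambda = 2), stringsAsFactors = FALSE)
alarm.words <- c("red", "green", "turtle", "gandalf")

# One regular expression, "red|green|turtle|gandalf". Assumes the keywords
# contain no regex metacharacters; escape them first if they might.
pattern <- paste(alarm.words, collapse = "|")

# grepl() is vectorized over x, so each call checks a whole column;
# Reduce() then ORs the per-column logical vectors together.
hits <- lapply(zz[, c("v2", "v3")], grepl, pattern = pattern)
zz$v5 <- Reduce(`|`, hits)

zz[zz$v5, ]
which(zz$v5)   # [1]  3  6  9 10
```

This is substring matching, like the apply() version, so "red" would also flag a string such as "hundred". If only whole words should count, the alternation can be wrapped in word boundaries, e.g. `paste0("\\b(", pattern, ")\\b")`.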

I hope this helps,
 John

------------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
	

On Wed, 08 Jul 2015 22:23:37 -0400
 "Christopher W. Ryan" <cryan at binghamton.edu> wrote:
> Running R 3.1.1 on Windows 7
> 
> I want to identify as a case any record in a dataframe that contains any
> of several keywords in any of several variables.
> 
> Example:
> 
> # create a dataframe with 4 variables and 10 records
> v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
>         "big black dog", "waffle the hamster", "benny likes food a lot",
>         "hello world", "yellow giraffe with a long neck", "black bear")
> v3 <- c("harry potter", "hermione grainger", "ronald weasley",
>         "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
>         "white dress robes", "gandalf the white", "gandalf the grey")
> zz <- data.frame(v1=rnorm(10), v2=v2, v3=v3, v4=rpois(10, lambda=2),
>                  stringsAsFactors=FALSE)
> str(zz)
> zz
> 
> # here are the keywords
> alarm.words <- c("red", "green", "turtle", "gandalf")
> 
> # For each row/record, I want to test whether the string in v2 or the
> # string in v3 contains any of the strings in alarm.words. And then if so,
> # set zz$v5=TRUE for that record.
> 
> # I'm thinking the str_detect function in the stringr package ought to
> # be able to help, perhaps with some use of apply over the rows, but I
> # obviously misunderstand something about how str_detect works
> 
> library(stringr)
> 
> str_detect(zz[,2:3], alarm.words)    # error: the target of the search
>                                      # must be a vector, not multiple
>                                      # columns
> 
> str_detect(zz[1:4,2:3], alarm.words) # same error
> 
> str_detect(zz[,2], alarm.words)      # error, length of alarm.words
>                                      # is less than the number of
>                                      # rows I am using for the
>                                      # comparison
> 
> str_detect(zz[1:4,2], alarm.words)   # works as hoped when
> length(alarm.words)                  # confining nrows
>                                      # to the length of alarm.words
> 
> str_detect(zz, alarm.words)          # obviously not right
> 
> # maybe I need apply() ?
> my.f <- function(x){str_detect(x, alarm.words)}
> 
> apply(zz[,2], 1, my.f)     # again, a mismatch in lengths
>                            # between alarm.words and that
>                            # in which I am searching for
>                            # matching strings
> 
> apply(zz, 2, my.f)         # now I'm getting somewhere
> apply(zz[1:4,], 2, my.f)   # but still only works with 4
>                            # rows of the dataframe
> 
> 
> # perhaps %in% could do the job?
> 
> Appreciate any advice.
> 
> --Chris Ryan
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
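On the closing %in% question: stringr's str_detect() is vectorized over both string and pattern. Given a vector of patterns, it pairs the i-th string with the i-th pattern rather than trying every keyword against every string, which is why the four-row call appeared to work. %in% does not help directly either, because it tests whole-element equality rather than substring containment. Below is a minimal sketch of making %in% work, assuming the keywords are single words separated by whitespace in the text. Each string is split into words, and the sketch asks whether any word is a keyword. `has_alarm_word()` is a hypothetical helper introduced only for illustration.

```r
alarm.words <- c("red", "green", "turtle", "gandalf")
v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
        "big black dog", "waffle the hamster", "benny likes food a lot",
        "hello world", "yellow giraffe with a long neck", "black bear")
v3 <- c("harry potter", "hermione grainger", "ronald weasley",
        "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
        "white dress robes", "gandalf the white", "gandalf the grey")

# %in% compares whole elements, so this is FALSE even though the
# string contains two of the keywords.
"green turtle" %in% alarm.words

# Hypothetical helper: split each string on runs of whitespace, then
# report whether any of its words is one of the keywords.
has_alarm_word <- function(x, keywords) {
  vapply(strsplit(x, "[[:space:]]+"),
         function(words) any(words %in% keywords),
         logical(1))
}

# A record is flagged if either column contains a keyword as a whole word.
flag <- has_alarm_word(v2, alarm.words) | has_alarm_word(v3, alarm.words)
which(flag)   # [1]  3  6  9 10
```

Unlike grepl(), this matches whole words only. "red" would not flag "reddish" or "hundred", which may or may not be what is wanted.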


