[R] Subsetting a data frame

natalie.vanzuydam nvanzuydam at gmail.com
Mon Dec 5 12:32:10 CET 2011

Hi R users,

I really need help with subsetting data frames:

I have a large database of medical records and I want to be able to match
patterns from a list of search terms.

I've used this simplified data frame in a previous example:

db <- structure(list(ind = c("ind1", "ind2", "ind3", "ind4"),
    test1 = c(1, 2, 1.3, 3), test2 = c(56L, 27L, 58L, 2L),
    test3 = c(1.1, 28, 9, 1.2)),
    .Names = c("ind", "test1", "test2", "test3"),
    class = "data.frame", row.names = c(NA, -4L))

terms_include <- c("1","2","3") 
terms_exclude <- c("1.1","1.2","1.3") 

So in this example I want to keep every row that contains a term from
terms_include, as long as no term from terms_exclude occurs in the same row
of the data frame.

Previously I was given this function which works very well if you want to
match exactly:

f <- function(x)  !any(x %in% terms_exclude) && any(x %in% terms_include) 
db[apply(db[, -1], 1, f), ] 

   ind test1 test2 test3 
2 ind2     2    27  28.0 
4 ind4     3     2   1.2 

I would like to know if there is a way to write a similar function that
looks for matches that start with the query string:  as in

I started writing a function but am not sure how to get it to return the
dataframe or matrix:

for (i in seq_along(terms_include)) {
  # search every column for the i-th include term (overwritten on each pass)
  db_new <- apply(db, 2, grepl, pattern = terms_include[i])
}

Applying this function gives me:

db_new <- structure(c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, 
4L), .Dimnames = list(NULL, c("ind", "test1", "test2", "test3"

So the above searches for the pattern anywhere in each value of the data
frame, instead of just at the beginning of the string.
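
The difference between matching anywhere and matching only at the start can
be seen on a small vector. This is a minimal sketch: the values in vals are
made up for illustration, and startsWith() assumes R 3.3.0 or later. Note
that "." is a regular expression metacharacter, so a term such as "1.3"
would need escaping in grepl(); startsWith() compares literally.

```r
# Hypothetical values, already converted to character
vals <- c("1", "1.3", "56", "21")

# Unanchored pattern: TRUE wherever "1" occurs anywhere in the string
anywhere <- grepl("1", vals)

# Anchored pattern: "^" restricts the match to the start of the string
at_start <- grepl("^1", vals)

# Literal prefix test, no regular expression involved
literal_prefix <- startsWith(vals, "1.3")

print(anywhere)
print(at_start)
print(literal_prefix)
```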

How would I combine the two conditions with partial matching: look for the
terms to include, but do not return a row of the data frame if it also
contains one of the terms to exclude?
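
One possible way to combine both conditions with prefix matching is sketched
below. It is only a sketch under some assumptions: starts_with_any() is a
hypothetical helper name, startsWith() needs R 3.3.0 or later, and each
column is converted to character separately so that numbers are compared as
they print. Because every term in terms_exclude here itself starts with "1",
the exclusion check is what decides those rows.

```r
db <- data.frame(
  ind   = c("ind1", "ind2", "ind3", "ind4"),
  test1 = c(1, 2, 1.3, 3),
  test2 = c(56L, 27L, 58L, 2L),
  test3 = c(1.1, 28, 9, 1.2),
  stringsAsFactors = FALSE
)
terms_include <- c("1", "2", "3")
terms_exclude <- c("1.1", "1.2", "1.3")

# Hypothetical helper: TRUE if any value in x starts with any of the terms
starts_with_any <- function(x, terms) {
  any(vapply(terms, function(term) any(startsWith(x, term)), logical(1)))
}

# Character matrix of the test columns, one row per individual
vals <- vapply(db[, -1], as.character, character(nrow(db)))

# Keep a row if it has an include match and no exclude match
keep <- apply(vals, 1, function(x) {
  starts_with_any(x, terms_include) && !starts_with_any(x, terms_exclude)
})

result <- db[keep, ]
print(result)
```

With the example data only the row for ind2 is kept, since rows 1, 3 and 4
each contain a value starting with one of the exclude terms.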

I hope that this makes sense.

Many thanks,

Natalie Van Zuydam

PhD Student
University of Dundee
nvanzuydam at dundee.ac.uk
