[R] search across a row for strings

David Winsemius dwinsemius at comcast.net
Mon Jun 15 22:34:25 CEST 2015


On Jun 15, 2015, at 1:12 PM, Federman, Douglas wrote:

> I'm trying to do the following: search each patient's list of diagnoses for a specific code, then create a new column based on the presence of that code.
> Simplified data follows:
> 
> con <- textConnection("
> ID	DX1	DX2	DX3
> 1	4109	4280	7102
> 2	734	311	490
> 3	4011	42822	4101
> ")
> df <- read.table(con, header = TRUE, strip.white = TRUE, colClasses="character")
> #
> # I would like to add a column such that searching for 410 would give the result below. The search string would always be at the start of a word and doesn't need regex.
> #
> # ID	DX1	DX2	DX3	htn
> # 1	4109	4280	7102	1
> # 2	734	311	490	0
> # 3	4011	42822	4101	1
> #
> # The following works but is slow and returns NA if the search string is not found:
> 
> for (i in 1:nrow(df)) {
>    df[i,"htn"] <- any(sapply('410', function(x)  which( grepl(x, df[i, 2:4], fixed = TRUE) )))
> }

Is this any better?

> df$htn <-  apply(df[-1], 1, function(r) max( substr(r, 1,3) == "410" ))
> df
  ID  DX1   DX2  DX3 htn
1  1 4109  4280 7102   1
2  2  734   311  490   0
3  3 4011 42822 4101   1


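You can add na.rm=TRUE to the max() call if warranted, for example if some diagnosis fields are NA (comparing NA with "410" yields NA, which would otherwise propagate to the row's result). `max` coerces logicals to integer, so the new column holds 1/0 rather than TRUE/FALSE.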
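
If the row-wise apply() is still too slow on a large table, one possible alternative is to do the prefix comparison on the whole matrix at once and then count hits per row. This is only a sketch under some assumptions: the names `dx` and `hit` are made up for illustration, read.table(text = ...) stands in for the textConnection in the question, and NA diagnoses are treated as non-matches.

```r
# Rebuild the example data from the question (all columns read as character).
df <- read.table(text = "
ID DX1 DX2 DX3
1 4109 4280 7102
2 734 311 490
3 4011 42822 4101
", header = TRUE, colClasses = "character")

# Character matrix of the diagnosis columns only (drop ID).
dx <- as.matrix(df[-1])

# substr() keeps the matrix dimensions, so this is a logical matrix
# marking every code whose first three characters are "410".
hit <- substr(dx, 1, 3) == "410"

# A row is flagged when it has at least one hit; NA cells count as no hit.
df$htn <- as.integer(rowSums(hit, na.rm = TRUE) > 0)
df
```

On the three example rows this should flag patients 1 and 3 and leave patient 2 at 0, the same as the apply() version. The difference is that the comparison is done in one vectorized call instead of once per row.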



-- 
David Winsemius
Alameda, CA, USA


