[R] Creating a new column from a series of columns

Fisher Dennis fisher at plessthan.com
Sat Nov 1 02:32:48 CET 2014


R 3.1.1
OS X

Colleagues,
I have a dataset containing multiple columns indicating race for subjects in a clinical trial.  A subset of the data (obtained with dput) is shown here:

structure(list(PLTID = c(7157, 8138, 8150, 9112, 9114, 9115, 
9124, 9133, 9141, 9144, 9148, 12110, 12111, 12116, 12134, 12136, 
12137, 12142, 12143, 12146, 12147, 13159), Indian..RACE1. = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA), Asian..RACE2. = c("", "Yes", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
""), Black..RACE3. = c("Yes", "", "", "Yes", "Yes", "Yes", "Yes", 
"Yes", "", "Yes", "", "", "", "", "", "", "", "Yes", "Yes", "", 
"", ""), Native.Hawaiian.or.other.Pacif..RACE4. = c(NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA), White..RACE5. = c("", "", "Yes", "", "", "", "", 
"", "Yes", "", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"", "", "Yes", "Yes", "Yes"), Other.Race..RACE6. = c(NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA), Specify.Other.Race..RACEOTH. = c(NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA)), .Names = c("PLTID", "Indian..RACE1.", "Asian..RACE2.", 
"Black..RACE3.", "Native.Hawaiian.or.other.Pacif..RACE4.", "White..RACE5.", 
"Other.Race..RACE6.", "Specify.Other.Race..RACEOTH."), class = "data.frame", row.names = 43:64)

I would like to add a column that indicates which of the other columns contains “Yes”.  In other words, that column would contain:
	Black..RACE3.
	Asian..RACE2.
	White..RACE5.
	Black..RACE3.
	…

Even better would be
	Black
	Asian
	White
	Black
	…
(which I can accomplish with strsplit)
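
As a rough sketch of that strsplit step (the vector "full" below is a made-up stand-in for the new column, not part of the real data), the text before the first ".." in each column name could be kept like this:

```r
# Hypothetical example values, copied from the column names shown above
full <- c("Black..RACE3.", "Asian..RACE2.", "White..RACE5.", NA)

# Split each name at the literal ".." and keep the first piece;
# an NA input simply stays NA
short <- sapply(strsplit(full, "..", fixed = TRUE), `[`, 1)
short
```

fixed = TRUE matters here, because otherwise ".." would be treated as a regular expression matching any two characters.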

None of the rows contains more than one ‘Yes’, although it is possible that a row contains no ‘Yes’ at all (in which case the entry in the new column should be NA).

I could do this by looping through each of the columns with something like this:
	DATA$RACE <- NA
	for (COL in 2:8) DATA$RACE[which(DATA[, COL] == "Yes")] <- names(DATA)[COL]
But, I suspect that there is some more elegant way to accomplish this.
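
For illustration only, one vectorized possibility is sketched below. It is not a definitive answer: it runs on a small made-up data frame with the same layout as the data above (the 9999 row and the choice of columns are assumptions for the example), and it relies on the stated condition that no row has more than one "Yes".

```r
# Hypothetical data with the same structure as the dput output above
DATA <- data.frame(
  PLTID              = c(7157, 8138, 8150, 9999),
  Asian..RACE2.      = c("", "Yes", "", ""),
  Black..RACE3.      = c("Yes", "", "", ""),
  White..RACE5.      = c("", "", "Yes", ""),
  Other.Race..RACE6. = c(NA, NA, NA, NA),
  stringsAsFactors   = FALSE
)

race_cols <- 2:5  # positions of the race columns in this toy example

# Logical matrix: TRUE where a cell equals "Yes"; NA and "" both become FALSE
is_yes <- !is.na(DATA[race_cols]) & DATA[race_cols] == "Yes"

# Column position of the first "Yes" in each row
hit <- max.col(is_yes * 1, ties.method = "first")

# Rows without any "Yes" should yield NA rather than a column name
hit[rowSums(is_yes) == 0] <- NA

DATA$RACE <- names(DATA)[race_cols][hit]
DATA$RACE
```

If a row could contain more than one "Yes", this sketch would silently report only the first matching column, so that case would need separate handling.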

Thanks in advance.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com


