[R] Changing cell value for MANY unique pairings of values in 2 columns

Alicia Ellis alicia.m.ellis at gmail.com
Fri Mar 17 19:33:03 CET 2017


I am cleaning some very messy health record lab data.  Several of the rows
in the VALUE column have text entries and they need to be converted to
numeric in the NUMERIC_VALUE column based on the values in VALUE and
DESCRIPTION.  For example:

df <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than 0.30",
                           "12%", "<0.2", "Unknown"),
                 DESCRIPTION = c("A", "A", "B", "C", "D", "E", "E"),
                 NUMERIC_VALUE = c(9, 9, 9, 9, 9, 9, 9))
df

df$NUMERIC_VALUE[df$VALUE == "Positive" & df$DESCRIPTION == "A"]=999999999


However, I need to do this for ~500 unique pairings of VALUE and
DESCRIPTION entries.  I'm trying to find an easy way to do this without
having to have 500 lines of code for each unique pairing.  Some of the
pairings will be changed to the same value (e.g., 99999999, or -999999999)
but many will be unique numeric values.


I've started by creating a new object called rules where a SUBSET of df
rows are included with the new value they should be changed to.


rules <- data.frame(VALUE = c("<60", "Negative", "Less than 0.30", "<0.2",
                              "Unknown"),
                    DESCRIPTION = c("A", "B", "C", "E", "E"),
                    NEW_VALUE = c(60, -999999, 0.29, 0.1, 777777))
rules


I tried doing a loop to change the values in df based on the suggested
value in rules:

for (i in 1:nrow(rules)) {
  df$NUMERIC_VALUE[df$VALUE == rules[i, 1] &
                   df$DESCRIPTION == rules[i, 2]] <- rules[i, 3]
}
df

This gives the error:

Error in Ops.factor(df$VALUE, rules[i, 1]) :
  level sets of factors are different

I think this is because rules contains only a subset of the factor levels
in df.
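
A minimal sketch of one possible workaround, assuming the error comes from
data.frame() converting strings to factors (the default before R 4.0.0):
comparing two factors with == fails when their level sets differ, whereas
comparing character strings does not.  The names df_chr, rules_chr and hit
are hypothetical and used only for this illustration.

```r
# Build both data frames with character columns instead of factors
df_chr <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than 0.30",
                               "12%", "<0.2", "Unknown"),
                     DESCRIPTION = c("A", "A", "B", "C", "D", "E", "E"),
                     NUMERIC_VALUE = c(9, 9, 9, 9, 9, 9, 9),
                     stringsAsFactors = FALSE)
rules_chr <- data.frame(VALUE = c("<60", "Negative", "Less than 0.30", "<0.2",
                                  "Unknown"),
                        DESCRIPTION = c("A", "B", "C", "E", "E"),
                        NEW_VALUE = c(60, -999999, 0.29, 0.1, 777777),
                        stringsAsFactors = FALSE)

for (i in seq_len(nrow(rules_chr))) {
  # as.character() is redundant here, but it keeps the comparison safe
  # if either column is still a factor
  hit <- as.character(df_chr$VALUE) == as.character(rules_chr$VALUE[i]) &
         as.character(df_chr$DESCRIPTION) == as.character(rules_chr$DESCRIPTION[i])
  df_chr$NUMERIC_VALUE[hit] <- rules_chr$NEW_VALUE[i]
}
df_chr
```

Rows with no matching rule ("Positive"/"A" and "12%"/"D" here) keep their
existing NUMERIC_VALUE.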

If I create rules using the exact same levels as df it works:

rules <- data.frame(VALUE = c("<60", "Positive", "Negative",
                              "Less than 0.30", "12%", "<0.2", "Unknown"),
                    DESCRIPTION = c("A", "A", "B", "C", "D", "E", "E"),
                    NEW_VALUE = c(60, 999999, -999999, 0.29, 12, 0.1, 777777))
rules

for (i in 1:nrow(rules)) {
  df$NUMERIC_VALUE[df$VALUE == rules[i, 1] &
                   df$DESCRIPTION == rules[i, 2]] <- rules[i, 3]
}
df


Can anyone suggest a way to modify my for loop so that it works for a
subset of rows in df and accomplish what I want?  Or suggest a completely
different method that works?
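
One loop-free possibility, sketched here under assumptions rather than as a
definitive answer: build a combined key from VALUE and DESCRIPTION in both
data frames and look the keys up with match().  The separator "||" is an
assumption (it must not occur in the real data), and key_df, key_rules, idx
and has_rule are hypothetical names.

```r
df <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than 0.30",
                           "12%", "<0.2", "Unknown"),
                 DESCRIPTION = c("A", "A", "B", "C", "D", "E", "E"),
                 NUMERIC_VALUE = c(9, 9, 9, 9, 9, 9, 9))
rules <- data.frame(VALUE = c("<60", "Negative", "Less than 0.30", "<0.2",
                              "Unknown"),
                    DESCRIPTION = c("A", "B", "C", "E", "E"),
                    NEW_VALUE = c(60, -999999, 0.29, 0.1, 777777))

# paste() converts factors to their labels, so this works with or
# without stringsAsFactors
key_df    <- paste(df$VALUE, df$DESCRIPTION, sep = "||")
key_rules <- paste(rules$VALUE, rules$DESCRIPTION, sep = "||")

# For each row of df, the position of its rule in rules (NA if none)
idx <- match(key_df, key_rules)
has_rule <- !is.na(idx)

# Overwrite only the rows that have a rule; the rest are left untouched
df$NUMERIC_VALUE[has_rule] <- rules$NEW_VALUE[idx[has_rule]]
df
```

With ~500 pairings, rules could be kept in a spreadsheet or CSV file and
read in, so no per-pairing line of code is needed.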


Thanks!
