[R] multiple imputation based on a condition

Gary Collins collins.gs at gmail.com
Sat May 22 23:55:07 CEST 2010

I would be grateful for any suggestions on the following.

I'm trying to impute data, where a fictitious dataset is defined as...

n <- 500
test <- data.frame(smoke_status = rbinom(n, 2, 0.6),
                   smoke_amount = rbinom(n, 2, 0.5),
                   rf1 = rnorm(n),
                   rf2 = rnorm(n),
                   outcome = rbinom(n, 1, 0.3))

# smoke_status (0, 1, 2) is c("non-smoker", "ex-smoker", "current_smoker"), and
# smoke_amount (0, 1, 2) is c("light", "moderate", "heavy")
# rf1 and rf2 are two other risk factors (for illustration purposes;
# the real data set has more risk factors)

# artificially set some of these values to NA
test$smoke_status[sample(1:nrow(test), 60)] <- NA
test$smoke_amount[sample(1:nrow(test), 60)] <- NA
test$rf1[sample(1:nrow(test), 50)] <- NA
test$rf2[sample(1:nrow(test), 50)] <- NA

I'm trying to impute all missing values, but I only want to impute 
smoke_amount if smoke_status == 2 (i.e. they are a current smoker; it 
makes no sense to impute smoke_amount if they do not smoke).  I can do 
this in Stata via the conditional option in ice, but would prefer to 
keep this in R.  Any suggestions (if this is feasible via mice, mi or 
Amelia)?  I thought the passive imputation approach in mice would be the 
way forward, but I've so far been unsuccessful.
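
To make the condition concrete, here is a minimal base R sketch of which 
cells I would want imputed. It only builds a logical matrix and does not 
run any imputation. The seed, the names `where` and `current`, and the 
choice to skip smoke_amount when smoke_status is itself missing are my 
own assumptions for illustration. I understand that more recent versions 
of mice accept a matrix of this shape through a `where` argument, but I 
have not tested that here.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 500
test <- data.frame(smoke_status = rbinom(n, 2, 0.6),
                   smoke_amount = rbinom(n, 2, 0.5),
                   rf1 = rnorm(n),
                   rf2 = rnorm(n),
                   outcome = rbinom(n, 1, 0.3))
test$smoke_status[sample(1:nrow(test), 60)] <- NA
test$smoke_amount[sample(1:nrow(test), 60)] <- NA
test$rf1[sample(1:nrow(test), 50)] <- NA
test$rf2[sample(1:nrow(test), 50)] <- NA

# Start from "impute every missing cell" (logical matrix, same shape as test)
where <- is.na(test)

# Rows observed to be current smokers; rows with a missing smoke_status
# are treated as not eligible here (an assumption, not a recommendation)
current <- !is.na(test$smoke_status) & test$smoke_status == 2

# Keep smoke_amount cells only for observed current smokers
where[, "smoke_amount"] <- where[, "smoke_amount"] & current

# Number of cells per column that would be imputed
colSums(where)
```

A limitation of this sketch is that the condition is evaluated on the 
observed smoke_status only, so it does not follow the imputed values of 
smoke_status from one iteration to the next, which is what the 
conditional option in ice appears to do.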

Thanks in advance.

