[R] Is there a good package for multiple imputation of missing values in R?

Robert A LaBudde ral at lcfltd.com
Mon Jun 30 19:55:26 CEST 2008


At 03:02 AM 6/30/2008, Robert A. LaBudde wrote:
>I'm looking for a package that has a state-of-the-art method of 
>imputation of missing values in a data frame with both continuous 
>and factor columns.
>
>I've found transcan() in 'Hmisc', which appears to be possibly 
>suited to my needs, but I haven't been able to figure out how to get 
>a new data frame with the imputed values replaced (I don't have 
>Harrell's book).
>
>Any pointers would be appreciated.

Thanks to "paulandpen", Frank and Shige for suggestions.

I looked at the packages 'Hmisc', 'mice', 'Amelia' and 'norm'.

I still haven't mastered the methodology for using aregImpute() in 
'Hmisc' based on the help information. I think I'll have to get hold 
of Frank's book to see how it's used in a complete example.
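
For what it's worth, here is a minimal sketch of how a completed data
frame might be pulled out of aregImpute(), going by the 'Hmisc' help
pages rather than the book. The data frame 'toy' and its variables are
simulated stand-ins invented for illustration (they are not the 'felon'
data), and impute.transcan() is assumed to behave as its help page
describes:

```r
library(Hmisc)  # aregImpute(), impute.transcan()

set.seed(2)
n <- 200
# Hypothetical simulated data, for illustration only
age    <- rnorm(n, 40, 10)
income <- 20 + 0.5 * age + rnorm(n, 0, 5)
gender <- factor(sample(c("F", "M"), n, replace = TRUE))
toy <- data.frame(age = age, income = income, gender = gender)
toy$age[sample(n, 20)]    <- NA   # introduce missing values
toy$gender[sample(n, 20)] <- NA

# Multiple imputation model with 5 sets of imputed values
a <- aregImpute(~ age + income + gender, data = toy, n.impute = 5, pr = FALSE)

# Extract the first set of imputations as a list of filled-in variables
filled <- impute.transcan(a, imputation = 1, data = toy,
                          list.out = TRUE, pr = FALSE, check = FALSE)

# Overwrite the corresponding columns in a copy of the original data
toy1 <- toy
toy1[names(filled)] <- filled
```

Changing 'imputation' from 1 to 2, 3, ... should give the other
completed data sets.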

'Amelia' and 'norm' appear to be focused solely on continuous, 
multivariate normal variables, but my needs typically involve 
datasets with both factors and continuous variables.

The function mice() in 'mice' appears to suit my needs best: the help 
file was intelligible, and it works on both factors and continuous 
variables.

For those in the audience with similar issues, here is a code snippet 
showing how some of these functions work ('felon' is a data frame 
with categorical and continuous predictors of the binary variable 'hired'):

#missing data imputation library for md.pattern(), mice(), complete()
library('mice')
names(felon)  #show variable names
md.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars

library('Hmisc')  #package for na.pattern() and impute()
na.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars

#simple imputation can be done by
felon2<- felon  #make copy
felon2$felony<- impute(felon2$felony) #impute NAs (most frequent)
felon2$gender<- impute(felon2$gender) #impute NAs
felon2$natamer<- impute(felon2$natamer) #impute NAs
na.pattern(felon2[,1:4]) #show no NAs left in these vars
fit2<- glm(hired ~ felony + gender + natamer, data=felon2, family=binomial)
summary(fit2)

#better, multiple imputation can be done via mice():
imp<- mice(felon[,1:4]) #do multiple imputation (default is 5 realizations)
for (iSet in 1:5) {  #show results for the 5 imputation datasets
   fit<- glm(hired ~ felony + gender + natamer,
     data=complete(imp, iSet), family=binomial)  #fit to iSet-th realization
   print(summary(fit))
}
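
The loop above prints five separate fits. A single set of estimates
can be obtained by combining them with Rubin's rules. The following is
a minimal sketch, assuming a version of 'mice' that provides with() and
pool() for imputation objects. The data frame 'toy' is a simulated
stand-in invented for illustration, not the 'felon' data, so the
numbers it produces mean nothing:

```r
library(mice)  # mice(), with(), pool(), complete()

set.seed(1)
n <- 200
# Hypothetical simulated stand-in for 'felon', for illustration only
toy <- data.frame(
  hired   = rbinom(n, 1, 0.5),
  felony  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  gender  = factor(sample(c("F", "M"), n, replace = TRUE)),
  natamer = factor(sample(c("no", "yes"), n, replace = TRUE))
)
# Set 20 randomly chosen values of each predictor to NA
for (v in c("felony", "gender", "natamer")) {
  toy[[v]][sample(n, 20)] <- NA
}

# Five imputed data sets, as in the default
imp <- mice(toy, m = 5, printFlag = FALSE, seed = 1)

# Fit the same logistic regression to each completed data set
fits <- with(imp, glm(hired ~ felony + gender + natamer, family = binomial))

# Combine the five fits into one set of estimates and standard errors
pooled <- pool(fits)
print(summary(pooled))
```

The pooled standard errors include the between-imputation variability,
which is not visible when reading the five summaries one at a time.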

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"


