[R] Is there a good package for multiple imputation of missing values in R?

Frank E Harrell Jr f.harrell at vanderbilt.edu
Mon Jun 30 20:25:13 CEST 2008


Robert A LaBudde wrote:
> At 03:02 AM 6/30/2008, Robert A. LaBudde wrote:
>> I'm looking for a package that has a state-of-the-art method of 
>> imputation of missing values in a data frame with both continuous and 
>> factor columns.
>>
>> I've found transcan() in 'Hmisc', which appears to be possibly suited 
>> to my needs, but I haven't been able to figure out how to get a new 
>> data frame with the imputed values replaced (I don't have Harrell's 
>> book).
>>
>> Any pointers would be appreciated.
> 
> Thanks to "paulandpen", Frank and Shige for suggestions.
> 
> I looked at the packages 'Hmisc', 'mice', 'Amelia' and 'norm'.
> 
> I still haven't mastered the methodology for using aregImpute() in 
> 'Hmisc' based on the help information. I think I'll have to get hold of 
> Frank's book to see how it's used in a complete example.

It's not in the book; it will be in the 2nd edition someday.
Frank
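
For anyone stuck at the same point, here is a minimal sketch of one way to run aregImpute() and then pull out a completed data frame. It follows the pattern I recall from the aregImpute help page. The simulated data frame 'd' and its columns (y, x1, x2, g) are made up for illustration and are not the 'felon' data, and the settings (n.impute=5, nk=3, B=10) are arbitrary choices, not recommendations.

```r
library(Hmisc)  # aregImpute(), fit.mult.impute(), impute.transcan()

# Hypothetical data: one factor (g) and one continuous variable (x2) with NAs
set.seed(1)
n  <- 300
g  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x1 <- rnorm(n)
x2 <- (g == "b") + 2 * (g == "c") + rnorm(n)
y  <- x1 + x2 + rnorm(n)
g[1:60]    <- NA
x2[61:100] <- NA
d <- data.frame(y, x1, x2, g)

# Multiple imputation: every variable, including the outcome, goes in the formula
a <- aregImpute(~ y + x1 + x2 + g, data = d,
                n.impute = 5, nk = 3, B = 10, pr = FALSE)

# Fit the model on each imputed dataset and combine the results
fmi <- fit.mult.impute(y ~ x1 + x2 + g, lm, a, data = d)

# Extract one completed data frame (here the 1st imputation)
imputed <- impute.transcan(a, imputation = 1, data = d,
                           list.out = TRUE, pr = FALSE, check = FALSE)
completed <- d
completed[names(imputed)] <- imputed
```

fit.mult.impute() is usually the more useful route, since it accounts for the uncertainty across imputations; a single completed data frame is mainly handy for inspection.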

> 
> 'Amelia' and 'norm' appear to be focused solely on continuous, 
> multivariate normal variables, but my needs typically involve datasets 
> with both factors and continuous variables.
> 
> The function mice() in 'mice' appears to best suit my needs: the help 
> file was intelligible, and it works on both factors and continuous 
> variables.
> 
> For those in the audience with similar issues, here is a code snippet 
> showing how some of these functions work ('felon' is a data frame with 
> categorical and continuous predictors of the binary variable 'hired'):
> 
> library('mice') #missing data imputation library: md.pattern(), mice(), complete()
> names(felon)  #show variable names
> md.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars
> 
> library('Hmisc')  #package for na.pattern() and impute()
> na.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars
> 
> #simple imputation can be done by
> felon2<- felon  #make copy
> felon2$felony<- impute(felon2$felony) #impute NAs (most frequent)
> felon2$gender<- impute(felon2$gender) #impute NAs
> felon2$natamer<- impute(felon2$natamer) #impute NAs
> na.pattern(felon2[,1:4]) #show no NAs left in these vars
> fit2<- glm(hired ~ felony + gender + natamer, data=felon2, family=binomial)
> summary(fit2)
> 
> #better, multiple imputation can be done via mice():
> imp<- mice(felon[,1:4]) #do multiple imputation (default is 5 realizations)
> for (iSet in seq_len(imp$m)) {  #show results for each imputation dataset
>   fit<- glm(hired ~ felony + gender + natamer,
>     data=complete(imp, iSet), family=binomial)  #fit to iSet-th realization
>   print(summary(fit))
> }
> 
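
The loop above prints one model summary per imputed dataset. As a minimal sketch of how the separate fits might be combined into a single set of estimates, 'mice' provides with() and pool(), which apply Rubin's rules. Since 'felon' is not available here, the sketch uses the 'nhanes2' data frame that ships with 'mice' (it has both factors and continuous variables); the model chl ~ age + bmi is only a stand-in. For the 'felon' data the analogous call would presumably be with(imp, glm(hired ~ felony + gender + natamer, family=binomial)).

```r
library(mice)  # mice(), with(), pool(), complete(); also supplies nhanes2

# Multiple imputation with 5 imputed datasets; seed fixed for repeatability
imp <- mice(nhanes2, m = 5, seed = 123, printFlag = FALSE)

# Fit the same model to each imputed dataset
fits <- with(imp, lm(chl ~ age + bmi))

# Combine the estimates and standard errors across imputations
pooled <- pool(fits)
summary(pooled)

# A single completed data frame is still available if needed
first <- complete(imp, 1)
```

The pooled standard errors include the between-imputation variance, which the individual summaries do not show.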
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
> 
> "Vere scire est per causas scire"
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


