[R] Logistic regression with multiple imputation

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Wed Jun 30 14:44:29 CEST 2010


There are titanic datasets in R binary format at 
http://biostat.mc.vanderbilt.edu/DataSets

Note that the aregImpute function in the Hmisc package streamlines many 
of the steps, in conjunction with the fit.mult.impute function.

Frank


On 06/30/2010 05:02 AM, Chuck Cleland wrote:
> On 6/30/2010 1:14 AM, Daniel Chen wrote:
>> Hi,
>>
>> I am a long time SPSS user but new to R, so please bear with me if my
>> questions seem to be too basic for you guys.
>>
>> I am trying to figure out how to analyze survey data using logistic
>> regression with multiple imputation.
>>
>> I have a survey data of about 200,000 cases and I am trying to predict the
>> odds ratio of a dependent variable using 6 categorical independent variables
>> (dummy-coded). Approximatively 10% of the cases (~20,000) have missing data
>> in one or more of the independent variables. The percentage of missing
>> ranges from 0.01% to 10% for the independent variables.
>>
>> My current thinking is to conduct a logistic regression with multiple
>> imputation, but I don't know how to do it in R. I searched the web but
>> couldn't find instructions or examples on how to do this. Since SPSS is
>> hopeless with missing data, I have to learn to do this in R. I am new to R,
>> so I would really appreciate if someone can show me some examples or tell me
>> where to find resources.
>
>    Here is an example using the Amelia package to generate imputations
> and the mitools and mix packages to make the pooled inferences.
>
> titanic<-
> read.table("http://lib.stat.cmu.edu/S/Harrell/data/ascii/titanic.txt",
> sep=',', header=TRUE)
>
> set.seed(4321)
>
> titanic$sex[sample(nrow(titanic), 10)]<- NA
> titanic$pclass[sample(nrow(titanic), 10)]<- NA
> titanic$survived[sample(nrow(titanic), 10)]<- NA
>
> library(Amelia) # generate multiple imputations
> library(mitools) # for MIextract()
> library(mix) # for mi.inference()
>
> titanic.amelia<- amelia(subset(titanic,
> select=c('survived','pclass','sex','age')),
>                           m=10, noms=c('survived','pclass','sex'),
> emburn=c(500,500))
>
> allimplogreg<- lapply(titanic.amelia$imputations,
> function(x){glm(survived ~ pclass + sex + age, family=binomial, data = x)})
>
> mice.betas.glm<- MIextract(allimplogreg, fun=function(x){coef(x)})
> mice.se.glm<- MIextract(allimplogreg, fun=function(x){sqrt(diag(vcov(x)))})
>
> as.data.frame(mi.inference(mice.betas.glm, mice.se.glm))
>
> # Or using only mitools for pooled inference
>
> betas<- MIextract(allimplogreg, fun=coef)
> vars<- MIextract(allimplogreg, fun=vcov)
> summary(MIcombine(betas,vars))
>
>> Thank you!
>>
>> Daniel
>>
>> 	[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University



More information about the R-help mailing list