[R] missing values in logistic regression

Fri Oct 29 12:06:51 CEST 2004

Avril Coghlan <avril.coghlan at ucd.ie> writes:

> Dear R help list,
> 
>    I am trying to do a logistic regression
> where I have a categorical response variable Y
> and two numerical predictors X1 and X2. There
> are quite a lot of missing values for predictor X2.
> e.g.,
> 
> Y     X1   X2
> red   0.6  0.2    *
> red   0.5  0.2    *
> red   0.5  NA
> red   0.5  NA
> green 0.2  0.1    *
> green 0.1  NA
> green 0.1  NA
> green 0.05 0.05   *
> 
> 
> I am wondering whether I can combine X1 and X2
> in a logistic regression to predict Y, using
> all the data for X1, even though there are NAs in
> the X2 data?
> 
> Or do I have to take only the cases for which
> there is data for both X1 and X2? (marked
> with *s above)
> 
> I will be very grateful for any help,

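The "built-in" function (glm) for logistic regression will give you
a complete-case analysis: with the default na.action (na.omit), any row
with an NA in a variable used by the model is dropped before fitting,
so only the cases you marked with * would be used.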
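
A quick way to see the complete-case behaviour is to fit the model on
data with known missingness and count the rows that glm actually used.
The sketch below uses simulated data (the variable names follow the
question, but the values, sample size and coefficients are made up for
illustration) and assumes the default na.action, na.omit.

```r
# Simulated stand-in for the poster's data; all numbers are hypothetical.
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- 0.5 * x1 + rnorm(n, sd = 0.2)
y  <- factor(ifelse(runif(n) < plogis(-2 + 3 * x1 + 2 * x2), "red", "green"))
x2[sample(n, 80)] <- NA                 # remove 80 of the X2 values
d  <- data.frame(Y = y, X1 = x1, X2 = x2)

# Default na.action (na.omit): rows with an NA in Y, X1 or X2 are dropped.
fit <- glm(Y ~ X1 + X2, family = binomial, data = d)

nobs(fit)                 # number of rows used in the fit
sum(complete.cases(d))    # number of rows without any NA
length(fit$na.action)     # number of rows dropped because of NAs
```

The first two numbers agree, and the third is the number of rows with a
missing X2. Passing na.action = na.fail to glm instead makes the call
stop with an error when NAs are present, which can be a useful guard.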

For more advanced handling of missing values, you need to look into
imputation methods. At least two CRAN packages deal with this, namely
"mix" and "mitools". The former is support software for a book
(Schafer, "Analysis of Incomplete Multivariate Data"), which you'll
probably want to consult.
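
To give a rough idea of what multiple imputation involves, here is a
hand-rolled sketch in base R. It does not use the "mix" or "mitools"
functions and is not the method from the book: it fills in X2 from a
linear regression on X1 and Y plus random noise, refits the logistic
regression on each completed data set, and pools the results with
Rubin's rules. The data are simulated, the number of imputations (m = 20)
is arbitrary, and the imputation step ignores the uncertainty in the
imputation model's own parameters, so treat it as an illustration of
the workflow rather than a proper analysis.

```r
# Simulated data with missing X2; all numbers are hypothetical.
set.seed(2)
n  <- 200
x1 <- runif(n)
x2 <- 0.5 * x1 + rnorm(n, sd = 0.2)
y  <- factor(ifelse(runif(n) < plogis(-2 + 3 * x1 + 2 * x2), "red", "green"))
x2[sample(n, 80)] <- NA
d  <- data.frame(Y = y, X1 = x1, X2 = x2)

# Imputation model for X2, fitted on the complete cases only.
miss      <- is.na(d$X2)
imp_model <- lm(X2 ~ X1 + Y, data = d)
sigma_hat <- summary(imp_model)$sigma

m    <- 20                                # number of imputed data sets
est  <- matrix(NA_real_, m, 3)            # coefficient estimates per data set
vars <- matrix(NA_real_, m, 3)            # squared standard errors per data set

for (i in seq_len(m)) {
  di <- d
  # Stochastic regression imputation: prediction plus residual noise.
  di$X2[miss] <- predict(imp_model, newdata = d[miss, ]) +
    rnorm(sum(miss), sd = sigma_hat)
  fit <- glm(Y ~ X1 + X2, family = binomial, data = di)
  est[i, ]  <- coef(fit)
  vars[i, ] <- diag(vcov(fit))
}

# Rubin's rules: pooled estimate, within- and between-imputation variance.
qbar  <- colMeans(est)
ubar  <- colMeans(vars)
b     <- apply(est, 2, var)
total <- ubar + (1 + 1 / m) * b

pooled <- cbind(estimate = qbar, se = sqrt(total))
rownames(pooled) <- names(coef(fit))
pooled
```

All 200 rows contribute to each fit here, and the pooled standard errors
include a between-imputation term that reflects the extra uncertainty
from not having observed X2. The result is only as good as the
assumption that X2 is missing at random given X1 and Y.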

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907