[R] A Tip: lm, glm, and retained cases

Wed Aug 27 02:31:49 CEST 2008

On 26-Aug-08 23:49:37, hadley wickham wrote:
> On Tue, Aug 26, 2008 at 6:45 PM, Ted Harding
> <Ted.Harding at manchester.ac.uk> wrote:
>> Hi Folks,
>> This tip is probably lurking somewhere already, but I've just
>> discovered it the hard way, so it is probably worth passing
>> on for the benefit of those who might otherwise hack their
>> way along the same path.
>>
>> Say (for example) you want to do a logistic regression of a
>> binary response Y on variables X1, X2, X3, X4:
>>
>>  GLM <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial)
>>
>> Say there are 1000 cases in the data. Because of missing values
>> (NAs) in the variables, the number of complete cases retained
>> for the regression is, say, 600. glm() drops the incomplete cases
>> automatically (via its na.action, which by default is na.omit).
>>
>> QUESTION: Which cases are they?
>>
>> You can of course find out "by hand" on the lines of
>>
>>  ix <- which(!is.na(Y) & !is.na(X1) & !is.na(X2) & !is.na(X3) & !is.na(X4))
>>
>> but one feels that GLM already knows -- so how to get it to talk?
>>
>> ANSWER: (e.g.)
>>
>>  ix <- as.integer(names(fitted(GLM)))
> 
> Alternatively, you can use:
> 
> attr(GLM$model, "na.action")
> 
> Hadley
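A minimal sketch, on simulated data, checking that the approaches in this thread pick out the same cases. The data frame `d`, the seed and the NA pattern are hypothetical, `family = binomial` is assumed for the logistic fit, and base R's `complete.cases()` stands in for the chain of `!is.na()` tests. Note that the "na.action" attribute holds the indices of the *dropped* cases (with class "omit"), so the retained cases are its complement.

```r
# Hypothetical simulated data: 1000 cases, binary Y, predictors X1..X4,
# with 100 NAs placed at random in each column.
set.seed(1)
n <- 1000
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
d$Y <- rbinom(n, 1, plogis(0.5 * d$X1 - 0.5 * d$X2))
for (v in names(d)) d[[v]][sample(n, 100)] <- NA

# Logistic regression; incomplete rows are dropped by na.omit
GLM <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial, data = d)

# "By hand": rows with no NA in any model variable
ix_hand <- which(complete.cases(d[, c("Y", "X1", "X2", "X3", "X4")]))

# Row names of the fitted values are the retained rows
# (as.integer() gives positions only because d has default row names 1..n)
ix_fit <- as.integer(names(fitted(GLM)))

# The "na.action" attribute lists the DROPPED rows; take the complement
dropped <- attr(GLM$model, "na.action")
ix_kept <- setdiff(seq_len(nrow(d)), as.integer(dropped))

stopifnot(identical(ix_hand, ix_fit), identical(ix_fit, ix_kept))
```

The same information is also stored as `GLM$na.action` and returned by the generic `na.action(GLM)`.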

Thanks! I can see that it works -- though understanding how
requires a deeper knowledge of "R internals". However, since
you've approached it from that direction, simply

  GLM$model

is a data frame of the retained cases (with corresponding
row-names), holding all the model's variables at once, and that
is possibly an even simpler approach!
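A sketch of the `GLM$model` route, again on hypothetical simulated data `d` with `family = binomial` assumed. The model frame's row names map back to rows of the original data, so `d[ix, ]` recovers the retained cases with every original column, not just those named in the formula.

```r
# Hypothetical simulated data: 1000 cases, binary Y, predictors X1..X4,
# with 100 NAs placed at random in each column.
set.seed(2)
n <- 1000
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
d$Y <- rbinom(n, 1, plogis(d$X1))
for (v in names(d)) d[[v]][sample(n, 100)] <- NA

# Logistic regression; incomplete rows are dropped by na.omit
GLM <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial, data = d)

mf <- GLM$model                  # model frame: retained cases only
nrow(mf)                         # number of complete cases used
ix <- as.integer(rownames(mf))   # their row positions in d (default row names)
kept <- d[ix, ]                  # same cases, all original columns
```

One caveat: the model frame stores variables as they appear in the formula, so a term such as `log(X1)` shows up as a transformed column named `log(X1)` rather than the raw `X1`.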

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Aug-08                                       Time: 01:31:46
------------------------------ XFMail ------------------------------