[R] Residuals with NAs (was Can't understand error message)

Fri Mar 5 06:06:14 CET 1999

On 5 Mar 1999, Peter Dalgaard BSA wrote:

> John Logsdon <j.logsdon at lancaster.ac.uk> writes:
> 
> > On 2 Mar 1999, Peter Dalgaard BSA wrote:
> > 
> > > 
> > > (1) Missing values in response and/or regressors cause cases to be
> > >     discarded. 
> > > (2) Plotting which of the y's against which x's ?
> > > 
> > > plot(mschmod$residuals ~ size94[complete.cases(mavgres,crimesch,
> > > socstat,povnojob,ploinc94,aa94,hisp94,minty94,mixed94)])
> > > 
> > > should do the trick. Or, simpler but sneakier:
> > > 
> > > attach(sizef[rownames(mschmod$model),])
> > > plot(residuals(mschmod) ~ size94)
> > > detach()
> > > 
> > > It should also work with:
> > > 
> > > evalq(plot(residuals(mschmod) ~ size94), sizef[rownames(mschmod$model),])
> > > 
> > > (none of the above is tested, since I don't have your data of course)
> > 
> > The problems of plotting residuals vs fitted data/covariates where there
> > are NAs caught me out a little while ago.  Would it not be better if the
> > fitting functions lm, glm etc and plot were consistent?  Thus either (a)
> > plot() omitted cases in the X or the Y which were NA before checking for
> > length consistency or (b) residuals() etc included NA in the appropriate
> > places. 
> 
> (a) won't work if you think closer about it. (b) might. I wouldn't be
> surprised if there's a rationale for the way things are now, but I
> can't seem to reconstruct it. Well, there's space saving of course,
> but given the waste in other areas, that is hardly a crucial point.
> Possibly, consistent behaviour of drop(), etc. has something to do
> with it.

I hope that (b) does work, as that is the direction S-PLUS is taking,
prompted by passionate advocacy from Terry Therneau whose survival code
does this. But, you do have to be very careful: you are implicitly assuming
(as does Terry, explicitly) that na.action=na.omit. That is by no means the
only possibility (not even the default), and na.action could also increase
the number of cases (multiple imputation).  It isn't just residuals: the
issue over predict is subtler, and you may want to handle fitted, residuals
and predict separately. And when you start doing this you may break a lot
of code.
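
As an illustration of option (b), here is a minimal sketch using made-up
data. It assumes a current R in which na.exclude is available as an
na.action alongside na.omit; the data frame d and the names fit_omit and
fit_excl are hypothetical and appear nowhere in the thread.

```r
# Made-up data: one missing response (row 2), one missing regressor (row 4)
d <- data.frame(y = c(1.2, NA, 3.1, 4.0, 5.3, 6.1),
                x = c(1, 2, 3, NA, 5, 6))

# na.omit drops incomplete cases, so residuals() is shorter than the data
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)

# na.exclude also drops them for fitting, but residuals() pads with NA
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

length(residuals(fit_omit))        # 4 complete cases
length(residuals(fit_excl))        # 6, same length as the data
which(is.na(residuals(fit_excl)))  # the padded positions, rows 2 and 4
```

With the padded residuals, plot(d$x, residuals(fit_excl)) has matching
lengths and plot() simply skips the NA pairs. This only covers the
"drop cases" kind of na.action; it says nothing about actions that add cases.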

The best way to avoid trouble is to use the row names/vector names, which
tell you which of the original cases you have. Now that they are passed
down correctly in R (I hope), you can just match the sets of names. (What,
I hear, you want the software to do that? Oh well, one day, for some plot
methods.)
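
Matching the names could look roughly like the sketch below. The data
frame d, its column size, and the variable names are invented for the
example; it assumes the residual vector carries the row names of the cases
that were kept, which is the behaviour hoped for above.

```r
# Made-up data; 'size' is a covariate that is not in the model
d <- data.frame(y    = c(1.2, NA, 3.1, 4.0, 5.3, 6.1),
                x    = c(1, 2, 3, NA, 5, 6),
                size = c(10, 20, 30, 40, 50, 60))

# Incomplete cases (rows 2 and 4) are discarded before fitting
fit <- lm(y ~ x, data = d, na.action = na.omit)
res <- residuals(fit)

# The names of the residuals say which original rows survived
idx <- match(names(res), rownames(d))

# Both vectors now refer to the same cases, in the same order
plot(d$size[idx], res, xlab = "size", ylab = "residual")
```

This makes no assumption about which na.action discarded the cases, only
that the names survive; it would not handle an na.action that adds cases.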

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
