[R] Inconsistence in specifying action for missing data

Martin Maechler maechler at stat.math.ethz.ch
Sat Sep 3 23:50:47 CEST 2005

>>>>> "Duncan" == Duncan Murdoch <murdoch at stats.uwo.ca>
>>>>>     on Sat, 03 Sep 2005 11:40:18 -0400 writes:

    Duncan> John Sorkin wrote:
    >> A question for R (and perhaps S and SPlus) historians.
    >> Does anyone know the reason for the inconsistency in the
    >> way that the action that should be taken when data are
    >> missing is specified? There are several variants,
    >> na.action, na.omit, "T", TRUE, etc. I know that a foolish
    >> consistency is the hobgoblin of a small mind, but
    >> consistency can make things easier.
    >> My question is not meant as a complaint. I very much
    >> admire the R development team. I simply am curious.

    Duncan> R and S have been developed by lots of people, over
    Duncan> a long time.  I think that's it.

Yes, but there's a bit more to it.

First, the question was "wrong" (don't you just hate such an answer?):
A more interesting question would have been why there is
  'na.rm = {TRUE, FALSE}' 
on one hand and
  'na.action =  {na.omit, na.replace, .....}'
on the other hand,
since only these two appear as function *arguments* 
{at least in `decent' S and R functions}.

There, the answer has at least two parts:
- First, for some functionalities, na.rm = TRUE/FALSE is the
  only thing that makes sense, so why should you have to use
  something more complicated?

- IIRC, 'na.rm' was introduced much earlier (S version 2)
  than 'na.action' (S version 3; with na.replace much later IIRC);
  na.action only became really relevant once people started thinking
  about model fitting and non-trivial missing value treatment.
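
As a minimal sketch of the distinction (the data below are made up purely
for illustration, and the variable names are hypothetical): na.rm is a
logical switch on summary functions such as mean(), whereas na.action
takes a *function*, such as na.omit or na.exclude, which a model-fitting
function like lm() applies to the model frame.

```r
# Illustrative data only; not from the original thread.
x <- c(1, NA, 3)

# na.rm: a simple TRUE/FALSE switch on a summary function
m_default <- mean(x)                # NA, because one value is missing
m_narm    <- mean(x, na.rm = TRUE)  # mean of the non-missing values

# na.action: a function that decides how incomplete rows are handled
d <- data.frame(y = c(1.1, 1.9, NA, 4.2),
                x = c(1, 2, 3, 4))

fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

res_omit <- residuals(fit_omit)  # incomplete row dropped
res_excl <- residuals(fit_excl)  # padded with NA to the original length

length(res_omit)
length(res_excl)
```

With na.omit the incomplete row simply disappears from the residuals;
with na.exclude the residuals are padded with NA back to the length of
the original data, which is the kind of non-trivial treatment that a
plain logical flag could not express.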

Martin Maechler, ETH Zurich
