[R] Inconsistence in specifying action for missing data

Thomas Lumley tlumley at u.washington.edu
Sun Sep 4 18:42:37 CEST 2005


On Sat, 3 Sep 2005, John Sorkin wrote:

> A question for R (and perhaps S and SPlus) historians.
>
> Does anyone know the reason for the inconsistency in the way that the
> action that should be taken when data are missing is specified? There
> are several variants, na.action, na.omit, "T", TRUE,  etc. I know that a
> foolish consistency is the hobgoblin of a small mind, but consistency
> can make things easier.
>

There's actually a little more consistency than first appears.  The two
most common ways to refer to missingness are na.rm and na.action.  Usually
na.rm has default FALSE (using T in place of TRUE is a bug) and, when set
to TRUE, removes NAs from one vector at a time.
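
As a rough illustration of the na.rm convention, here is a small sketch using
mean() from base R, where na.rm defaults to FALSE.  The vector x is made up
for the example.

```r
# Hypothetical example vector with one missing value
x <- c(1, 2, NA, 4)

# With the default na.rm = FALSE the NA propagates to the result
m_default <- mean(x)

# With na.rm = TRUE the NA is dropped from this one vector first
m_clean <- mean(x, na.rm = TRUE)

# TRUE is a reserved word, but T is an ordinary variable that can be
# reassigned, which is why relying on T is a bug
T <- FALSE
m_shadowed <- mean(x, na.rm = T)   # behaves like na.rm = FALSE
rm(T)                              # restore the usual meaning of T

print(is.na(m_default))
print(m_clean)
print(is.na(m_shadowed))
```

The last call shows how a stray assignment to T silently changes the result,
while na.rm = TRUE cannot be affected this way.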

na.action usually has default na.omit and works on whole data frames, e.g.
na.omit and na.exclude both do casewise deletion if any variable is NA.
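
A minimal sketch of casewise deletion follows, assuming a made-up data frame
df with missing values in two different rows.  It uses only na.omit,
na.exclude, lm and residuals from base R and the stats package.

```r
# Hypothetical data frame with missing values in different rows
df <- data.frame(y = c(1.0, 2.1, NA, 4.2, 5.1),
                 x = c(1, 2, 3, NA, 5))

# na.omit drops every row in which any variable is NA
complete <- na.omit(df)
print(nrow(complete))

# Model functions apply na.action to the whole model frame
fit_omit    <- lm(y ~ x, data = df, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = df, na.action = na.exclude)

# Both fits use the same complete cases; na.exclude additionally pads
# residuals (and fitted values) back to the original length with NA
print(length(residuals(fit_omit)))
print(length(residuals(fit_exclude)))
```

The two actions delete the same rows; they differ only in whether results
such as residuals are aligned with the original data afterwards.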

These aren't completely uniform, and that is simply historical. I think 
there was once an attempt to make na.fail() the default na.action, but 
there was too much resistance to change.

 	-thomas



