[R] Showing NAs when using table()
therneau at mayo.edu
Thu May 24 15:48:07 CEST 2007
Rephrasing David Kane's example:
> b <- c(1,1,1,1,1, NA, 2,2,2,2)
> d <- factor(c(rep(c("A","B","C"), 3), NA))
> table(b, d, exclude=NULL)
b      A B C
  1    2 2 1
  2    1 1 1
  <NA> 0 0 1
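Why are only 9 of the 10 observations listed in the table? The one dropped is
the 10th (b = 2, d = NA): exclude=NULL adds an <NA> row for the numeric b, but
no <NA> column for the factor d.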
This is a long-standing bug in Splus and R. Peter Dalgaard suggests
recoding the factor variable so that "NA" is a level, rather than "missing".
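A minimal sketch of the recoding workaround, assuming base R's `addNA()` as the recoding step (the exact code suggested on the list is not part of this message). Making NA an explicit level of `d` gives it its own column, so the 10th observation is counted.

```r
b <- c(1, 1, 1, 1, 1, NA, 2, 2, 2, 2)
d <- factor(c(rep(c("A", "B", "C"), 3), NA))

# Recode so that NA is a genuine level of the factor, not a missing value
d_na <- addNA(d)
levels(d_na)   # "A" "B" "C" NA

tab <- table(b, d_na, exclude = NULL)
print(tab)
sum(tab)       # 10: every observation now appears in the table
```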
This works, but it does not address the bug: for most of my factor variables
I want missing to be missing, so that omission works as expected in modeling.
The exclude argument in table() should do what it says it does, which is to
list ALL data in the table when exclude=NULL.
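A short illustration of why the recoding interferes with modeling, again assuming `addNA()` as the recoding step and using hypothetical data frames `df` and `df_na`. Once NA is an ordinary level, `is.na()` no longer reports it, so `na.omit()` (the default `na.action` used by model-fitting functions such as `lm()`) keeps that row.

```r
b <- c(1, 1, 1, 1, 1, NA, 2, 2, 2, 2)
d <- factor(c(rep(c("A", "B", "C"), 3), NA))

df    <- data.frame(b = b, d = d)          # d's NA is a missing value
df_na <- data.frame(b = b, d = addNA(d))   # d's NA is a level

# is.na() no longer flags the recoded value
sum(is.na(df$d))      # 1
sum(is.na(df_na$d))   # 0

# so na.omit() drops fewer rows
nrow(na.omit(df))     # 8: rows 6 (b missing) and 10 (d missing) dropped
nrow(na.omit(df_na))  # 9: only row 6 dropped
```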
At Mayo, we have replaced the table command to work around this (it has been
in place for 5+ years now). It makes two changes: a method for factors that
correctly propagates the exclude argument, and exclude=NULL as the default.
table() is used, 99% of the time, to look at data on screen, and the number of
missing values is often the first question I'm asking, so we found the default
to be, shall we say, non-intuitive.
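A rough sketch of a wrapper in the spirit described above. It is NOT the Mayo replacement (that code is not part of this message): the name `table_na` is hypothetical, and instead of adding a factor method that propagates `exclude`, it simply recodes factor arguments with `addNA()` when `exclude = NULL`, which is also its default.

```r
# Hypothetical wrapper, not the Mayo code: defaults to exclude = NULL and,
# in that case, makes NA an explicit level of every factor argument so that
# missing values get their own row/column instead of being dropped.
table_na <- function(..., exclude = NULL) {
  args <- list(...)

  # Label each dimension with the argument expression, as table() does
  exprs <- vapply(as.list(substitute(list(...)))[-1L],
                  function(e) paste(deparse(e), collapse = ""), character(1))
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  nms[nms == ""] <- exprs[nms == ""]
  names(args) <- nms

  if (is.null(exclude)) {
    # Recode only the copies passed to table(); the caller's data are untouched
    args <- lapply(args, function(x) {
      if (is.factor(x)) addNA(x, ifany = TRUE) else x
    })
  }
  do.call(table, c(args, list(exclude = exclude)))
}

b <- c(1, 1, 1, 1, 1, NA, 2, 2, 2, 2)
d <- factor(c(rep(c("A", "B", "C"), 3), NA))
tab <- table_na(b, d)
print(tab)
sum(tab)   # 10
```

Because the recoding happens inside the wrapper, the factor itself still treats missing as missing, which keeps the modeling behavior described earlier.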
We argued these points with Insightful many years ago and got nowhere; the
replies were a mix of a) it's not really broken and b) if we change it, it
might break something. We had not carried the argument forward to the R
community, and just fixed it ourselves. The revised version just works better
day to day.
In R, the manual page has been revised to state that the exclude argument
means something different for factors, so I expect to remain in the minority.
(I can't think of a time I would ever have wanted the new behavior of exclude,
which for factors is only a means to exclude more things, rather than the
usual use of keeping more in the table.)