[Rd] xtabs(), factors and NAs

Milan Bouchet-Valat nalimilan at club.fr
Sat Jan 21 14:42:56 CET 2017


On Friday 20 January 2017 at 18:59 +0100, Martin Maechler wrote:
> >>>>> Milan Bouchet-Valat <nalimilan at club.fr>
> >>>>>     on Thu, 19 Jan 2017 13:58:31 +0100 writes:
> > Hi all,
> > I know this issue has been discussed a few times in the past already,
> > but Martin Maechler suggested in a bug report [1] that I raise it here.
> > 
> > Basically, there is no (easy) way of printing NAs for all variables
> > when calling xtabs() on factors. Passing 'exclude=NULL,
> > na.action=na.pass' works for character vectors, but not for factors.
> > 
> 
> [ yes, but your example below is *not* showing that ... so it may be
>   a bit confusing!]  {Reason: stringsAsFactors etc}
Yes, sorry, that illustrates why one should never try to make an
example prettier at the last minute. For reference, here's the correct
example:

> test <- data.frame(x=c("a",NA), stringsAsFactors=FALSE)
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
   a <NA> 
   1    1 

> test <- data.frame(x=factor(c("a",NA)))
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
a 
1 
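
For factors, the NA cells can be obtained by turning NA into an explicit
level with addNA() before calling xtabs(). A minimal sketch, assuming a
data frame with several factor columns; add_na_levels() is a hypothetical
helper name, not an existing function, and the data are made up:

```r
# Hypothetical helper (not part of base R): make NA an explicit level
# of every factor column that contains NAs, so that xtabs() counts it
# like any other level.
add_na_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], addNA, ifany = TRUE)
  df
}

# Made-up data: both factors contain one NA
test <- data.frame(x = factor(c("a", NA, "a")),
                   y = factor(c("u", "v", NA)))

# No exclude= or na.action= needed: NA is now an ordinary level
tab <- xtabs(~ x + y, data = add_na_levels(test))
print(tab)
```

This avoids calling addNA() by hand on each crossed variable, but it
still changes the data rather than the call, so it is only a stopgap.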


> > > test <- data.frame(x=c("a",NA))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a 
> > 1 
> > 
> > > test <- data.frame(x=factor(c("a",NA)))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a 
> > 1 
> > 
> > 
> > Even if it's documented, this inconsistency is annoying. When checking
> > data, it is often useful to print all NA values temporarily, without
> > calling addNA() individually on all crossed variables.
> 
>   {Note this is not (just) about print()ing; the issue is
>    about the resulting *object*.}
> > 
> > Would it make sense to add a new argument similar to table()'s useNA
> > which would behave the same for all input vector types?
> 
> You have to be aware that  table()  has been changed since R
> 3.3.2, i.e., is different in R-devel and hence will be different
> in R 3.4.0.
> table()'s handling of NAs has become very involved /
> sophisticated(*), and currently I'd rather like to keep
> xtabs()'s behavior much simpler. 
> 
> Interestingly, after starting to play with data containing NA's and
>   xtabs(*, na.action=na.pass)
> I have already detected bugs (for sparse=TRUE) and cases where
> the current xtabs() behavior seems dubious to me.
> So, the issue is --- as so often --- more involved than assumed initially.
> 
> We (R core) will probably do something, but do need more time
> before we can promise anything more...
OK, thanks. Given how long this behavior has existed, there's
certainly no hurry...
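
For comparison, table() with useNA gives the same counts for a character
vector and for the equivalent factor, which is the behavior a similar
xtabs() argument would mirror. A small sketch with made-up data (the
variable names are only for illustration):

```r
x_chr <- c("a", NA)
x_fac <- factor(x_chr)

# useNA = "ifany" adds an NA cell whenever the input contains NAs,
# regardless of whether the input is a character vector or a factor
t_chr <- table(x_chr, useNA = "ifany")
t_fac <- table(x_fac, useNA = "ifany")

# Both tables have an "a" cell and an NA cell, each with count 1
print(t_chr)
print(t_fac)
```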


Regards

> Thank you for raising the issue,
> Martin Maechler, ETH Zurich
> 
> 
> *) R-devel sources always current at
>    https://svn.r-project.org/R/trunk/src/library/base/R/table.R
> 
> > 
> > Regards
> > [1] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=14630


