[Rd] Problem with table

Terry Therneau therneau at mayo.edu
Tue Mar 27 15:12:08 CEST 2012


On 03/27/2012 02:05 AM, Prof Brian Ripley wrote:
> On 19/03/2012 17:01, Terry Therneau wrote:
>> R version 2.14.0, started with --vanilla
>>
>> > table(c(1,2,3,4,NA), exclude=2, useNA='ifany')
>>    1    3    4 <NA>
>>    1    1    1    2
>>
>> This came from a local user who wanted to remove one particular response
>> from some tables, but also wants to have NA always reported for data
>> checking purposes.
>> I don't think the above is what anyone would want.
>
> You have not told us what you want!
Want: that the resulting table exclude values of "2" from the printout, 
while still reporting NA.  This is what the local user expected, the one 
who came to me with their query.

There are lots of ways to get the program to do the right thing; the 
simplest is
      table(c(1,2,3,4,NA), exclude=2)     # keeping the default for useNA

You show another below.
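
Another way to express the same intent, which does not depend on how
'exclude' and 'useNA' interact, is to drop the unwanted value from the data
before tabulating.  A minimal sketch using the same vector as above (the
variable names are illustrative only):

```r
x <- c(1, 2, 3, 4, NA)

# Keep the NAs and everything that is not 2.  The is.na() test is needed
# because 'x != 2' is itself NA for a missing value.
kept <- x[is.na(x) | x != 2]

# No 'exclude' argument is used, so NA is counted once, in its own cell.
tab <- table(kept, useNA = "ifany")
print(tab)
```

This sidesteps 'exclude' entirely, at the cost of an extra subsetting step
for every variable being tabulated.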

>
> Try
>
> >  table(as.factor(c(1,2,3,4,NA)), exclude=2, useNA='ifany')
>
>    1    3    4 <NA>
>    1    1    1    1
>
> Note carefully how 'exclude' is defined:
>
>  exclude: levels to remove from all factors in ‘...’. If set to ‘NULL’,
>           it implies ‘useNA="always"’.
>
> As you did not specify a factor, 'exclude' was used in forming the 
> 'levels'.
>
That is almost a "legal loophole" reading of the manual.  I would never 
have seen through to that level of subtlety.  A primary reason is that a 
simple test shows that exclude works on non-factors.

I'm not sure what the best course of action is.  What I've reported is a 
case where use of the options in a fairly obvious way gives an 
unexpected answer.  On the other hand, I have never before seen or 
considered the case where someone wanted to exclude an actual data level 
from table: I myself would always have removed a column from the 
result.   If fixing this causes other problems, then perhaps we just 
give up on this rare case.
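
Removing the column from the result, as described above, might look like the
following sketch.  It matches on names with %in% rather than != because the
name of the NA cell is itself NA, and a != comparison would turn that into
an NA index.  The names 'full', 'unwanted' and 'trimmed' are illustrative
only.

```r
x <- c(1, 2, 3, 4, NA)

# Tabulate everything first, including NA when present.
full <- table(x, useNA = "ifany")

# Drop the cell labelled "2"; %in% returns FALSE (not NA) for the NA name,
# so the NA cell is kept.
unwanted <- "2"
trimmed <- full[!(names(full) %in% unwanted)]
print(trimmed)
```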

As to our local choices, we figured out a way to make display of NA the 
default without causing the above problem.   As is often the case, a 
fairly simple solution became obvious to us about 30 minutes after 
submitting a question to the list.

Terry T.


