[R] table and unique seems to behave differently

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Tue Dec 10 17:39:09 CET 2019


On 10/12/2019 10:32 a.m., Sarah Goslee wrote:
> Back to the table part of the question, but using Duncan's example.
> 
>> x <- c(3.4, 3.4 + 1e-15)
>> unique(x)
> [1] 3.4 3.4
>> table(x)
> x
> 3.4
>    2
> 
> The question was, why are these different.
> 
> table() only works on factors, so it converts the numeric vector to a
> factor before tabulation.
> factor() tries to do something sensible, and implicitly rounds the numeric data.
> 
>> factor(x)
> [1] 3.4 3.4
> Levels: 3.4
> 
> Whether you think that is actually sensible or not is up to you, but
> if it isn't then you shouldn't use table.
> 
> That table uses factors is documented in ?table. A quick read of
> ?factor didn't find any explicit discussion, other than the
> acknowledgement that factor() is lossy in:
> 
>       To transform a factor ‘f’ to approximately its
>       original numeric values, ‘as.numeric(levels(f))[f]’ is recommended
>       and slightly more efficient than ‘as.numeric(as.character(f))’.
> 
> You can't even get table() to do what you want by being explicit:
> 
>> table(factor(x, levels = unique(x)))
> Error in `levels<-`(`*tmp*`, value = as.character(levels)) :
>    factor level [2] is duplicated
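
The error message above shows the levels going through as.character(), which is where the two values collapse. A minimal sketch of that effect, using format() as a stand-in for the conversion; the assumption that roughly 15 significant digits are kept is mine, and the names labels15 and labels17 are only for illustration:

```r
x <- c(3.4, 3.4 + 1e-15)

# The two doubles really are different numbers ...
x[1] == x[2]

# ... but at 15 significant digits they get the same label
# (assumption: this is roughly what the factor conversion keeps).
labels15 <- format(x, digits = 15)
labels15

# At 17 significant digits the labels should differ again.
labels17 <- format(x, digits = 17)
labels17
```

If the 15-digit labels come out identical while the 17-digit ones do not, that is consistent with unique() seeing two values and table() seeing one.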

You can get table() to agree with unique() if you do the string
conversion yourself, using a format that keeps every bit of the
double.  The first result is ugly:

x <- c(3.4, 3.4 + 1e-15)
tab <- table(sprintf("%a", x))
tab
#>
#> 0x1.b333333333333p+1 0x1.b333333333335p+1
#>                    1                    1
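
As a quick check that the "%a" strings lose nothing, they can be converted back and compared with the original vector. This is only a sketch; hex and back are names I made up for the example, and I am assuming as.numeric() parses the hex form on your platform the same way it does in the relabelling step of this message:

```r
x <- c(3.4, 3.4 + 1e-15)

# Hexadecimal floating-point text, one string per element
hex <- sprintf("%a", x)

# Parse the strings back into doubles
back <- as.numeric(hex)

# Should be TRUE if the round trip is exact
identical(back, x)
```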

But if you really want to, you can make it readable, at the cost of
two labels that look identical:

names(tab) <- as.numeric(names(tab))
tab
#> 3.4 3.4
#>   1   1
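
If duplicated labels are a problem (for example when indexing the table by name), one possible alternative is to build the labels with 17 significant digits, which should be enough to tell any two distinct doubles apart. This is a sketch rather than a recommendation, and tab17 is just an illustrative name:

```r
x <- c(3.4, 3.4 + 1e-15)

# Decimal labels with 17 significant digits instead of hex
tab17 <- table(sprintf("%.17g", x))
tab17

# Two cells, each with a count of 1, and distinct names
length(tab17)
anyDuplicated(names(tab17))
```

The labels are longer than "3.4", but each one still identifies a single value.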

Duncan Murdoch


