[Rd] Table vs unique

Terry Therneau therneau at mayo.edu
Wed Jul 21 14:20:46 CEST 2010


A bug in the survival routines was reported to me today.  The root cause
is that table does not treat nearly equal values the same way that
unique and sort do.

> temp <- rep(c(1, sqrt(2)^2, 2), 1:3)
> unique(temp)
[1] 1 2 2
> table(temp)
temp
1 2 
1 5 

I'm using R 2.10 on Linux; the user reported it from R 2.9 on Windows.
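The disagreement in the console output above can be traced in a few lines of R. This is an illustrative sketch, not part of the original report. The comments give the expected values, and the factor() comment rests on an assumption: that factor() builds its levels with as.character(), which formats a double to 15 significant digits, while unique() and sort() compare the stored doubles exactly.

```r
# Sketch: why unique() and table() disagree on rep(c(1, sqrt(2)^2, 2), 1:3)
x <- sqrt(2)^2
x == 2                  # FALSE: sqrt(2) is rounded, so its square is not exactly 2
x - 2                   # 4.440892e-16, one unit in the last place above 2
sprintf("%.17g", x)     # "2.0000000000000004"

temp <- rep(c(1, x, 2), 1:3)
# unique() (and sort()) compare the doubles exactly, so x and 2 stay distinct
length(unique(temp))    # 3

# Assumption: factor(), which table() calls, builds its levels from
# as.character(), and that keeps only 15 significant digits
as.character(x)         # "2"
levels(factor(temp))    # "1" "2"
length(table(temp))     # 2
```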

1. Minor issue: I think the underlying rounding occurs in factor, which
table calls.  I didn't see any discussion of this on the help page;
perhaps something should be added.

2. The error surfaced in summary.survfit, but the root cause is an
inconsistent survfit object.  The survfit routine uses sort and unique
to create the unique survival times and most of the output, but uses
table to count them for another component, so the two parts can
disagree on the number of distinct times (3 versus 2 in the example
above).
  Lumping the two versions of "2.0000..." together is the preferable
output.  I think the best solution is to preprocess the time
variable so that the three operators are consistent.

	as.numeric(as.character(as.factor(time))) ?

Rather ugly.  More importantly, what construct is guaranteed to ensure
consistency?  Should we use a rounding level that is more or less
equivalent to the tolerance of all.equal()?

The solution will have to be incorporated into survfit, coxph, ...,
perhaps a dozen places in the survival suite, so I'd like to get it
right the first time.
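Two candidate preprocessing steps can be sketched in R. The helper names round_via_char(), collapse_near_ties() and same_count() are hypothetical and not part of the survival package. The first is a shorter spelling of the as.character() round trip above; the second merges sorted unique times that lie within a tolerance of the first value of their group, using a rule modelled on the default of all.equal() (relative difference when the reference value exceeds tol in magnitude, absolute difference otherwise).

```r
# Hypothetical helpers for illustration only; not part of the survival package.

# (a) Round trip through the 15-significant-digit text that factor() uses;
#     gives the same values as as.numeric(as.character(as.factor(time))).
round_via_char <- function(time) as.numeric(as.character(time))

# (b) Tolerance-based grouping modelled on all.equal()'s default rule:
#     relative difference if the reference exceeds tol in magnitude,
#     absolute difference otherwise. Each group is anchored on its
#     smallest member, which also becomes the value reported for it.
collapse_near_ties <- function(time, tol = sqrt(.Machine$double.eps)) {
  ut <- sort(unique(time))             # sort() drops NA
  rep_val <- ut
  for (i in seq_along(ut)[-1]) {
    ref <- rep_val[i - 1]              # anchor of the current group
    scale <- if (abs(ref) > tol) abs(ref) else 1
    if (ut[i] - ref <= tol * scale) rep_val[i] <- ref
  }
  rep_val[match(time, ut)]             # NA times stay NA
}

# Hypothetical check: do unique() and table() agree on the number of
# distinct non-missing times?
same_count <- function(time) {
  time <- time[!is.na(time)]
  length(unique(time)) == length(table(time))
}

temp <- rep(c(1, sqrt(2)^2, 2), 1:3)
same_count(temp)                      # FALSE
same_count(round_via_char(temp))      # TRUE
same_count(collapse_near_ties(temp))  # TRUE
```

Rounding through text should make unique() and table() agree, since a 15-digit decimal survives the trip to a double and back, but it can still split two times one ulp apart if they straddle a rounding boundary. The tolerance version avoids that at the cost of choosing tol; anchoring each group on its first value keeps a long run of slowly increasing times from chaining into a single group.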

Terry T


