[R] as.factor and floating point numbers

Tobias Fellinger
Wed Jan 25 21:57:33 CET 2023


Hello, 

I'll reply in one mail to all. 

Thank you for your suggestions. I already tried Andrew's solution of 
increasing the digits. In the most extreme case I encountered, I had to use 
the maximum number of digits that format allows, but it worked. 
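
A minimal sketch of the digits workaround, using the example from the original question. It assumes that 17 significant digits are enough to separate the values; the variable names are only illustrative.

```r
# Two doubles that differ only in the last bit
x <- c(1, 1 + .Machine$double.eps)

# table() converts x to a factor via as.character(), which keeps at most
# 15 significant digits, so both values end up with the same label
default_counts <- table(x)

# Formatting with 17 significant digits first keeps the labels distinct
wide_labels <- format(x, digits = 17)
wide_counts <- table(wide_labels)

length(default_counts)  # the two values are merged into one cell
length(wide_counts)     # the two values stay in separate cells
```

The drawback is that the required number of digits depends on the data, so a fixed value may be too small for some inputs.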

Tim's solution is also a good workaround, but in this case I would have to know 
a lot about the user input.

Valentin's solution works and is surely the safest of the options, but it is 
somewhat more than I need. The case I encountered does not really need to deal 
with the levels, only with the counts of every unique value across another 
variable.
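
Valentin's attached file is not reproduced here. As a rough sketch of the idea only (not his code), one could build the integer codes with match against the sorted unique values, so that the grouping never goes through as.character. The name float_factor and the use of sprintf("%.17g") for the labels are assumptions made for this illustration.

```r
# Hypothetical helper: a factor whose levels are the exactly distinct doubles in x
float_factor <- function(x) {
  u <- sort(unique(x))               # distinct values, compared exactly
  codes <- match(x, u)               # integer code for each element of x
  labels <- sprintf("%.17g", u)      # 17 significant digits per label
  factor(codes, levels = seq_along(u), labels = labels)
}

x <- c(1, 1 + .Machine$double.eps, 1)
f <- float_factor(x)
nlevels(f)   # number of distinct values in x
table(f)     # counts per distinct value
```

Because the codes come from match rather than from the labels, the labels only affect how the levels are printed, not which values are grouped together.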

After thinking about it a little longer, I came up with another solution 
that works well enough for my purposes. Since in the case I encountered the 
vector does not have duplicates and is already sorted, I can use table on 
the ranks of the vector and get the counts in the right order.
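
A minimal sketch of the rank approach with made-up data. The names times and arm are placeholders, and ties.method = "min" is an extra assumption so that exact duplicates, should they occur, would share one rank.

```r
# Sorted event times without duplicates; two of them differ only in the last bit
times <- c(1, 1 + .Machine$double.eps, 2, 3)
arm   <- c("A", "B", "A", "B")

# rank() compares the doubles exactly, so close values get different ranks;
# the ranks are small integers that table() can label without collisions
r <- rank(times, ties.method = "min")
counts <- table(r, arm)

# One row per unique event time, in the same order as unique(times)
nrow(counts) == length(unique(times))
```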

Thanks Everyone, Tobias

On Wednesday, 25 January 2023 20:59:16 CET Valentin Petzel wrote:
> Hello Tobias,
> 
> A factor is basically a way to get a character to behave like an integer. It
> consists of an integer vector with values from 1 to nlev and a character
> vector levels, which specifies a level name for each value.
> 
> But this means that factors only really make sense with characters, and
> anything that is not a character will be coerced to character. Thus two
> values that as.character maps to the same string will be treated as the
> same level.
> 
> Now this is probably reasonable most of the time, as numeric values will
> usually represent metric data, which tends to make little sense as a factor.
> But if we want to do this we can easily build our own factors from floats,
> and even write some convenience wrapper around table, as shown in the
> attached file.
> 
> Best regards,
> Valentin
> 
> On Wednesday, 25 January 2023, 10:03:01 CET Tobias Fellinger wrote:
> > Hello,
> > 
> > I'm encountering the following error:
> > 
> > In a package for survival analysis that I use, a data.frame is created:
> > one column is created by applying unique to the event times, while others
> > are created by running table on the event times and the treatment arm.
> > 
> > When event times are very close together, they are put in the same
> > factor level when coerced to factor, while unique returns both values,
> > leading to columns of different lengths.
> > 
> > Try this to reproduce:
> > x <- c(1, 1+.Machine$double.eps)
> > unique(x)
> > table(x)
> > 
> > Is there a general best practice to deal with such issues?
> > 
> > Should calling table on floats be avoided in general?
> > 
> > What can one use instead?
> > 
> > One could easily iterate over the unique values and compare each of them
> > with the whole vector, but that takes N*N comparisons, compared to
> > N*log(N) when sorting first and exploiting the fact that the vector is
> > sorted.
> > 
> > I think for my purposes I'll round to a hundredth of a day before calling
> > the function, but any advice on avoiding this issue and writing more
> > fault-tolerant code is greatly appreciated.
> > 
> > all the best, Tobias