[Rd] Suggestion on default 'levels' in 'factor'

Suharto Anggono Suharto Anggono suharto_anggono at yahoo.com
Fri May 6 10:05:26 CEST 2016


At first read, the logic of the following fragment in the code of function 'factor' was not clear to me.
    if (missing(levels)) {
	y <- unique(x, nmax = nmax)
	ind <- sort.list(y) # or possibly order(x) which is more (too ?) tolerant
	y <- as.character(y)
	levels <- unique(y[ind])
    }

Code similar to that originally proposed in https://stat.ethz.ch/pipermail/r-devel/2009-May/053316.html is more readable to me.

I suggest using this.
    if (missing(levels))
        levels <- unique(as.character(
            sort.int(unique(x, nmax = nmax), na.last = TRUE) # or possibly sort(x) which is more (too ?) tolerant
        ))

I assume that as.character(y)[sort.list(y)] is equivalent to as.character(sort.int(y, na.last = TRUE)). If so, what I suggest above has the same effect as the code in the current 'factor'. I use function 'sort.int' rather than 'sort' so that, like 'sort.list', it fails for non-atomic input.
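The assumed equivalence can be spot-checked on a few atomic vectors. The following is only a minimal sketch, not a proof: the helper names 'levels_current' and 'levels_proposed' are hypothetical and exist only for this illustration, the 'nmax' argument is left out for brevity, and inputs containing NaN or factors are not covered.

```r
# Hypothetical helper mirroring the fragment in the current 'factor'
# (the 'nmax' argument is omitted here for brevity).
levels_current <- function(x) {
    y <- unique(x)
    ind <- sort.list(y)
    y <- as.character(y)
    unique(y[ind])
}

# Hypothetical helper mirroring the suggested replacement.
levels_proposed <- function(x) {
    unique(as.character(sort.int(unique(x), na.last = TRUE)))
}

# A few atomic inputs, some with NA and repeated values.
inputs <- list(
    c(3, 1, 2, 1, NA),
    c("b", "a", NA, "b"),
    c(TRUE, FALSE, NA, TRUE),
    c(10L, 2L, 2L)
)

# TRUE for an input means both variants gave identical levels.
same <- vapply(inputs,
               function(x) identical(levels_current(x), levels_proposed(x)),
               logical(1))
print(same)
```

Agreement on these inputs supports the assumption but does not establish it for every class of 'x'.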

What I suggest is similar in form to the default 'levels' in 'factor' in R before version 2.10.0, which was
    sort(unique.default(x), na.last = TRUE)
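To see how the suggested form differs from the pre-2.10.0 one, the two can be run side by side. This is a sketch under my own assumption that the relevant difference is the conversion to character followed by the outer 'unique': distinct numeric values may share one character representation, and the outer 'unique' then removes the repeated string. The variable names are hypothetical.

```r
# Two numerically distinct values that may print the same as character.
x <- c(0.3, 0.1 + 0.2, 1)

# Form used before R 2.10.0: sorted unique values, not yet character.
old_style <- sort(unique.default(x), na.last = TRUE)

# Suggested form: convert to character, then drop repeated strings.
new_style <- unique(as.character(sort.int(unique(x), na.last = TRUE)))

print(length(old_style))
print(as.character(old_style))
print(new_style)

# By construction the suggested form never yields duplicated levels.
print(anyDuplicated(new_style))
```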

If this suggestion is used, the help page for 'factor' can be changed to say "(by 'sort.int')" instead of "(by 'sort.list')".


