[Rd] 'ordered' destroyed to 'factor'

Robert McGehee rmcgehee at walleyetrading.net
Fri Jun 16 15:59:45 CEST 2017


Hi,
In my experience, when you combine or aggregate factors with a function, you should be prepared for surprises, because it's not obvious what the "right" way to combine factors is (ordered or not), especially if two factors have different levels or (if ordered) are ordered differently.

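For instance, what would you expect to get from unlist() if the elements of the list had different levels, or were all ordered but in different ways, or if some elements were factors and others were ordered factors? (Note that ordered() sorts the levels by default, so the second factor needs an explicit levels argument to really be ordered differently.)
> unlist(list(ordered(c("a","b")), ordered(c("b","a"), levels=c("b","a"))))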
[1] ?
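
For what it's worth, ?unlist documents the factor case: the result is a plain factor whose levels are the union of the elements' level sets, in the order in which they occur. A small sketch of that, where f1 and f2 are made-up inputs and not taken from the original report:

```r
# Two ordered factors with different, explicitly specified level orders
# (f1 and f2 are illustrative names only).
f1 <- factor(c("a", "b"), levels = c("a", "b"), ordered = TRUE)
f2 <- factor(c("b", "c"), levels = c("c", "b"), ordered = TRUE)

u <- unlist(list(f1, f2))

levels(u)       # union of the level sets, in order of occurrence
is.factor(u)    # still a factor
is.ordered(u)   # the 'ordered' class is not carried over
```

So the ordering information is dropped even though the values and labels survive.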

Honestly, my biggest surprise from your question was that unlist even returned a factor at all. For example, the c() function just converts factors to integers.
> c(ordered(c("a","b")), ordered(c("a","b")))
[1] 1 2 1 2
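
The output above is from R 3.4.0. I believe later releases (4.1.0 onward) give c() a method for factors, so the result can depend on the R version. A minimal sketch of a version-independent alternative, assuming you know the level order you want; combine_ordered is a hypothetical helper, not a base R function:

```r
# Hypothetical helper: combine ordered factors by going through their
# labels and rebuilding the result with an explicit level order.
combine_ordered <- function(..., levels) {
  parts <- list(...)
  values <- unlist(lapply(parts, as.character), use.names = FALSE)
  factor(values, levels = levels, ordered = TRUE)
}

a <- ordered(c("a", "b"))
b <- ordered(c("a", "b"))
ab <- combine_ordered(a, b, levels = c("a", "b"))

is.ordered(ab)
as.integer(ab)
```

Passing the levels explicitly means the caller, rather than the combining function, decides the order.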

And here's one that's especially weird. When you rbind() data frames with an ordered factor, you still get an ordered factor back, but the order may be different from either of the original orders:

> x1 <- data.frame(a=ordered(c("b","c")))
> x2 <- data.frame(a=ordered(c("a","b","c")))
> str(rbind(x1,x2)) #  Note b < a
 'data.frame':	5 obs. of  1 variable:
 $ a: Ord.factor w/ 3 levels "b"<"c"<"a": 1 2 3 1 2

Should rbind have just returned an integer like c(), or returned a factor like unlist(), or should it have kept the result as an ordered factor, but ordered it in a different way? I have no idea.
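
One way to sidestep the question, assuming you know the intended order up front, is to give both columns the same explicit levels before binding. A sketch, where lv is a made-up name for the shared level vector:

```r
# Shared, explicit level order (an assumption about what the user wants).
lv <- c("a", "b", "c")

x1 <- data.frame(a = factor(c("b", "c"), levels = lv, ordered = TRUE))
x2 <- data.frame(a = factor(c("a", "b", "c"), levels = lv, ordered = TRUE))

r <- rbind(x1, x2)

levels(r$a)       # the shared levels, in the shared order
is.ordered(r$a)   # still ordered, since every piece was ordered
```

With identical level sets on both sides there is nothing for rbind() to merge, so the order is not rearranged.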

So in short, IMO, there are definitely inconsistencies in how factors and ordered factors are handled across functions, but I think it would be hard to point to any single function and say it is wrong or needs to be changed. My best advice is to just be careful when combining or aggregating factors.
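
For the use case quoted below, being careful could mean collecting the results with lapply() and rebuilding the ordered factor from the integer codes, rather than relying on sapply() or unlist(). A minimal sketch, assuming every result shares the levels of o; somefunc is just the stand-in from the original post:

```r
o <- ordered(letters)

# Stand-in for the real per-element function, as in the original post.
somefunc <- function(x) sample(o, 1)

res <- lapply(1:20, somefunc)                  # list of ordered factors
codes <- vapply(res, as.integer, integer(1))   # integer codes, one per result

# Rebuild a single ordered factor with the known level order.
x <- factor(levels(o)[codes], levels = levels(o), ordered = TRUE)

min(x)   # works, because x is still ordered
```
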
--Robert

-----Original Message-----
From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of "Jens Oehlschlägel"
Sent: Friday, June 16, 2017 9:04 AM
To: r-devel at r-project.org
Cc: jens.oehlschlaegel at truecluster.com
Subject: [Rd] 'ordered' destroyed to 'factor'

Dear all,
 
I don't know if you consider this a bug or a feature, but it breaks reasonable code: 'unlist' and 'sapply' convert 'ordered' to 'factor' even if all levels are equal. Here is a simple example:

o <- ordered(letters)
o[[1]]
lapply(o, min)[[1]]          # ordered factor
unlist(lapply(o, min))[[1]]  # no longer ordered
sapply(o, min)[[1]]          # no longer ordered

Jens Oehlschlägel
 
 
P.S.: The examples above are deliberately silly, to keep the reproduction simple. The current behavior broke my use case, which had a structure like this:
 
# have some data
x <- 1:20
# apply some function to each element
somefunc <- function(x){
  # do something and return an ordinal level
  sample(o, 1)
}
x <- sapply(x, somefunc)
# get minimum result
min(x)
# Error in Summary.factor(c(2L, 26L), na.rm = FALSE) :
#   ‘min’ not meaningful for factors
 
 
> version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          4.0                         
year           2017                        
month          04                          
day            21                          
svn rev        72570                       
language       R                           
version.string R version 3.4.0 (2017-04-21)
nickname       You Stupid Darkness        


