[Rd] Bug: as.matrix.data.frame() treats numeric vectors with "levels" attribute as factors

David Skalinder
Fri Mar 1 23:04:27 CET 2019


Hello,

I think I've found a bug in as.matrix.data.frame().  The function's 
documentation says: "The method for data frames will return a character 
matrix if there is only atomic columns and any 
non-(numeric/logical/complex) column, applying as.vector to factors and 
format to other non-character columns. Otherwise, the usual coercion 
hierarchy (logical < integer < double < complex) will be used..."

However, when the function checks for non-numeric columns, it includes 
the following check for each column xj:

length(levels(xj)) > 0L

As a result, any atomic, numeric, non-factor column that carries a 
"levels" attribute causes as.matrix.data.frame() to return a character 
matrix instead of following the usual coercion hierarchy as documented.  
For example, a column that is an unclassed factor will unexpectedly 
force as.matrix.data.frame() to return a character matrix.
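
The check can be looked at in isolation.  The following is only a 
minimal sketch, not the function's actual source: it applies the same 
expression to a plain integer vector xj that has been given a "levels" 
attribute, relying on levels() returning that attribute for non-factors.

```r
# An ordinary integer vector, not a factor
xj <- 1:2
attr(xj, "levels") <- "test"

is.factor(xj)            # FALSE: the class is still plain integer
is.numeric(xj)           # TRUE
length(levels(xj)) > 0L  # TRUE: the check fires anyway
```

So the expression is sensitive to the attribute alone, regardless of 
whether the column is actually a factor.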

To reproduce:

-----

df <- data.frame(v1 = 1:2, v2 = 3:4)

typeof(as.matrix(df)) # integer, as documented

attr(df[[1]], "levels") <- "test"
class(df[[1]]) # integer
typeof(as.matrix(df)) # character, despite all atomic, numeric,
                      # non-factor cols

df2 <- data.frame(v1 = unclass(factor(c("a", "b"))), v2 = 1:2)
typeof(as.matrix(df2)) # character, despite unclassing factor

attr(df2[[1]], "levels") <- NULL
typeof(as.matrix(df2)) # integer, even though no types changed
-----

I can reproduce this in R 3.5.1 and 3.5.2, and I can't see anything 
related in the upcoming changes or in Bugzilla, so I thought I'd report 
it here.  I don't know what the cleanest fix would be, but either the 
function or the documentation should be changed so that the two 
agree.
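
Until that is settled, one possible user-side workaround is to strip 
the "levels" attribute from non-factor columns before converting.  The 
helper name drop_stray_levels() below is hypothetical (it is not part of 
R), and this is only a sketch of the idea, not a proposed patch:

```r
# Hypothetical helper: remove "levels" from columns that are not factors
drop_stray_levels <- function(df) {
  for (j in seq_along(df)) {
    xj <- df[[j]]
    if (!is.factor(xj) && !is.null(attr(xj, "levels"))) {
      attr(df[[j]], "levels") <- NULL
    }
  }
  df
}

df <- data.frame(v1 = 1:2, v2 = 3:4)
attr(df[[1]], "levels") <- "test"
typeof(as.matrix(drop_stray_levels(df))) # integer
```

Real factor columns are left untouched, so they are still converted to 
character as documented.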

Please let me know if you need any additional info!

Thanks

David


