[Rd] sum() (and similar methods) should work for zero row data.frames

Martin rdev @end|ng |rom mb706@com
Sat Oct 17 21:18:57 CEST 2020


The "Summary" group generics always throw errors for a data.frame with zero rows, for example:
> sum(data.frame(x = numeric(0)))
#> Error in FUN(X[[i]], ...) : 
#>   only defined on a data frame with all numeric variables
The same happens for min, max, any, all, and so on. I believe this is inconsistent with what these functions do for other empty objects (vectors, matrices), where the return value is chosen so that results stay consistent under concatenation: sum(numeric(0)) == 0, so that sum(c(x, numeric(0))) == sum(x).
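
For comparison, a minimal sketch of the empty-vector behaviour referred to above. The variables x and y are made up for illustration; only base R functions are used.

```r
# Empty vectors return the identity element of each operation.
sum(numeric(0))    # 0
prod(numeric(0))   # 1
any(logical(0))    # FALSE
all(logical(0))    # TRUE

# Because of this, splitting a vector never changes the result,
# even when one of the pieces is empty.
x <- c(1, 2, 3)
y <- numeric(0)
sum(c(x, y)) == sum(x) + sum(y)   # TRUE
```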

The reason for this is that the return type of as.matrix() for empty (no rows or no columns) data.frame objects is always a matrix of type "logical". The Summary method for data.frame, in turn, throws an error when the data.frame, converted to a matrix, is not of numeric type.
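
The mechanism can be inspected directly. The following is only a sketch: empty_df and full_df are illustrative names, and the result for the empty case is the one reported in this message, so it may differ in other R versions. The error is caught with tryCatch() so that the snippet runs either way.

```r
empty_df <- data.frame(x = numeric(0))
full_df  <- data.frame(x = c(1, 2))

typeof(empty_df$x)           # "double": the column itself is numeric
typeof(as.matrix(full_df))   # "double"

# Reported above as "logical"; printed rather than assumed here.
print(typeof(as.matrix(empty_df)))

# Either the sum or the error message, depending on the R version in use.
result <- tryCatch(sum(empty_df), error = function(e) conditionMessage(e))
print(result)
```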

I suggest two ways that make sum, min, max, ... more consistent. IMHO it would be fitting to implement both of these fixes, because they also make other things more consistent.

1. Make the return type of as.matrix() for zero-row data.frames consistent with the type that would have been returned had the data.frame had more than zero rows. "as.matrix(data.frame(x = numeric(0)))" should then be numeric; if there is an empty "character" column, the returned matrix should be a character matrix, and so on. This would make subsetting by row and conversion to matrix commute (except for row names sometimes):
> all.equal(as.matrix(df[rows, , drop = FALSE]), as.matrix(df)[rows, , drop = FALSE])
Furthermore, this change would make as.matrix.data.frame obey the documentation, which indicates that the coercion hierarchy is used for the return type.
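
To illustrate suggestion 1, here is a rough sketch of the intended result, written as a wrapper rather than as a change to as.matrix.data.frame itself. The helper as_matrix_typed is hypothetical and not part of base R. It assumes plain atomic or factor columns and uses c() on the (empty) columns to pick the type according to the usual coercion hierarchy.

```r
# Hypothetical helper, not part of base R: like as.matrix(), but a zero-row
# data.frame keeps the type its columns would have produced with rows present.
as_matrix_typed <- function(df) {
  m <- as.matrix(df)
  if (nrow(df) == 0L && ncol(df) > 0L) {
    # Treat factors as character, as as.matrix() does for non-empty input.
    cols <- lapply(unname(as.list(df)), function(col) {
      if (is.factor(col)) as.character(col) else col
    })
    # c() applies the coercion hierarchy (logical < integer < double < character).
    storage.mode(m) <- typeof(do.call(c, cols))
  }
  m
}

df   <- data.frame(x = c(1, 2), y = c(3, 4))
rows <- integer(0)

# Subsetting then converting now gives the same type as converting then subsetting.
typeof(as_matrix_typed(df[rows, , drop = FALSE]))   # "double"
typeof(as.matrix(df)[rows, , drop = FALSE])         # "double"
```

Only the storage type and dimensions are compared here; row names can still differ between the two routes, as noted above.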

2. Make the Summary.data.frame method accept data.frames that produce non-numeric matrices. Apart from the main focus of this message, I believe it would be fitting, for example, to have any() and all() work on logical data.frame objects. The current behaviour is such that
> any(data.frame(x = 1))
#> [1] TRUE
#> Warning message:
#> In any(1, na.rm = FALSE) : coercing argument of type 'double' to logical
and
> any(data.frame(x = TRUE))
#> Error in FUN(X[[i]], ...) : 
#>   only defined on a data frame with all numeric variables
So a numeric data.frame warns about implicit coercion, while a logical data.frame (which would not need coercion) does not work at all.
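
What suggestion 2 could look like from the user's side is sketched below. The function summarise_df is hypothetical and does not reflect how Summary.data.frame is implemented; it simply accepts logical as well as numeric columns and applies the given function to all values at once.

```r
# Hypothetical helper, not part of base R: apply a Summary-style function
# (sum, min, max, any, all, ...) to a data.frame with numeric or logical columns.
summarise_df <- function(f, df, na.rm = FALSE) {
  stopifnot(is.data.frame(df))
  ok <- vapply(df, function(col) is.numeric(col) || is.logical(col), logical(1))
  if (!all(ok)) {
    stop("only defined on a data frame with all numeric or logical variables")
  }
  # unlist() keeps the common type of the columns, also when there are no rows.
  f(unlist(df, use.names = FALSE), na.rm = na.rm)
}

summarise_df(any, data.frame(x = TRUE))        # TRUE, no coercion needed
summarise_df(all, data.frame(x = logical(0)))  # TRUE, as for all(logical(0))
summarise_df(sum, data.frame(x = numeric(0)))  # 0, as for sum(numeric(0))
```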

(I feel more strongly about fixing 1. than 2., because I don't know the discussion that led to the behaviour described in 2.)

Best,
Martin


