[Rd] as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)

Martin Maechler maechler at stat.math.ethz.ch
Sat Jan 16 20:45:21 CET 2016


>>>>> William Dunlap via R-devel <r-devel at r-project.org>
>>>>>     on Wed, 13 Jan 2016 13:46:05 -0800 writes:

> as.data.frame methods behave inconsistently when they are given a row.name
> argument of the wrong length.  The matrix method silently ignores row.names
> if it has the wrong length and the numeric, integer, and character methods
> do not bother to check and thus make an illegal data.frame.
> 
> > as.data.frame(matrix(1:6,nrow=3), row.names=c("One","Two"))
>   V1 V2
> 1  1  4
> 2  2  5
> 3  3  6
> > as.data.frame(1:3, row.names=c("One","Two"))
>     1:3
> One   1
> Two   2
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
>   corrupt data frame: columns will be truncated or padded with NAs
> > as.data.frame(c("a","b","c"), row.names=c("One","Two"))
>     c("a", "b", "c")
> One                a
> Two                b
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
>   corrupt data frame: columns will be truncated or padded with NAs

As I said yesterday, I want to "fix" this in R.
As Paul Grosu mentioned, the buggy -- too tolerant -- behavior
is in the as.data.frame.vector() method, whereas
as.data.frame.matrix() simply drops wrong row.names and uses
default row names in that case.

This leaves (at least) two ways to change the vector method:
1) the *.matrix compatible one: simply forget wrong 'row.names'
2) treat wrong row.names as a user error.

Now, '1)' would be more in line with the matrix method, but
really feels wrong, because it does not catch user error and
silently disregards an explicitly specified argument.

For '2)' I propose a fix which will only *warn* about the wrong
'row.names' for now (so code which has implicitly relied on the
wrong behavior continues to work, but with a warning):

    > as.data.frame(1:3, row.names=c("One","Two"))
      1:3
    1   1
    2   2
    3   3
    Warning message:
    In as.data.frame.integer(1:3, row.names = c("One", "Two")) :
      'row.names' is not a character vector of length 3 -- omitting it. Will be an error!
    > 

This will give new warnings in packages, and package authors can
fix these before the above eventually becomes an error.
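
To make the proposed behavior concrete, here is a minimal sketch of such a
check. It is not the actual change to R; check_row_names() and vec_to_df()
are hypothetical names used only for illustration, and the warning text is
abbreviated:

```r
# Hypothetical helper (NOT the real patch): accept 'row.names' only if it is
# a character vector of the right length; otherwise warn and drop it.
check_row_names <- function(row.names, nrows) {
  if (is.null(row.names)) return(NULL)
  if (is.character(row.names) && length(row.names) == nrows) {
    return(row.names)
  }
  warning(sprintf(
    "'row.names' is not a character vector of length %d -- omitting it.",
    nrows), call. = FALSE)
  NULL  # fall back to automatic row names
}

# Hypothetical stand-in for a vector method: builds a one-column data frame
# and only passes on row names that survived the check.
vec_to_df <- function(x, row.names = NULL) {
  rn <- check_row_names(row.names, length(x))
  data.frame(x = x, row.names = rn)
}
```

Package code that currently passes row names of the wrong length could use
a similar check on its own side, e.g. calling stop() instead of warning(),
so that the problem is caught before as.data.frame() is reached.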


The remaining question is whether the as.data.frame.matrix() method
should also produce the same warning about illegal
row.names.  Interestingly, the *model.matrix* method does
produce an error even now when row.names of the wrong length
are specified (the 'trees' data has 31 rows, not 30):

   > ff <- log(Volume) ~ log(Height) + log(Girth)
   > m <- model.frame(ff, trees)
   > mat <- model.matrix(ff, m)
   > data.frame(mat, row.names = paste0("r", 1:30))
   Error in data.frame(mat, row.names = paste0("r", 1:30)) : 
     row names supplied are of the wrong length
   >
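
To compare how the different methods react to the same kind of bad input,
a small probe along the following lines may help. This is only a sketch;
probe() is a hypothetical helper, not part of R:

```r
# Hypothetical helper: evaluate an expression and report whether it
# completed normally, signalled a warning, or raised an error.
probe <- function(expr) {
  tryCatch({
    force(expr)   # 'expr' is a promise; it is evaluated here, inside tryCatch
    "ok"
  },
  warning = function(w) "warning",
  error   = function(e) "error")
}

# Example calls (results depend on the R version, so none are shown):
# probe(as.data.frame(matrix(1:6, nrow = 3), row.names = c("One", "Two")))
# probe(as.data.frame(1:3, row.names = c("One", "Two")))
```

Note that in the examples quoted at the top, the "corrupt data frame"
warning comes from format.data.frame(), i.e. from printing, so probing the
construction alone may report "ok" there even though the result is illegal.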


