[Rd] simplify2array assumes non-NA dim() -- base or method bug?

Michael Chirico ch|r|com @end|ng |rom goog|e@com
Fri Aug 26 05:19:20 CEST 2022


Does this somewhat contradict the requirement that dimensions are non-missing?

ncol(table(letters))
# [1] NA

On the other hand,

dim(table(letters))
# [1] 26

Maybe that's a bug for the ncol() method of table?

On Thu, Aug 25, 2022 at 2:55 AM Sebastian Meyer <seb.meyer using fau.de> wrote:
>
> Hmm. I think I haven't seen NA dims in base R before. R-lang says
>
> > The 'dim' attribute is used to implement arrays.  The content of
> > the array is stored in a vector in column-major order and the 'dim'
> > attribute is a vector of integers specifying the respective extents of
> > the array.  R ensures that the length of the vector is the product of
> > the lengths of the dimensions.
>
> And R-intro says:
>
> > A dimension vector is a vector of non-negative integers.
>
> I understand "vector of [non-negative] integers" as excluding NA_integer_.
>
> Correspondingly:
>
> > a <- 1
> > dim(a) <- NA_integer_
>
> Error in dim(a) <- NA_integer_ : the dims contain missing values
>
> > matrix(1, NA_integer_)
>
> Error in matrix(1, NA_integer_) : invalid 'nrow' value (too large or NA)
>
> > array(1, NA_integer_)
>
> Error in array(1, NA_integer_) : negative length vectors are not allowed
>
> There must be many more examples where non-NULL dim() is assumed to
> not contain missing values.
>
> simplify2array() aligns with that specification and only needs to check
> for numeric (non-NULL) dims at this point and not also !anyNA(c.dim).
> So my take is that there is no bug.
>
> Best regards,
>
>         Sebastian Meyer
>
>
> Am 25.08.22 um 04:51 schrieb Michael Chirico via R-devel:
> > If any component of the dim() returned as c.dim here[1] are missing,
> > simplify2array() errors inscrutably (specifically, because the last &&
> > condition is missing):
> >
> > Error in if (higher && length(c.dim <- unique(lapply(x, dim))) == 1 &&  :
> >    missing value where TRUE/FALSE needed
> >
> > At root here is that dim.tbl_lazy ({dbplyr} package method [2]) very
> > intentionally neglects to count the # of rows in the result -- the
> > whole point of a lazy table is to avoid calculating full queries
> > unless specifically requested, so the # of rows is left as missing,
> > i.e., there _is_ a quantity of rows, but the exact number is not
> > known.
> >
> > That seems to me like a proper usage of NA, and hence this is a
> > simplify2array() bug, but I am curious other thoughts here before
> > attempting a patch.
> >
> > [1] https://github.com/r-devel/r-svn/blob/2d4f8c283d53ff2c98d92c7b77b11e652297742c/src/library/base/R/sapply.R#L46-L48
> > [2] https://github.com/tidyverse/dbplyr/blob/36b146e36d6d9af215dc48e60862d4b807b9e606/R/tbl-lazy.R#L45-L47
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list