[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

Gabriel Becker g@bembecker @end|ng |rom gm@||@com
Tue Oct 29 20:43:15 CET 2019


Hi all,

So I've started working on this and I ran into something that I didn't
know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
ignore dimension completely, treat x as an atomic vector, and return an
(unclassed) atomic vector:

> x = array(100, c(4, 5, 5))
> dim(x)
[1] 4 5 5
> head(x, 1)
[1] 100
> class(head(x))
[1] "numeric"


(For a 1d array, it does return another 1d array).

When extending head/tail to understand multiple dimensions as discussed in
this thread, then, should the behavior for 2+d arrays be explicitly
retained, or should head and tail do the analogous thing (with a head(<2d
array>) behaving the same as head(<matrix>), which honestly is what I
expected to already be happening)?
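
To make the two options concrete, here is a minimal sketch that uses plain
`[` indexing rather than head itself, so it does not depend on what head
currently does with arrays. The variable names (flat, nd) and the choice of
2 leading indices are only illustrative.

```r
x <- array(seq_len(100), c(4, 5, 5))

## Option 1: treat x as a flat atomic vector; dimensions are discarded.
flat <- as.vector(x)[seq_len(2)]
is.null(dim(flat))

## Option 2: the matrix-like analogue. Take the first 2 indices of
## dimension 1, keep every other dimension whole, and do not drop dims.
nd <- x[seq_len(2), , , drop = FALSE]
dim(nd)
```

Under option 2 the result is still a 3-d array, just shorter along its
first dimension, which is how head(<matrix>) already treats rows.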

Are people using/relying on this behavior in their code, and if so, why/for
what?

Even more generally, one way forward is to have the default methods check
for dimensions, and fall back to length(x) when dim(x) is NULL:

tail.default <- tail.data.frame <- function(x, n = 6L, ...)
{
    if(any(n == 0))
        stop("n must be non-zero or unspecified for all dimensions")
    if(!is.null(dim(x)))
        dimsx <- dim(x)
    else
        dimsx <- length(x)

    ## this returns a list of vectors of indices in each
    ## dimension, regardless of the length of the n
    ## argument
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ## select all indices (full dim) if not specified
        ni <- if(length(n) >= i) n[i] else dxi
        ## handle negative ns
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq.int(to = dxi, length.out = ni)
    })
    args <- c(list(x), sel, drop = FALSE)
    do.call("[", args)
}


I think this removes the need for a separate data.frame method at all,
actually, though (I would think) tail.data.frame would still be defined and
exported for backwards compatibility. (The matrix method has some extra
bits, so in my current conception it is still separate, though it might not
NEED to be.)
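
As a rough illustration of why one function might cover all of these cases,
the sketch below copies the body of the tail.default proposal above under a
hypothetical name, tail_nd (so that it does not mask utils::tail), and calls
it on a plain vector, a data frame and a 3-d array. The test objects are made
up for the example.

```r
## tail_nd is a hypothetical name; the logic is the tail.default
## proposal above, only written slightly more compactly.
tail_nd <- function(x, n = 6L, ...)
{
    if (any(n == 0))
        stop("n must be non-zero or unspecified for all dimensions")
    ## use dim(x) when present, otherwise treat x as 1-dimensional
    dimsx <- if (!is.null(dim(x))) dim(x) else length(x)
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ## select all indices (full dim) if not specified
        ni <- if (length(n) >= i) n[i] else dxi
        ## handle negative ns
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq.int(to = dxi, length.out = ni)
    })
    do.call("[", c(list(x), sel, drop = FALSE))
}

v  <- 1:10
df <- data.frame(a = 1:10, b = letters[1:10])
x  <- array(seq_len(100), c(4, 5, 5))

tail_nd(v, 3)             # plain vector: last 3 elements
dim(tail_nd(df, 3))       # data frame: last 3 rows, all columns
dim(tail_nd(x, 2))        # 3-d array: last 2 indices of dim 1, rest whole
dim(tail_nd(x, c(2, 2)))  # "corner" along the first two dimensions
dim(tail_nd(x, -1))       # negative n: everything but the first index of dim 1
```

Because drop = FALSE is always passed, every result keeps the number of
dimensions of its input.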

The question then becomes, should head/tail always return something with
the same dimensionality (number of dims) as it got, or should data.frame and
matrix be special-cased in this regard, as they are now?

What are people's thoughts?
~G
