[Rd] RFC: sapply() limitation from vector to matrix, but not further

Marc Schwartz marc_schwartz at me.com
Wed Dec 1 14:59:00 CET 2010


On Dec 1, 2010, at 2:39 AM, Martin Maechler wrote:

> sapply() stems from S / S+ times and hence has a long tradition.
> In spite of that I think that it should be enhanced...
> 
> As the subject mentions, sapply() produces a matrix in cases
> where the list components of the lapply(.) results are of the
> same length (and ...).
> However, it unfortunately "stops there".
> E.g., if you *nest* two sapply() calls where the inner one
> produces a matrix, very often the logical behavior would be for
> the outer sapply() to stack these matrices into an array of 
> rank 3 ["array rank"(x) := length(dim(x))].
> However, it does not do that. Consider, e.g., this artificial example:
> 
> p0 <- function(...) paste(..., sep="")
> myF <- function(x,y) {
>    stopifnot(length(x) <= 3)
>    x <- rep(x, length.out=3)
>    ny <- length(y)
>    r <- outer(x,y)
>    dimnames(r) <- list(p0("r",1:3), p0("C", seq_len(ny)))
>    r
> }
> 
> and
> 
>> (v <- structure(10*(5:8), names=LETTERS[1:4]))
> A  B  C  D 
> 50 60 70 80 
> 
> If we tell sapply() not to simplify, we see the list of same-size
> matrices it produces:
> 
>> sapply(v, myF, y = 2*(1:5), simplify=FALSE)
> $A
>    C1  C2  C3  C4  C5
> r1 100 200 300 400 500
> r2 100 200 300 400 500
> r3 100 200 300 400 500
> 
> $B
>    C1  C2  C3  C4  C5
> r1 120 240 360 480 600
> r2 120 240 360 480 600
> r3 120 240 360 480 600
> 
> $C
>    C1  C2  C3  C4  C5
> r1 140 280 420 560 700
> r2 140 280 420 560 700
> r3 140 280 420 560 700
> 
> $D
>    C1  C2  C3  C4  C5
> r1 160 320 480 640 800
> r2 160 320 480 640 800
> r3 160 320 480 640 800
> 
> However, quite deceptively, the default simplification flattens each
> matrix into a single column:
> 
>> sapply(v, myF, y = 2*(1:5))
>         A   B   C   D
>  [1,] 100 120 140 160
>  [2,] 100 120 140 160
>  [3,] 100 120 140 160
>  [4,] 200 240 280 320
>  [5,] 200 240 280 320
>  [6,] 200 240 280 320
>  [7,] 300 360 420 480
>  [8,] 300 360 420 480
>  [9,] 300 360 420 480
> [10,] 400 480 560 640
> [11,] 400 480 560 640
> [12,] 400 480 560 640
> [13,] 500 600 700 800
> [14,] 500 600 700 800
> [15,] 500 600 700 800
> 
> 
> My proposal -- implemented and "make check" tested --
> is to add an optional argument  'ARRAY'
> which allows
> 
>> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)
> , , A
> 
>    C1  C2  C3  C4  C5
> r1 100 200 300 400 500
> r2 100 200 300 400 500
> r3 100 200 300 400 500
> 
> , , B
> 
>    C1  C2  C3  C4  C5
> r1 120 240 360 480 600
> r2 120 240 360 480 600
> r3 120 240 360 480 600
> 
> , , C
> 
>    C1  C2  C3  C4  C5
> r1 140 280 420 560 700
> r2 140 280 420 560 700
> r3 140 280 420 560 700
> 
> , , D
> 
>    C1  C2  C3  C4  C5
> r1 160 320 480 640 800
> r2 160 320 480 640 800
> r3 160 320 480 640 800
> 
>> 
> -----------
> 
> In the best of all worlds, the default would be 'ARRAY = TRUE',
> but of course, given the long-standing different behavior,
> that seems much too "risky", so my proposal remains
> backward-compatible, with default ARRAY = FALSE.
> 
> Martin Maechler,
> ETH Zurich
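
To make the difference between the two shapes concrete, here is a small sketch in R, assuming only base R. The helper stack_matrices() is hypothetical: it is neither a base R function nor the ARRAY=TRUE implementation from the proposal, just one way to stack a list of equal-dimension matrices into a rank-3 array with array(). It also checks that the 15 x 4 default result comes from flattening each 3 x 5 matrix column-major into one column.

```r
## Hypothetical helper (not base R, not the proposed ARRAY=TRUE code):
## stack a list of matrices with identical dim() into a rank-3 array,
## keeping the row/column dimnames and using the list names as 3rd dim.
stack_matrices <- function(lst) {
  d <- dim(lst[[1]])
  stopifnot(all(vapply(lst, function(m) identical(dim(m), d), logical(1))))
  array(unlist(lst, use.names = FALSE),
        dim = c(d, length(lst)),
        dimnames = c(dimnames(lst[[1]]), list(names(lst))))
}

## The example from the message above
p0 <- function(...) paste(..., sep = "")
myF <- function(x, y) {
  stopifnot(length(x) <= 3)
  x <- rep(x, length.out = 3)
  ny <- length(y)
  r <- outer(x, y)
  dimnames(r) <- list(p0("r", 1:3), p0("C", seq_len(ny)))
  r
}
v <- structure(10 * (5:8), names = LETTERS[1:4])
lst <- sapply(v, myF, y = 2 * (1:5), simplify = FALSE)

## Default simplification: each 3 x 5 matrix becomes one length-15
## column (column-major order), hence the 15 x 4 result.
m <- sapply(v, myF, y = 2 * (1:5))
stopifnot(identical(dim(m), c(15L, 4L)),
          all(m[, "A"] == as.vector(lst$A)))

## Stacking keeps the matrix structure instead.
a <- stack_matrices(lst)
dim(a)            # 3 5 4
length(dim(a))    # "array rank": 3
a[, , "B"]        # the same 3 x 5 matrix as lst$B
```

The design choice is simply to concatenate the matrices with unlist() and let array() reinterpret the data with one extra trailing dimension, which works because R stores matrices and arrays in column-major order.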


Seems to me to be a reasonable proposal, Martin, obviously with the proviso that the current default behavior is unaltered, as you note.

Regards,

Marc


