[R] reverse array indexing

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Jul 31 07:08:21 CEST 2003


Jerome Asselin <jerome at hivnet.ubc.ca> suggests this:
	arr <- array(rnorm(27),c(3,3,3))
	dimarr <- dim(arr)
	tmparr <- array(1:prod(dimarr),dimarr)
	sapply(c(3),function(x,tmparr) which(tmparr==x,T),tmparr=tmparr)
	sapply(c(3,17,13,5),function(x,tmparr) which(tmparr==x,T),tmparr=tmparr)
	
Of course, in R we can simplify the last two lines to
    sapply(<<argument goes here>>, function(x) which(tmparr==x,T))

However, wearing my "computer scientist" hat, I have to wonder about costs.
This is basically the equivalent of the APL "decode" operator.

Let's define

    index.decode <- function (index, array) {
	dimarr <- dim(arr)
	tmparr <- array(1:prod(dimarr), dimarr)
	sapply(index, function(x) which(tmparr == x, T))
    }

The result is a matrix with C=length(index) columns
and R=length(dim(array)) rows.  This has to take time O(R*C), because
the result occupies O(R*C) space and all of it has to be defined.

Now it is possible to implement index.decode so that it does take O(R*C)
time.   Here's an outline, which I shan't bother to finish.  (I'd do
ndims==4 and the general case if I were going to finish it.  I'd also
have a drop= argument to handle the case where length(index)==1.)

    index.decode <- function (index, array) {
	jndex <- index - 1
	dimarr <- dim(arr)
	ndims <- length(dimarr)
	if (ndims == 1) {
	    rbind(index)
	} else
	if (ndims == 2) {
	    rbind(jndex %% dimarr[1] + 1, jndex %/% dimarr[1] + 1)
	} else
	if (ndims == 3) {
	    rbind(jndex %% dimarr[1] + 1,
	          (jndex %/% dimarr[1]) %% dimarr[2] + 1,
	          jndex %/% (dimarr[1]*dimarr[2]) + 1)
        } else {
            stop("length(dims(array)) > 3 not yet implemented")
	}
    }

This is clearly O(R*C).  What about the
sapply(index, function(x) which(tmparr==x, T))
approach?

tmparr is of size prod(dimarr); call that P.  The expression tmparr==x
has to examine each element of tmparr, so that's O(P).  This is done
for each element of index (C times), so the total is O(P*C).

Consider

    mega <- array(1:1000000, c(100,100,100))
    inxs <- as.integer(runif(10000, min=1, max=1000000))

Here C = length(inxs) = 10000, R = length(dim(mega)) = 3,
P = prod(dim(mega)) = 1000000.  O(R*C) is looking *really* good
compared with O(P*C).

> system.time(index.decode(inxs, mega))
[1] 0.03 0.00 0.03 0.00 0.00
> system.time(slow.decode(inxs, mega))
[1] 3.51 0.79 4.33 0.00 0.00

Mind you, on a 500MHz UltraSPARC, I had to use big arrays to get any
measurable time at all...




More information about the R-help mailing list