[Rd] handling of zero and negative indices in src/main/subscript.c:mat2indsub() (PR#7824)

Fri Apr 29 18:59:46 CEST 2005

This message contains a description of what looks like a bug, examples 
of the suspect behavior, a proposed change to the C code to change this 
behavior, example of behavior with the fix, and suggestions for 3 places 
to update the documentation to reflect the proposed behavior.  It is 
submitted for consideration for inclusion in R.  Comments are requested.

Currently, the code for subscripting by matrices checks that values in 
the matrix are not greater than the dimensions of the array being 
indexed.  However, it does not check for zero or negative indices, and 
blindly does index computation with them as though they were positive 
indices (including for negative indices whose absolute value exceeds the 
dimensions of the array being indexed).  Here are examples of indexing 
by matrices that do not return unequivocally sensible results (in most 
cases):

 > x <- matrix(1:6,ncol=2)
 > dim(x)
[1] 3 2
 > x
      [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
 > x[rbind(c(1,1), c(2,2))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(0,1))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(0,0))]
Error: only 0's may be mixed with negative subscripts
 > x[rbind(c(1,1), c(2,2), c(0,2))]
[1] 1 5 3
 > x[rbind(c(1,1), c(2,2), c(0,3))]
Error: subscript out of bounds
 > x[rbind(c(1,1), c(2,2), c(1,0))]
Error: only 0's may be mixed with negative subscripts
 > x[rbind(c(1,1), c(2,2), c(2,0))]
Error: only 0's may be mixed with negative subscripts
 > x[rbind(c(1,1), c(2,2), c(3,0))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(1,2))]
[1] 1 5 4
 > x[rbind(c(1,1), c(2,2), c(-1,2))]
[1] 1 5 2
 > x[rbind(c(1,1), c(2,2), c(-2,2))]
[1] 1 5 1
 > x[rbind(c(1,1), c(2,2), c(-3,2))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(-4,2))]
Error: only 0's may be mixed with negative subscripts
 > x[rbind(c(1,1), c(2,2), c(-1,-1))]
Error: subscript out of bounds
 >
 > # range checks are not applied to negative indices
 > x <- matrix(1:6, ncol=3)
 > dim(x)
[1] 2 3
 > x[rbind(c(1,1), c(2,2), c(-3,3))]
[1] 1 4 1
 > x[rbind(c(1,1), c(2,2), c(-4,3))]
[1] 1 4
 >
 > version
          _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.0
year     2005
month    04
day      18
language R
 >

The followed version of mat2indsub() (to replace the one in 
src/main/subscript.c) allows zero indices and explicitly disallows all 
negative indices:

/* Special Matrix Subscripting: Handles the case x[i] where */
/* x is an n-way array and i is a matrix with n columns. */
/* This code returns a vector containing the integer subscripts */
/* to be extracted when x is regarded as unravelled. */
/* Negative indices are not allowed. */
/* A zero anywhere in a row will cause a zero in the same */
/* position in the result. */

SEXP mat2indsub(SEXP dims, SEXP s)
{
     int tdim, j, i, k, nrs = nrows(s);
     SEXP rvec;

     PROTECT(rvec = allocVector(INTSXP, nrs));
     s = coerceVector(s, INTSXP);
     setIVector(INTEGER(rvec), nrs, 0);

     for (i = 0; i < nrs; i++) {
	tdim = 1;
	/* compute 0-based subscripts for a row (0 in the input */
         /* gets -1 in the output here) */
	for (j = 0; j < LENGTH(dims); j++) {
	    k = INTEGER(s)[i + j * nrs];
	    if(k == NA_INTEGER) {
		INTEGER(rvec)[i] = NA_INTEGER;
		break;
	    }
             if(k < 0)
		error(_("cannot have negative values in matrices used as subscripts"));
	    if(k == 0) {
		INTEGER(rvec)[i] = -1;
		break;
	    }
	    if (k > INTEGER(dims)[j])
		error(_("subscript out of bounds"));
	    INTEGER(rvec)[i] += (k - 1) * tdim;
	    tdim *= INTEGER(dims)[j];
	}
	/* transform to 1 based subscripting (0 in the input */
         /* gets 0 in the output here) */
	if(INTEGER(rvec)[i] != NA_INTEGER)
	    INTEGER(rvec)[i]++;
     }
     UNPROTECT(1);
     return (rvec);
}

With this version, the above commands (+ a couple more) produce the 
following output (with commands that fail suitably wrapped in try() so 
that they can be included in a test files.)

 > x <- matrix(1:6,ncol=2)
 > dim(x)
[1] 3 2
 > x
      [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
 > x[rbind(c(1,1), c(2,2))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(0,1))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(0,0))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(0,2))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(0,3))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(1,0))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(2,0))]
[1] 1 5
 > x[rbind(c(1,1), c(2,2), c(3,0))]
[1] 1 5
 > x[rbind(c(1,0), c(0,2), c(3,0))]
numeric(0)
 > x[rbind(c(1,0), c(0,0), c(3,0))]
numeric(0)
 > x[rbind(c(1,1), c(2,2), c(1,2))]
[1] 1 5 4
 > x[rbind(c(1,1), c(2,NA), c(1,2))]
[1]  1 NA  4
 > x[rbind(c(1,0), c(2,NA), c(1,2))]
[1] NA  4
 > try(x[rbind(c(1,1), c(2,2), c(-1,2))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-1, 2))]) :
         cannot have negative values in matrices used as subscripts
 > try(x[rbind(c(1,1), c(2,2), c(-2,2))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-2, 2))]) :
         cannot have negative values in matrices used as subscripts
 > try(x[rbind(c(1,1), c(2,2), c(-3,2))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-3, 2))]) :
         cannot have negative values in matrices used as subscripts
 > try(x[rbind(c(1,1), c(2,2), c(-4,2))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-4, 2))]) :
         cannot have negative values in matrices used as subscripts
 > try(x[rbind(c(1,1), c(2,2), c(-1,-1))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-1, -1))]) :
         cannot have negative values in matrices used as subscripts
 >
 > # verify that range checks are applied to negative indices
 > x <- matrix(1:6, ncol=3)
 > dim(x)
[1] 2 3
 > try(x[rbind(c(1,1), c(2,2), c(-3,3))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-3, 3))]) :
         cannot have negative values in matrices used as subscripts
 > try(x[rbind(c(1,1), c(2,2), c(-4,3))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-4, 3))]) :
         cannot have negative values in matrices used as subscripts
 >

The documentation page for ?Extract could have the following sentences 
added to the end of the 2nd para in the description of arguments "i, j, 
...":

   Negative indices are not allowed in matrices used as indices.
   NA and zero values are allowed: rows containing a zero are
   omitted from the result, and rows containing an NA produce an
   NA in the result.

In "Introduction to R", the same material could be added to the end of 
the section "Index Arrays" (in the chapter "Arrays and matrices").

In "R Language Definition", the third paragraph of the section "Indexing 
matrices and arrays" (Section 3.4.2 in my copy) currently reads:

> It is possible to use a matrix of integers as an index. In
> this case, the number of columns of the matrix should match
> the number of dimensions of the structure, and the result
> will be a vector with length as the number of rows of the
> matrix. The following example shows how to extract the
> elements m[1, 1] and m[2, 2] in one operation.

It could be changed to the following:

It is possible to use a matrix of integers as an index.
Negative indices are not allowed in matrices used as
indices.  NA and zero values are allowed: rows containing a
zero are omitted from the result, and rows containing an NA
produce an NA in the result.  To use a matrix as an index,
the number of columns of the matrix should match the number
of dimensions of the structure, and the result will be a
vector with length as the number of rows of the matrix
(except when the matrix contains zeros). The following
example shows how to extract the elements m[1, 1] and m[2,
2] in one operation.

Additionally, indexing by matrix just goes ahead and uses matrix 
elements as vector indices in cases where the number of columns in the 
matrix does not match the number of dimensions in the array.  E.g.:

 > x <- matrix(1:6,ncol=2)
 > x[rbind(c(1,1,1), c(2,2,2))]
[1] 1 2 1 2 1 2
 >

Would it not be preferable behavior to stop with an error in such cases?