[Rd] Correction to vec-subset speed patch

Radford Neal radford at cs.toronto.edu
Wed Sep 8 04:20:04 CEST 2010


I found a bug in one of the fourteen speed patches I posted, namely in
patch-vec-subset.  I've fixed this (I now see one does need to
duplicate index vectors sometimes, though one can avoid it most of the
time). I also split this patch in two, since it really has two
different and independent parts.  The patch-vec-subset patch now has
only some straightforward (locally-checkable) speedups for copies.
The new patch-subscript patch has the speedups for creation of index
vectors, which is where the bug was, and which generally have more
global interactions.  I made some other changes in the patch-subscript
part along with fixing the bug.

I've attached the new versions of the patches.  Here is the
documentation for the two revised patches:

patch-vec-subset

    Speeds up extraction of subsets of vectors or matrices (eg, v[10:20]
    or M[1:10,101:110]).  This is done with detailed code improvements.

    Relevant test script:  test-vec-subset.r

    There are lots of tests in this script.  The most dramatic improvement
    is for extracting many rows and columns of a large array, where the 
    improvement is by about a factor of four.  Extracting many rows from
    one column of a matrix is sped up by about 30%. 

    Changes unrelated to speed improvement:

    Fixes two latent bugs where the code incorrectly refers to NA_LOGICAL
    when NA_INTEGER is appropriate and where LOGICAL and INTEGER types
    are treated as interchangeable.  These cause no problems at the moment,
    but would if representations were changed.

patch-subscript

    (Formerly part of patch-vec-subset)  This patch also speeds up
    extraction, and also replacement, of subsets of vectors or
    matrices, but focuses on the creation of the indexes rather than
    the copy operations.  Often avoids a duplication (see below) and
    eliminates a second scan of the subscript vector for zero
    subscripts, folding it into a previous scan at no additional cost.

    Relevant test script:  test-vec-subset.r

    Speeds up some operations with scalar or short vector indexes by
    about 10%.  Speeds up subscripting with a longer vector of
    positive indexes by about 20%.

    Issues:  The current code duplicates a vector of indexes when it
    seems unnecessary.  Duplication is for two reasons:  to handle
    the situation where the index vector is itself being modified in
    a replace operation, and so that any attributes can be removed, which 
    is helpful only for string subscripts, given how the routine to handle 
    them returns information via an attribute.  Duplication for the
    second reasons can easily be avoided, so I avoided it.  The first
    reason for duplication is sometimes valid, but can usually be avoided
    by first only doing it if the subscript is to be used for replacement
    rather than extraction, and second only doing it if the NAMED field
    for the subscript isn't zero.

    I also removed two layers of procedure call overhead (passing seven
    arguments, so not trivial) that seemed to be doing nothing.  Probably 
    it used to do something, but no longer does, but if instead it is 
    preparation for some future use, then removing it might be a mistake.
-------------- next part --------------
Index: src/main/subset.c
===================================================================
--- src/main/subset.c	(revision 52822)
+++ src/main/subset.c	(working copy)
@@ -59,73 +59,77 @@
     if (x == R_NilValue)
 	return x;
 
-    for (i = 0; i < n; i++) {
-	ii = INTEGER(indx)[i];
-	if (ii != NA_INTEGER)
-	    ii--;
-	switch (mode) {
-	case LGLSXP:
-	    if (0 <= ii && ii < nx && ii != NA_LOGICAL)
-		LOGICAL(result)[i] = LOGICAL(x)[ii];
-	    else
-		LOGICAL(result)[i] = NA_INTEGER;
-	    break;
-	case INTSXP:
-	    if (0 <= ii && ii < nx && ii != NA_INTEGER)
-		INTEGER(result)[i] = INTEGER(x)[ii];
-	    else
-		INTEGER(result)[i] = NA_INTEGER;
-	    break;
-	case REALSXP:
-	    if (0 <= ii && ii < nx && ii != NA_INTEGER)
-		REAL(result)[i] = REAL(x)[ii];
-	    else
-		REAL(result)[i] = NA_REAL;
-	    break;
-	case CPLXSXP:
-	    if (0 <= ii && ii < nx && ii != NA_INTEGER) {
-		COMPLEX(result)[i] = COMPLEX(x)[ii];
-	    }
-	    else {
-		COMPLEX(result)[i].r = NA_REAL;
-		COMPLEX(result)[i].i = NA_REAL;
-	    }
-	    break;
-	case STRSXP:
-	    if (0 <= ii && ii < nx && ii != NA_INTEGER)
-		SET_STRING_ELT(result, i, STRING_ELT(x, ii));
-	    else
-		SET_STRING_ELT(result, i, NA_STRING);
-	    break;
-	case VECSXP:
-	case EXPRSXP:
-	    if (0 <= ii && ii < nx && ii != NA_INTEGER)
-		SET_VECTOR_ELT(result, i, VECTOR_ELT(x, ii));
-	    else
-		SET_VECTOR_ELT(result, i, R_NilValue);
-	    break;
-	case LISTSXP:
+    switch (mode) {
+    case LGLSXP:
+        for (i = 0; i<n; i++)
+            if ((ii=INTEGER(indx)[i]) == NA_INTEGER || --ii < 0 || ii >= nx)
+                LOGICAL(result)[i] = NA_LOGICAL;
+            else
+                LOGICAL(result)[i] = LOGICAL(x)[ii];
+        break;
+    case INTSXP:
+        for (i = 0; i<n; i++)
+            if ((ii=INTEGER(indx)[i]) == NA_INTEGER || --ii < 0 || ii >= nx)
+                INTEGER(result)[i] = NA_INTEGER;
+            else
+                INTEGER(result)[i] = INTEGER(x)[ii];
+        break;
+    case REALSXP:
+        for (i = 0; i<n; i++)
+            if ((ii=INTEGER(indx)[i]) == NA_INTEGER || --ii < 0 || ii >= nx)
+                REAL(result)[i] = NA_REAL;
+            else
+                REAL(result)[i] = REAL(x)[ii];
+        break;
+    case CPLXSXP:
+        for (i = 0; i<n; i++)
+            if ((ii=INTEGER(indx)[i]) == NA_INTEGER || --ii < 0 || ii >= nx) {
+                COMPLEX(result)[i].r = NA_REAL;
+                COMPLEX(result)[i].i = NA_REAL; 
+            }
+            else
+                COMPLEX(result)[i] = COMPLEX(x)[ii];
+        break;
+    case STRSXP:
+        for (i = 0; i<n; i++)
+            if ((ii=INTEGER(indx)[i]) == NA_INTEGER || --ii < 0 || ii >= nx)
+                SET_STRING_ELT(result, i, NA_STRING);
+            else
+                SET_STRING_ELT(result, i, STRING_ELT(x, ii));
+        break;
+    case VECSXP:
+    case EXPRSXP:
+        for (i = 0; i<n; i++)
+            if ((ii=INTEGER(indx)[i]) == NA_INTEGER || --ii < 0 || ii >= nx)
+                SET_VECTOR_ELT(result, i, R_NilValue);
+            else
+                SET_VECTOR_ELT(result, i, VECTOR_ELT(x, ii));
+        break;
+    case LISTSXP:
 	    /* cannot happen: pairlists are coerced to lists */
-	case LANGSXP:
-	    if (0 <= ii && ii < nx && ii != NA_INTEGER) {
-		tmp2 = nthcdr(x, ii);
-		SETCAR(tmp, CAR(tmp2));
-		SET_TAG(tmp, TAG(tmp2));
-	    }
-	    else
-		SETCAR(tmp, R_NilValue);
-	    tmp = CDR(tmp);
-	    break;
-	case RAWSXP:
-	    if (0 <= ii && ii < nx && ii != NA_INTEGER)
-		RAW(result)[i] = RAW(x)[ii];
-	    else
-		RAW(result)[i] = (Rbyte) 0;
-	    break;
-	default:
-	    errorcall(call, R_MSG_ob_nonsub, type2char(mode));
-	}
+    case LANGSXP:
+        for (i = 0; i<n; i++) {
+            if ((ii=INTEGER(indx)[i]) == NA_INTEGER || --ii < 0 || ii >= nx)
+                SETCAR(tmp, R_NilValue);
+            else {
+                tmp2 = nthcdr(x, ii);
+                SETCAR(tmp, CAR(tmp2));
+                SET_TAG(tmp, TAG(tmp2));
+            }
+            tmp = CDR(tmp);
+        }
+        break;
+    case RAWSXP:
+        for (i = 0; i<n; i++)
+            if ((ii=INTEGER(indx)[i]) == NA_INTEGER || --ii < 0 || ii >= nx)
+                RAW(result)[i] = (Rbyte) 0;
+            else
+                RAW(result)[i] = RAW(x)[ii];
+        break;
+    default:
+        errorcall(call, R_MSG_ob_nonsub, type2char(mode));
     }
+
     return result;
 }
 
@@ -210,12 +214,54 @@
     return result;
 }
 
+/* Used in MatrixSubset to set a whole row or column of a matrix to NAs. */
 
+static void set_row_or_col_to_na (SEXP result, int start, int step, int end,
+                                  SEXP call)
+{
+    int i;
+    switch (TYPEOF(result)) {
+    case LGLSXP:
+        for (i = start; i<end; i += step)
+            LOGICAL(result)[i] = NA_LOGICAL;
+        break;
+    case INTSXP:
+        for (i = start; i<end; i += step)
+            INTEGER(result)[i] = NA_INTEGER;
+        break;
+    case REALSXP:
+        for (i = start; i<end; i += step)
+            REAL(result)[i] = NA_REAL;
+        break;
+    case CPLXSXP:
+        for (i = start; i<end; i += step) {
+            COMPLEX(result)[i].r = NA_REAL;
+            COMPLEX(result)[i].i = NA_REAL;
+        }
+        break;
+    case STRSXP:
+        for (i = start; i<end; i += step)
+            SET_STRING_ELT(result, i, NA_STRING);
+        break;
+    case VECSXP:
+        for (i = start; i<end; i += step)
+            SET_VECTOR_ELT(result, i, R_NilValue);
+        break;
+    case RAWSXP:
+        for (i = start; i<end; i += step)
+            RAW(result)[i] = (Rbyte) 0;
+        break;
+    default:
+        errorcall(call, _("matrix subscripting not handled for this type"));
+    }
+}
+
+
 static SEXP MatrixSubset(SEXP x, SEXP s, SEXP call, int drop)
 {
     SEXP attr, result, sr, sc, dim;
     int nr, nc, nrs, ncs;
-    int i, j, ii, jj, ij, iijj;
+    int i, j, ii, jj, ij, iijj, jjnr;
 
     nr = nrows(x);
     nc = ncols(x);
@@ -234,79 +280,82 @@
     PROTECT(sc);
     result = allocVector(TYPEOF(x), nrs*ncs);
     PROTECT(result);
+
+    /* Set rows of result to NAs where there are NA row indexes.  Also check 
+       for bad row indexes (once here rather than many times in loop). */
+
     for (i = 0; i < nrs; i++) {
-	ii = INTEGER(sr)[i];
-	if (ii != NA_INTEGER) {
-	    if (ii < 1 || ii > nr)
-		errorcall(call, R_MSG_subs_o_b);
-	    ii--;
+        ii = INTEGER(sr)[i];
+        if (ii == NA_INTEGER) 
+            set_row_or_col_to_na (result, i, nrs, i+nrs*ncs, call);
+        else if (ii < 1 || ii > nr)
+            errorcall(call, R_MSG_subs_o_b);
+    }
+
+    /* Loop to handle extraction except for NAs.  Outer loop is over columns so
+       writes are sequential, which is faster for indexing, and probably better
+       for memory speed. */
+
+    for (j = 0, ij = 0; j < ncs; j++) {
+        jj = INTEGER(sc)[j];
+
+        /* If column index is NA, just set column of result to NAs. */
+
+        if (jj == NA_INTEGER) {
+            set_row_or_col_to_na (result, j*nrs, 1, (j+1)*nrs, call);
+            ij += nrs;
+            continue;
+        }
+
+        /* Check for bad column index. */
+
+         if (jj < 1 || jj > nc)
+	     errorcall(call, R_MSG_subs_o_b);
+
+        /* Loops over row indexes, except skips NA row indexes, done above. */
+
+        jjnr = (jj-1) * nr;
+        switch (TYPEOF(x)) {
+	case LGLSXP:
+            for (i = 0; i < nrs; i++, ij++) 
+                if ((ii = INTEGER(sr)[i]) != NA_INTEGER) 
+		    LOGICAL(result)[ij] = LOGICAL(x)[(ii-1)+jjnr];
+            break;
+	case INTSXP:
+            for (i = 0; i < nrs; i++, ij++) 
+                if ((ii = INTEGER(sr)[i]) != NA_INTEGER) 
+		    INTEGER(result)[ij] = INTEGER(x)[(ii-1)+jjnr];
+            break;
+	case REALSXP:
+            for (i = 0; i < nrs; i++, ij++) 
+                if ((ii = INTEGER(sr)[i]) != NA_INTEGER) 
+		    REAL(result)[ij] = REAL(x)[(ii-1)+jjnr];
+            break;
+	case CPLXSXP:
+            for (i = 0; i < nrs; i++, ij++) 
+                if ((ii = INTEGER(sr)[i]) != NA_INTEGER) 
+		    COMPLEX(result)[ij] = COMPLEX(x)[(ii-1)+jjnr];
+            break;
+	case STRSXP:
+            for (i = 0; i < nrs; i++, ij++) 
+                if ((ii = INTEGER(sr)[i]) != NA_INTEGER) 
+		    SET_STRING_ELT(result, ij, STRING_ELT(x, (ii-1)+jjnr));
+            break;
+	case VECSXP:
+            for (i = 0; i < nrs; i++, ij++) 
+                if ((ii = INTEGER(sr)[i]) != NA_INTEGER) 
+		    SET_VECTOR_ELT(result, ij, VECTOR_ELT(x, (ii-1)+jjnr));
+            break;
+	case RAWSXP:
+            for (i = 0; i < nrs; i++, ij++) 
+                if ((ii = INTEGER(sr)[i]) != NA_INTEGER) 
+		    RAW(result)[ij] = RAW(x)[(ii-1)+jjnr];
+	    break;
+	default:
+	    errorcall(call, _("matrix subscripting not handled for this type"));
 	}
-	for (j = 0; j < ncs; j++) {
-	    jj = INTEGER(sc)[j];
-	    if (jj != NA_INTEGER) {
-		if (jj < 1 || jj > nc)
-		    errorcall(call, R_MSG_subs_o_b);
-		jj--;
-	    }
-	    ij = i + j * nrs;
-	    if (ii == NA_INTEGER || jj == NA_INTEGER) {
-		switch (TYPEOF(x)) {
-		case LGLSXP:
-		case INTSXP:
-		    INTEGER(result)[ij] = NA_INTEGER;
-		    break;
-		case REALSXP:
-		    REAL(result)[ij] = NA_REAL;
-		    break;
-		case CPLXSXP:
-		    COMPLEX(result)[ij].r = NA_REAL;
-		    COMPLEX(result)[ij].i = NA_REAL;
-		    break;
-		case STRSXP:
-		    SET_STRING_ELT(result, ij, NA_STRING);
-		    break;
-		case VECSXP:
-		    SET_VECTOR_ELT(result, ij, R_NilValue);
-		    break;
-		case RAWSXP:
-		    RAW(result)[ij] = (Rbyte) 0;
-		    break;
-		default:
-		    errorcall(call, _("matrix subscripting not handled for this type"));
-		    break;
-		}
-	    }
-	    else {
-		iijj = ii + jj * nr;
-		switch (TYPEOF(x)) {
-		case LGLSXP:
-		    LOGICAL(result)[ij] = LOGICAL(x)[iijj];
-		    break;
-		case INTSXP:
-		    INTEGER(result)[ij] = INTEGER(x)[iijj];
-		    break;
-		case REALSXP:
-		    REAL(result)[ij] = REAL(x)[iijj];
-		    break;
-		case CPLXSXP:
-		    COMPLEX(result)[ij] = COMPLEX(x)[iijj];
-		    break;
-		case STRSXP:
-		    SET_STRING_ELT(result, ij, STRING_ELT(x, iijj));
-		    break;
-		case VECSXP:
-		    SET_VECTOR_ELT(result, ij, VECTOR_ELT(x, iijj));
-		    break;
-		case RAWSXP:
-		    RAW(result)[ij] = RAW(x)[iijj];
-		    break;
-		default:
-		    errorcall(call, _("matrix subscripting not handled for this type"));
-		    break;
-		}
-	    }
-	}
     }
+
     if(nrs >= 0 && ncs >= 0) {
 	PROTECT(attr = allocVector(INTSXP, 2));
 	INTEGER(attr)[0] = nrs;
-------------- next part --------------
Index: src/include/Defn.h
===================================================================
--- src/include/Defn.h	(revision 52822)
+++ src/include/Defn.h	(working copy)
@@ -1003,7 +1003,7 @@
 void KillAllDevices(void);
 SEXP levelsgets(SEXP, SEXP);
 void mainloop(void);
-SEXP makeSubscript(SEXP, SEXP, int *, SEXP);
+SEXP makeSubscript(SEXP, SEXP, int *, SEXP, int);
 SEXP markKnown(const char *, SEXP);
 SEXP mat2indsub(SEXP, SEXP, SEXP);
 SEXP matchArg(SEXP, SEXP*);
Index: src/main/subassign.c
===================================================================
--- src/main/subassign.c	(revision 52822)
+++ src/main/subassign.c	(working copy)
@@ -448,7 +448,7 @@
     }
 
     stretch = 1;
-    PROTECT(indx = makeSubscript(x, s, &stretch, R_NilValue));
+    PROTECT(indx = makeSubscript(x, s, &stretch, R_NilValue, 1));
     n = length(indx);
     if(length(y) > 1)
 	for(i = 0; i < n; i++)
@@ -665,11 +665,11 @@
     default:
 	warningcall(call, "sub assignment (*[*] <- *) not done; __bug?__");
     }
-    /* Check for additional named elements. */
+    /* Check for additional named elements, if subscripting with strings. */
     /* Note makeSubscript passes the additional names back as the use.names
        attribute (a vector list) of the generated subscript vector */
-    newnames = getAttrib(indx, R_UseNamesSymbol);
-    if (newnames != R_NilValue) {
+    if (TYPEOF(s)==STRSXP && 
+          (newnames = getAttrib(indx, R_UseNamesSymbol)) != R_NilValue) {
 	SEXP oldnames = getAttrib(x, R_NamesSymbol);
 	if (oldnames != R_NilValue) {
 	    for (i = 0; i < n; i++) {
@@ -1195,7 +1195,7 @@
 	error(_("invalid number of subscripts to list assign"));
 
     PROTECT(sub = GetOneIndex(sub, ind));
-    PROTECT(indx = makeSubscript(x, sub, &stretch, R_NilValue));
+    PROTECT(indx = makeSubscript(x, sub, &stretch, R_NilValue, 1));
     	
     n = length(indx);
     if (n > 1)
@@ -1243,7 +1243,7 @@
     vmax = vmaxget();
     nx = length(x);
     PROTECT(s = GetOneIndex(s, ind));
-    PROTECT(s = makeSubscript(x, s, &stretch, R_NilValue));
+    PROTECT(s = makeSubscript(x, s, &stretch, R_NilValue, 1));
     ns = length(s);
     indx = (int*)R_alloc(nx, sizeof(int));
     for (i = 0; i < nx; i++)
Index: src/main/subscript.c
===================================================================
--- src/main/subscript.c	(revision 52822)
+++ src/main/subscript.c	(working copy)
@@ -27,7 +27,6 @@
 
  *  makeSubscript()   -- for "[" and "[<-" in ./subset.c and ./subassign.c,
  *			 and "[[<-" with a scalar in ./subassign.c
- *  vectorSubscript() -- for makeSubscript()   {currently unused externally}
  *  arraySubscript()  -- for "[i,j,..." and "[<-..." in ./subset.c, ./subassign.c
  */
 
@@ -411,7 +410,7 @@
     return s;
 }
 
-static SEXP positiveSubscript(SEXP s, int ns, int nx)
+static SEXP nonnegativeSubscript(SEXP s, int ns, int nx)
 {
     SEXP indx;
     int i, zct = 0;
@@ -433,34 +432,53 @@
 static SEXP integerSubscript(SEXP s, int ns, int nx, int *stretch, SEXP call)
 {
     int i, ii, min, max, canstretch;
-    Rboolean isna = FALSE;
+    Rboolean isna;
+
     canstretch = *stretch;
     *stretch = 0;
-    min = 0;
-    max = 0;
+
     for (i = 0; i < ns; i++) {
 	ii = INTEGER(s)[i];
-	if (ii != NA_INTEGER) {
-	    if (ii < min)
+	if (ii != NA_INTEGER) 
+            break;
+    }
+
+    if (i==ns) /* all NA, or ns==0 */
+        return s;
+
+    isna = i>0;
+
+    min = ii;
+    max = ii;
+    for (i = i+1; i < ns; i++) {
+	ii = INTEGER(s)[i];
+	if (ii == NA_INTEGER) 
+            isna = TRUE;
+        else {
+	    if (ii > max)  /* checked first since more common than ii < min */
+		max = ii;
+	    else if (ii < min)
 		min = ii;
-	    if (ii > max)
-		max = ii;
-	} else isna = TRUE;
+        }
     }
+
     if (max > nx) {
 	if(canstretch) *stretch = max;
 	else {
 	    ECALL(call, _("subscript out of bounds"));
 	}
     }
-    if (min < 0) {
-	if (max == 0 && !isna) return negativeSubscript(s, ns, nx, call);
+
+    if (min > 0) /* All positive (or NA) */
+        return s;
+    else if (min < 0) {
+	if (max <= 0 && !isna) return negativeSubscript(s, ns, nx, call);
 	else {
 	    ECALL(call, _("only 0's may be mixed with negative subscripts"));
 	}
     }
-    else return positiveSubscript(s, ns, nx);
-    return R_NilValue;
+    else /* min == 0 */
+        return nonnegativeSubscript(s, ns, nx);
 }
 
 typedef SEXP (*StringEltGetter)(SEXP x, int i);
@@ -482,13 +500,14 @@
 
 static SEXP
 stringSubscript(SEXP s, int ns, int nx, SEXP names,
-		StringEltGetter strg, int *stretch, Rboolean in, SEXP call)
+		StringEltGetter strg, int *stretch, SEXP call)
 {
     SEXP indx, indexnames;
     int i, j, nnames, sub, extra;
     int canstretch = *stretch;
     /* product may overflow, so check factors as well. */
-    Rboolean usehashing = in && ( ((ns > 1000 && nx) || (nx > 1000 && ns)) || (ns * nx > 15*nx + ns) );
+    Rboolean usehashing = 
+      (ns > 1000 && nx) || (nx > 1000 && ns) || (ns * nx > 15*nx + ns);
 
     PROTECT(s);
     PROTECT(names);
@@ -520,9 +539,11 @@
 	    if (names != R_NilValue) {
 		for (j = 0; j < nnames; j++) {
 		    SEXP names_j = strg(names, j);
+#if 0 /* Disabled now that the "in" argument is gone; was always TRUE. */
 		    if (!in && TYPEOF(names_j) != CHARSXP) {
 			ECALL(call, _("character vector element does not have type CHARSXP"));
 		    }
+#endif
 		    if (NonNullStringMatch(STRING_ELT(s, i), names_j)) {
 			sub = j + 1;
 			SET_VECTOR_ELT(indexnames, i, R_NilValue);
@@ -577,7 +598,7 @@
 
 static SEXP
 int_arraySubscript(int dim, SEXP s, SEXP dims, AttrGetter dng,
-		   StringEltGetter strg, SEXP x, Rboolean in, SEXP call)
+                   StringEltGetter strg, SEXP x, SEXP call)
 {
     int nd, ns, stretch = 0;
     SEXP dnames, tmp;
@@ -602,7 +623,7 @@
 	    ECALL(call, _("no 'dimnames' attribute for array"));
 	}
 	dnames = VECTOR_ELT(dnames, dim);
-	return stringSubscript(s, ns, nd, dnames, strg, &stretch, in, call);
+	return stringSubscript(s, ns, nd, dnames, strg, &stretch, call);
     case SYMSXP:
 	if (s == R_MissingArg)
 	    return nullSubscript(nd);
@@ -622,61 +643,75 @@
 arraySubscript(int dim, SEXP s, SEXP dims, AttrGetter dng,
 	       StringEltGetter strg, SEXP x)
 {
-    return int_arraySubscript(dim, s, dims, dng, strg, x, TRUE, R_NilValue);
+    return int_arraySubscript(dim, s, dims, dng, strg, x, R_NilValue);
 }
 
-/* Subscript creation.  The first thing we do is check to see */
-/* if there are any user supplied NULL's, these result in */
-/* returning a vector of length 0. */
-/* if stretch is zero on entry then the vector x cannot be
-   "stretched",
-   otherwise, stretch returns the new required length for x
+/* Subscript creation.  
+   x is the object being subscripted; s is the R subscript value.
+   If stretch is zero on entry then the vector x cannot be "stretched",
+   otherwise, stretch returns the new required length for x.
+   The used_to_replace arguement is 1 if the subscript is used to replace 
+   items (hence subscript may need to be duplicated in case it itself 
+   would be modified), and 0 if the subscript is only for extracting items.
 */
 
-SEXP attribute_hidden makeSubscript(SEXP x, SEXP s, int *stretch, SEXP call)
+SEXP attribute_hidden makeSubscript(SEXP x, SEXP s, int *stretch, SEXP call, 
+                                    int used_to_replace)
 {
-    int nx;
-    SEXP ans;
+    int nx, ns;
+    SEXP ans, tmp;
 
-    ans = R_NilValue;
-    if (isVector(x) || isList(x) || isLanguage(x)) {
-	nx = length(x);
-
-	ans = vectorSubscript(nx, s, stretch, getAttrib, (STRING_ELT),
-			      x, call);
-    }
-    else {
+    if (!isVector(x) && !isList(x) && !isLanguage(x)) {
 	ECALL(call, _("subscripting on non-vector"));
+        return R_NilValue;
     }
-    return ans;
 
-}
+    nx = length(x);
+    ns = length(s);
 
-/* nx is the length of the object being subscripted,
-   s is the R subscript value,
-   dng gets a given attrib for x, which is the object we are
-   subsetting,
-*/
+#if 0 /* In r52822, checked for no attributes of s, but only potential */
+      /* problem seems to be an R_UseNamesSymbol attribute, which can  */
+      /* be ignored if the subscripts aren't strings.                  */
+    /* special case for simple indices -- does not duplicate */
+    if (ns == 1 && ATTRIB(s) == R_NilValue) 
+#else
+    /* Handle single positive index (real or integer), not out of bounds.  */
+    /* Note that we don't have to worry about a length one subscript being */
+    /* modified in a replace operation, since even if it is,  we don't use */
+    /* it anymore after the modification.                                  */
+    if (ns == 1) 
+#endif
+        if (TYPEOF(s) == INTSXP) {
+            int i = INTEGER(s)[0];
+            if (0 < i && i <= nx) {
+                *stretch = 0;
+                return s;
+            }
+	} else if (TYPEOF(s) == REALSXP) {
+            int i, warn = 0;
+            i = IntegerFromReal (REAL(s)[0], &warn);
+            if (0 < i && i <= nx) {
+                if (warn) CoercionWarning(warn);
+                *stretch = 0;
+                return ScalarInteger(i);
+            }
+        }
 
-static SEXP
-int_vectorSubscript(int nx, SEXP s, int *stretch, AttrGetter dng,
-		    StringEltGetter strg, SEXP x, Rboolean in, SEXP call)
-{
-    int ns;
-    SEXP ans = R_NilValue, tmp;
-
-    ns = length(s);
-    /* special case for simple indices -- does not duplicate */
-    if (ns == 1 && TYPEOF(s) == INTSXP && ATTRIB(s) == R_NilValue) {
-	int i = INTEGER(s)[0];
-	if (0 < i && i <= nx) {
-	    *stretch = 0;
-	    return s;
-	}
-    }
+#if 0 /* Duplicated in r52822, but seems unnecessary as long as callers ignore
+         any R_UseNamesSymbol attribute when not subscripting with strings,
+         and the subscript will not be modified in a replacement operation
+         that it is supplying the subscripts for. */
     PROTECT(s = duplicate(s));
     SET_ATTRIB(s, R_NilValue);
     SET_OBJECT(s, 0);
+#else
+    /* Duplicate if the subscript might be being used to replace elements of
+       itself. */
+    if (used_to_replace && NAMED(s) > 0) 
+        s = duplicate(s);
+    PROTECT(s);
+#endif
+
     switch (TYPEOF(s)) {
     case NILSXP:
 	*stretch = 0;
@@ -696,9 +731,9 @@
 	break;
     case STRSXP:
     {
-	SEXP names = dng(x, R_NamesSymbol);
+	SEXP names = getAttrib(x, R_NamesSymbol);
 	/* *stretch = 0; */
-	ans = stringSubscript(s, ns, nx, names, strg, stretch, in, call);
+	ans = stringSubscript(s, ns, nx, names, (STRING_ELT), stretch, call);
     }
     break;
     case SYMSXP:
@@ -717,11 +752,3 @@
     UNPROTECT(1);
     return ans;
 }
-
-
-SEXP attribute_hidden
-vectorSubscript(int nx, SEXP s, int *stretch, AttrGetter dng,
-		StringEltGetter strg, SEXP x, SEXP call)
-{
-    return int_vectorSubscript(nx, s, stretch, dng, strg, x, TRUE, call);
-}
Index: src/main/subset.c
===================================================================
--- src/main/subset.c	(revision 52822)
+++ src/main/subset.c	(working copy)
@@ -161,7 +161,7 @@
     /* Convert to a vector of integer subscripts */
     /* in the range 1:length(x). */
 
-    PROTECT(indx = makeSubscript(x, s, &stretch, call));
+    PROTECT(indx = makeSubscript(x, s, &stretch, call, 0));
     n = LENGTH(indx);
 
     /* Allocate the result. */


More information about the R-devel mailing list