[Rd] Shouldn't vector indexing with negative out-of-range index give an error?

Henrik Bengtsson henrik.bengtsson at ucsf.edu
Wed May 6 18:04:21 CEST 2015


On Wed, May 6, 2015 at 1:33 AM, Martin Maechler
<maechler at lynne.stat.math.ethz.ch> wrote:
>>>>>> John Chambers <jmc at stat.stanford.edu>
>>>>>>     on Tue, 5 May 2015 08:39:46 -0700 writes:
>
>     > When someone suggests that we "might have had a reason" for some peculiarity in the original S, my usual reaction is "Or else we never thought of the problem".
>     > In this case, however, there is a relevant statement in the 1988 "blue book".  In the discussion of subscripting (p 358) the definition for negative i says: "the indices consist of the elements of seq(along=x) that do not match any elements in -i".
>
>     > Suggesting that no bounds checking on -i takes place.
>
>     > John
>
> Indeed!
> Thanks a lot John, for the perspective and clarification!
>
> I'm committing a patch to the documentation now.

Thank you both, and credit also to Dongcan Jiang for pointing out to
me that errors were indeed not generated in this case.

I agree with the decision. It's interesting to note that the only
way index-vector subsetting now generates an error is when positive
and negative indices are mixed, e.g. x[c(-1,1)].
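
A minimal sketch of the cases discussed in this thread, assuming a
reasonably recent R (2.6.0 or later). The variable names are only for
illustration, and the wording of the mixed-sign error message may differ
between R versions, so the sketch checks only that an error condition is
raised:

```r
x <- 1:4

# Negative out-of-range index: silently ignored, so nothing is dropped.
neg_oob <- x[-9L]

# Positive out-of-range index: the corresponding selection is NA.
pos_oob <- x[9L]

# Zeros among the indices have the same effect as if they were omitted.
with_zero <- x[c(0L, 2L)]

# Mixed positive and negative indices: an error is signalled.
# tryCatch() returns the condition object instead of stopping the script.
mixed <- tryCatch(x[c(-1, 1)], error = function(e) e)

print(neg_oob)
print(pos_oob)
print(with_zero)
print(inherits(mixed, "error"))
```

Capturing the condition with tryCatch() keeps the example runnable from
top to bottom; at the prompt, x[c(-1,1)] simply stops with an error.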

/Henrik

> Martin
>
>
>     > On May 5, 2015, at 7:01 AM, Martin Maechler <maechler at lynne.stat.math.ethz.ch> wrote:
>
>     >>>>>>> Henrik Bengtsson <henrik.bengtsson at ucsf.edu>
>     >>>>>>> on Mon, 4 May 2015 12:20:44 -0700 writes:
>     >>
>     >>> In Section 'Indexing by vectors' of 'R Language Definition'
>     >>> (http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing-by-vectors)
>     >>> it says:
>     >>
>     >>> "Integer. All elements of i must have the same sign. If they are
>     >>> positive, the elements of x with those index numbers are selected. If
>     >>> i contains negative elements, all elements except those indicated are
>     >>> selected.
>     >>
>     >>> If i is positive and exceeds length(x) then the corresponding
>     >>> selection is NA. A negative out of bounds value for i causes an error.
>     >>
>     >>> A special case is the zero index, which has null effects: x[0] is an
>     >>> empty vector and otherwise including zeros among positive or negative
>     >>> indices has the same effect as if they were omitted."
>     >>
>     >>> However, that "A negative out of bounds value for i causes an error"
>     >>> in the second paragraph does not seem to apply.  Instead, R silently
>     >>> ignores negative indices that are out of range.  For example:
>     >>
>     >>>> x <- 1:4
>     >>>> x[-9L]
>     >>> [1] 1 2 3 4
>     >>>> x[-c(1:9)]
>     >>> integer(0)
>     >>>> x[-c(3:9)]
>     >>> [1] 1 2
>     >>
>     >>>> y <- as.list(1:4)
>     >>>> y[-c(1:9)]
>     >>> list()
>     >>
>     >>> Is the observed non-error the correct behavior and therefore the
>     >>> documentation is incorrect, or is it vice versa?  (...or is it me
>     >>> missing something)
>     >>
>     >>> I get the above on R devel, R 3.2.0, and as far back as R 2.11.0
>     >>> (haven't checked earlier versions).
>     >>
>     >> Thank you, Henrik!
>     >>
>     >> I've checked further back: The change happened between R 2.5.1 and R 2.6.0.
>     >>
>     >> The previous behavior was
>     >>
>     >>> (1:3)[-(3:5)]
>     >> Error: subscript out of bounds
>     >>
>     >> If you start reading NEWS.2, you see a *lot* of new features
>     >> (and bug fixes) in the 2.6.0 news, but from my browsing, none of
>     >> them mentioned the new behavior as a feature.
>     >>
>     >> Let's -- for a moment -- declare it a bug in the code, i.e., not
>     >> in the documentation:
>     >>
>     >> - As 2.6.0  happened quite a while ago (Oct. 2007),
>     >> we could wonder how much R code will break if we fix the bug.
>     >>
>     >> - Is the R package authors' community willing to do the necessary
>     >> cleanup in their packages ?
>     >>
>     >> ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
>     >>
>     >>
>     >> Now, after reading the source code for a while, and looking at
>     >> the changes, I've found the log entry
>     >>
>     >> ------------------------------------------------------------------------
>     >> r42123 | ihaka | 2007-07-05 02:00:05 +0200 (Thu, 05 Jul 2007) | 4 lines
>     >>
>     >> Changed the behaviour of out-of-bounds negative
>     >> subscripts to match that of S.  Such values are
>     >> now ignored rather than tripping an error.
>     >>
>     >> ------------------------------------------------------------------------
>     >>
>     >> So, it was changed on purpose, by one of the true "R"s, very
>     >> much on purpose.
>     >>
>     >> Making it a *warning* instead of the original error
>     >> may have been both more cautious and more helpful for
>     >> detecting programming errors.
>     >>
>     >> OTOH, John Chambers, the father of S and hence grandfather of R,
>     >> may have had good reasons why it seemed more logical to silently
>     >> ignore such out of bound negative indices:
>     >> One could argue that
>     >>
>     >> x[-5]  means  "leave away the 5-th element of x"
>     >>
>     >> and if there is no 5-th element of x, leaving it away should be a no-op.
>     >>
>     >> After all this musing and history detection, my gut decision
>     >> would be to only change the documentation which Ross forgot to change.
>     >>
>     >> But of course, it may be interesting to hear other programmeR's feedback on this.
>     >>
>     >> Martin
>


