[Rd] Surprising behavior of letters[c(NA, NA)]

Fri Dec 17 16:55:44 CET 2010

On 17/12/2010 10:40 AM, (Ted Harding) wrote:
> On 17-Dec-10 14:32:18, Gabor Grothendieck wrote:
> >  Consider this:
> >
> >>  letters[c(2, 3)]
> >  [1] "b" "c"
> >>  letters[c(2, NA)]
> >  [1] "b" NA
> >>  letters[c(NA, 3)]
> >  [1] NA  "c"
> >>  letters[c(NA, NA)]
> >   [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> >  NA NA NA
> >  [26] NA
> >
> >  The result is a 2-vector in each case until we get to c(NA, NA) and
> >  then it unexpectedly changes from returning a 2-vector to returning a
> >  26-vector.  I think most people would have expected that the answer
> >  would be c(NA, NA).
>
> I'm not sure that it is surprising! Consider
>    letters[NA]
> which returns exactly the same result. Then consider that 'letters' is
> simply a 26-element character vector c("a",...). Now consider
>
>    x <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)
>    x[NA]
>    # [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
>
> In other words, x[NA] for any vector x will test each index 1:length(x)
> against NA, and will find that it's NA, since it doesn't know whether
> the index matches or not. Therefore it returns NA for that index, and
> will do the same for every index. So it's telling you: "For each of my
> elements a,b,c,d,e,f,... I have to tell you that I don't know whether
> you want it or not". You also get similar behavior for x==NA.
>
> If anything is surprising (though it also admits a logical
> explanation), it is the result
>
>    letters[c(2, NA)]
>    # [1] "b" NA
>
> since the result being asked for by the first element of c(2,NA) is
> definite -- so far so good -- but then you would expect it to have the
> same problem with what is being asked for by NA. This time, it seems
> that because the 2-element vector c(2,NA) is being submitted, its
> length overrides the length of the response that would be given for
> x[NA]: "You asked for a 2-element extraction from letters; I can see
> what you want for the first, but not for the second".
>
> However, that logic does not work for letters[c(NA,NA)] which still
> returns the 26-element result!
>
> After all that, I'm inclined to the view that letters[NA] should
> return one element (NA), letters[c(NA,NA)] should return 2 (NA,NA),
> etc.; and that the same should apply to all vectors accessed by [].
> The above behaviour seems to contradict [what I can understand from]
> what is said in ?"[":
>
> NAs in indexing:
>       When extracting, a numerical, logical or character 'NA' index
>       picks an unknown element and so returns 'NA' in the corresponding
>       element of a logical, integer, numeric, complex or character
>       result, and 'NULL' for a list.  (It returns '00' for a raw
>       result.)
>
> since that seems to imply that x[c(NA,NA)] should return c(NA,NA)
> and not rep(NA,length(x))!
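
The behaviour depends on the type of the index, not on its contents. A bare NA is a logical constant, so c(NA, NA) is a logical index and is recycled to length(letters). Once NA is combined with a number, as in c(2, NA) or c(NA, 3), it is coerced to a numeric NA, and a numeric NA selects a single unknown element. The check below uses only base R and can be pasted into a plain session. NA_integer_ is base R's typed integer NA.

```r
# A bare NA is logical; combining it with a number coerces it to double.
typeof(NA)          # "logical"
typeof(c(NA, NA))   # "logical"
typeof(c(2, NA))    # "double"

# Logical index: recycled to length(letters), so all 26 positions are NA.
length(letters[c(NA, NA)])             # 26

# Numeric index: one result element per index element.
length(letters[c(2, NA)])              # 2

# A typed integer NA keeps numeric indexing semantics.
letters[c(NA_integer_, NA_integer_)]   # NA NA
length(letters[NA_integer_])           # 1
```

Writing the index as c(NA_integer_, NA_integer_), or wrapping it in as.integer(), gives the two-element result the original question expected.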

I don't know where that quote came from, but it is not quite relevant 
here.  The relevant quote is in the Language Definition, talking about 
indices by type of index:

"Logical. The indexing i should generally have the same length as x. If
it is shorter, then its elements will be recycled as discussed in
Section 3.3 [Elementary arithmetic operations], page 14. If it is
longer, then x is conceptually extended with NAs. The selected values
of x are those for which i is TRUE."
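
The sketch below shows the cases covered by that rule on a small integer vector. The vector x is illustrative and does not come from the thread.

```r
x <- 1:6

# Shorter logical index: recycled, so c(TRUE, FALSE) picks every other element.
x[c(TRUE, FALSE)]   # 1 3 5

# A length-1 logical NA is recycled to length(x): six unknown selections.
x[NA]               # NA NA NA NA NA NA

# Longer logical index: x is conceptually extended with NA,
# so the seventh TRUE selects a missing element.
x[rep(TRUE, 7)]     # 1 2 3 4 5 6 NA
```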

The Introduction to R gets this wrong:

"A logical vector. In this case the index vector must be of the same
length as the vector from which elements are to be selected. Values
corresponding to TRUE in the index vector are selected and those
corresponding to FALSE are omitted."

The "must" in that quote is too strong; the Language Definition gets it 
right.  Perhaps the behaviour described in the Intro manual would be 
less confusing:  letters[c(NA,NA)] would give an error or warning, 
something like "logical index of incorrect length".  But I suspect 
people rely on the recycling of logical vectors, so there'd be a lot of 
complaints if we made that change.
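
For illustration only, user code could approximate the stricter behaviour with a wrapper. strict_subset is a hypothetical helper and is not part of base R. It checks only logical indices and passes numeric and character indices to [ unchanged.

```r
# Hypothetical helper (not base R): reject a logical index whose length
# differs from length(x) instead of recycling or extending it.
strict_subset <- function(x, i) {
  if (is.logical(i) && length(i) != length(x)) {
    stop("logical index of incorrect length")
  }
  x[i]
}

strict_subset(letters, c(2, NA))       # numeric index: "b" NA
strict_subset(letters, rep(TRUE, 26))  # full-length logical index: all 26 letters

# A length-2 logical index is now an error instead of 26 NAs.
res <- tryCatch(strict_subset(letters, c(NA, NA)), error = conditionMessage)
res                                    # "logical index of incorrect length"
```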

Duncan Murdoch