[R] NA and logical indexes

(Ted Harding) Ted.Harding at manchester.ac.uk
Fri Nov 28 23:01:15 CET 2008


On 28-Nov-08 21:25:36, Sebastian P. Luque wrote:
> Hi,
> I vaguely remember this issue being discussed at some length in the
> past, but am having trouble relocating the proper thread (defining an
> adequate search string to do so):
> 
> ---<---------------cut here---------------start-------------->---
> R> foo <- data.frame(A=gl(2, 5, labels=letters[1:2]), X=runif(10))
> R> foo$A[1] <- NA
> R> foo$A == "b"
>  [1]    NA FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> R> foo$A[foo$A == "b"]
> [1] <NA> b    b    b    b    b   
> Levels: a b
> R> foo$X[foo$A == "b"]
> [1]     NA 0.4425 0.7164 0.3171 0.1967 0.8300
> R> foo[foo$A == "b", ]
>       A      X
> NA <NA>     NA
> 6     b 0.4425
> 7     b 0.7164
> 8     b 0.3171
> 9     b 0.1967
> 10    b 0.8300
> ---<---------------cut here---------------end---------------->---
> 
> Why is foo$X[1] set to NA in that last call?
> 
> Cheers,
> Seb

It is not! In my repetition (which has different runifs):

  foo[foo$A == "b", ]
#       A         X
# NA <NA>        NA
# 6     b 0.2300618
# 7     b 0.5109791
# 8     b 0.7947862
# 9     b 0.3400228
# 10    b 0.5464989
  foo
#       A         X
# 1  <NA> 0.5013591
# 2     a 0.4475963
# 3     a 0.2600449
# 4     a 0.9240698
# 5     a 0.4205284
# 6     b 0.2300618
# 7     b 0.5109791
# 8     b 0.7947862
# 9     b 0.3400228
# 10    b 0.5464989

NA can seem to have a bewildering logic, but it all becomes
clear if you interpret NA as "value unknown".

You asked for foo[foo$A == "b", ]. What happens is that
when the test foo$A == "b" encounters foo$A[1] it sees NA,
so it does not know what the value is. Hence it does not
know whether this row of foo satisfies the test, and the
result of the test for that row is itself NA. Hence
the entire row is of unknown status. Hence a row is output
all of whose elements (including the row label, i.e. the
row number) are flagged "unknown", i.e. NA.
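
A small sketch of those steps, in case it helps to see them one at
a time. The names idx, x and sub are only illustrative (they are not
in the original post), and the X values will differ on every run
since they come from runif():

```r
# Rebuild the example data frame from the question.
foo <- data.frame(A = gl(2, 5, labels = letters[1:2]), X = runif(10))
foo$A[1] <- NA

# Step 1: the comparison cannot be decided for row 1, so it gives NA.
idx <- foo$A == "b"
is.na(idx[1])            # TRUE

# Step 2: an NA in a logical index yields an NA element in the result.
x <- c(10, 20, 30)
x[c(NA, FALSE, TRUE)]    # NA 30

# Step 3: the same rule applied to rows gives one all-NA row,
# followed by the five rows where A is known to be "b".
sub <- foo[idx, ]
nrow(sub)                # 6

# The original data are untouched: foo$X[1] still holds its value.
is.na(foo$X[1])          # FALSE
```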

After all, if it gave the value of foo$X[1] = 0.5013591,
and you subsequently accessed foo[foo$A == "b",][1,2] and got
0.5013591, you would presumably proceed as though this was
a value corresponding to a case where foo$A == "b". But it
is not -- since foo$A[1] = NA, you don't know whether that is
the case. Hence you don't know the value of foo[foo$A == "b",][1,2].
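
If what you actually want is only the rows where foo$A is known to
be "b", the unknown cases have to be excluded explicitly. The
following is a sketch of three common ways to do that, not the only
ones; the names by_which, by_is_na and by_subset are made up for
illustration:

```r
# Same example data frame as in the question (X values vary per run).
foo <- data.frame(A = gl(2, 5, labels = letters[1:2]), X = runif(10))
foo$A[1] <- NA

# which() returns only the positions where the test is TRUE,
# so positions where the test is NA are dropped.
by_which <- foo[which(foo$A == "b"), ]

# Excluding NA explicitly: FALSE & NA evaluates to FALSE.
by_is_na <- foo[!is.na(foo$A) & foo$A == "b", ]

# subset() treats an NA condition as FALSE.
by_subset <- subset(foo, A == "b")

rownames(by_which)       # "6" "7" "8" "9" "10"
```

Each of these returns the five "b" rows and nothing else. Which one
to prefer is largely a matter of taste, but all of them silently
discard the row of unknown status, so they are only appropriate when
that is really what is intended.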

Clear? ( :))
Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Nov-08                                       Time: 22:01:11
------------------------------ XFMail ------------------------------


