[R] Problem Subsetting Rows that Have NA's

peter dalgaard pdalgd at gmail.com
Wed Oct 25 22:02:36 CEST 2017


It's not a bug, and the rationale has been hashed over since the beginning of time...

It is a bit of an annoyance in some contexts and part of the rationale for the existence of subset().
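
For instance, here is a minimal sketch of the usual workarounds, using the toy matrix from the original post quoted below. The result names (r_plain, r_which, r_subset, r_explicit) are only illustrative and do not come from the thread:

```r
# Toy matrix from the original question (second column contains NAs)
x <- rbind(c(1, 1), c(2, 2), c(3, 3), c(4, 0), c(5, 0),
           c(6, NA), c(7, NA), c(8, NA), c(9, NA), c(10, NA))

# Plain logical indexing: every NA in the condition yields an all-NA row
r_plain <- x[x[, 2] == 0, ]

# which() returns only the positions where the condition is TRUE
r_which <- x[which(x[, 2] == 0), ]

# subset() treats NA in the condition as FALSE
r_subset <- subset(x, x[, 2] == 0)

# Excluding NA explicitly selects the same rows
r_explicit <- x[!is.na(x[, 2]) & x[, 2] == 0, ]

nrow(r_plain)   # 7 (two matching rows plus five NA rows)
nrow(r_which)   # 2
```

The last three forms should all return just the two rows whose second column is zero.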

If you need an explanation, start with elementary vector indexing:

colors <- c("red", "green", "blue")
colors[c(1,3,2,NA,3)]

You pretty clearly want the result to be a vector of length 5 with 4th element NA, right?
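
A quick check of that expectation, as a small sketch (res is just an illustrative name):

```r
colors <- c("red", "green", "blue")
res <- colors[c(1, 3, 2, NA, 3)]

length(res)   # 5
is.na(res)    # FALSE FALSE FALSE  TRUE FALSE
```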

Same story if you index into a data frame: 

> airquality[c(1,3,2,NA,2),]
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
3      12     149 12.6   74     5   3
2      36     118  8.0   72     5   2
NA     NA      NA   NA   NA    NA  NA
2.1    36     118  8.0   72     5   2

Now, that by itself is not an argument for also getting NA rows from logical indexing, but then comes the issue of automatic coercion: in colors[NA], the NA is actually of mode "logical". If we dropped NA indexes in logical indexing, we would have to explain why colors[c(1,NA)] has length 2 but colors[NA] has length zero (currently it has length 3, because the logical NA is recycled).
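
The coercion point can be seen directly. This is only a sketch of current behaviour, and the names len_num, len_lgl and len_int are illustrative:

```r
colors <- c("red", "green", "blue")

typeof(NA)         # "logical"
typeof(c(1, NA))   # "double": the NA is coerced to numeric here

# Numeric index containing NA: one result element per index element
len_num <- length(colors[c(1, NA)])     # 2

# Bare NA is a logical index, recycled to the length of the vector
len_lgl <- length(colors[NA])           # 3

# An explicitly integer NA behaves like any other numeric index
len_int <- length(colors[NA_integer_])  # 1
```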

-pd

> On 25 Oct 2017, at 15:57 , BooBoo <booboo at gforcecable.com> wrote:
> 
> On 10/25/2017 4:38 AM, Ista Zahn wrote:
>> On Tue, Oct 24, 2017 at 3:05 PM, BooBoo <booboo at gforcecable.com> wrote:
>>> This has every appearance of being a bug. If it is not a bug, can someone
>>> tell me what I am asking for when I ask for "x[x[,2]==0,]". Thanks.
>> You are asking for elements of x where the second column is equal to zero.
>> 
>> help("==")
>> 
>> and
>> 
>> help("[")
>> 
>> explain what happens when missing values are involved. I agree that
>> the behavior is surprising, but your first instinct when you discover
>> something surprising should be to read the documentation, not to post
>> to this list. After having read the documentation you may post back
>> here if anything remains unclear.
>> 
>> Best,
>> Ista
>> 
>>>> #here is the toy dataset
>>>> x <- rbind(c(1,1),c(2,2),c(3,3),c(4,0),c(5,0),c(6,NA),
>>> +   c(7,NA),c(8,NA),c(9,NA),c(10,NA)
>>> + )
>>>> x
>>>       [,1] [,2]
>>>  [1,]    1    1
>>>  [2,]    2    2
>>>  [3,]    3    3
>>>  [4,]    4    0
>>>  [5,]    5    0
>>>  [6,]    6   NA
>>>  [7,]    7   NA
>>>  [8,]    8   NA
>>>  [9,]    9   NA
>>> [10,]   10   NA
>>>> #it contains rows that have NA's
>>>> x[is.na(x[,2]),]
>>>      [,1] [,2]
>>> [1,]    6   NA
>>> [2,]    7   NA
>>> [3,]    8   NA
>>> [4,]    9   NA
>>> [5,]   10   NA
>>>> #seems like an unreasonable answer to a reasonable question
>>>> x[x[,2]==0,]
>>>      [,1] [,2]
>>> [1,]    4    0
>>> [2,]    5    0
>>> [3,]   NA   NA
>>> [4,]   NA   NA
>>> [5,]   NA   NA
>>> [6,]   NA   NA
>>> [7,]   NA   NA
>>>> #this is more what I was expecting
>>>> x[which(x[,2]==0),]
>>>      [,1] [,2]
>>> [1,]    4    0
>>> [2,]    5    0
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> 
> I wanted to know if this was a bug so that I could report it if so. You say it is not, so you answered my question. As far as me not reading the documentation, I challenge anyone to read the cited help pages and predict the observed behavior based on the information given in those pages.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


