[R] What is the intended behavior, when subsetting using brackets [ ], when the subset criterion has NA's?

Marc Schwartz m@rc_@chw@rtz @end|ng |rom me@com
Wed Apr 6 23:11:07 CEST 2022


Hi,

The behavior is as intended.

Note that your subset criterion evaluates to:

> my_subset_criteria == T
[1] FALSE FALSE  TRUE    NA    NA

If you review ?subset, you will see in the Details section:

"For ordinary vectors, the result is simply x[subset & !is.na(subset)]."

while reviewing ?"[", you will see in the section titled "NAs in indexing":

"When extracting, a numerical, logical or character NA index picks an unknown element and so returns NA in the corresponding element of a logical, integer, numeric, complex or character result, and NULL for a list. (It returns 00 for a raw result.)"

So subset() explicitly excludes the elements for which the criterion is NA, whereas bracket-based subsetting, by default, returns NA for each of them.
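
As a small sketch of the passage quoted from ?"[" (the object names below are made up for illustration and are not from your example), note that a bare NA is a logical value and is therefore recycled over the whole vector, while NA_integer_ picks a single unknown element:

```r
x <- 1:5

# A bare NA is logical, so it is recycled to the length of x
res_logical_na <- x[NA]

# An integer NA picks exactly one unknown element
res_integer_na <- x[NA_integer_]

# For a list the unknown element is NULL; for a raw vector it is 00
res_list <- list(1, 2)[NA_integer_]
res_raw  <- as.raw(1:3)[NA_integer_]

print(res_logical_na)
print(res_integer_na)
print(res_list)
print(res_raw)
```

This is also why a logical criterion containing NAs yields one NA in the result for each NA in the criterion.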

In essence, to replicate the behavior of subset() using brackets:

> my_data[(my_subset_criteria == T) & !is.na(my_subset_criteria == T)]
[1] 3
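
Two other idioms that should give the same result here, offered as a sketch that assumes the criterion is a logical vector as in your example (the res_* names are only for illustration): which() returns just the positions that are TRUE, and %in% never returns NA.

```r
my_data <- 1:5
my_subset_criteria <- c(FALSE, FALSE, TRUE, NA, NA)

# which() keeps only the positions where the criterion is TRUE,
# so NA entries are dropped
res_which <- my_data[which(my_subset_criteria)]

# %in% never returns NA, so NA entries are treated as FALSE
res_in <- my_data[my_subset_criteria %in% TRUE]

# Plain bracket indexing, for comparison, keeps one NA per NA index
res_plain <- my_data[my_subset_criteria]

print(res_which)
print(res_in)
print(res_plain)
```

One caution with which(): it is fine for positive indexing as above, but negating its result (x[-which(cond)]) returns an empty vector when nothing matches, so I would not use it for exclusions.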


Lastly, using 'T' as a single-character shorthand for the logical value TRUE is generally discouraged. While T and F are set to TRUE and FALSE at the start of a new R session, there is no guarantee that they will stay that way, as both can be reassigned:

> T <- "This is not TRUE"
> T
[1] "This is not TRUE"

whereas TRUE cannot be:

> TRUE <- "This is not TRUE"
Error in TRUE <- "This is not TRUE" : 
  invalid (do_set) left-hand side to assignment
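
To show why this matters for a comparison like yours, here is a sketch assuming a fresh session in which someone later assigns a number to T (a hypothetical mistake, say using T as a variable for "time"). The comparison still runs without any error or warning, but the result is silently inverted:

```r
my_subset_criteria <- c(FALSE, FALSE, TRUE, NA, NA)

# In a fresh session T is TRUE, so both comparisons agree
before <- identical(my_subset_criteria == T, my_subset_criteria == TRUE)

# Hypothetical reassignment of T
T <- 0

# The logical values are coerced to 0/1 and compared with 0,
# so FALSE now "matches" and TRUE does not
after <- my_subset_criteria == T

# Remove the user-defined T so that base R's T is visible again
rm(T)

print(before)
print(after)
```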

Regards,

Marc Schwartz


On April 6, 2022 at 4:13:01 PM, Kelly Thompson (kt1572757 using gmail.com) wrote:

> I noticed that I get different results when subsetting using subset,
> compared to subsetting using "brackets" when the subset criteria have
> NA's.
>
> Here's an example
>
> #START OF EXAMPLE
> my_data <- 1:5
> my_data
>
> my_subset_criteria <- c( F, F, T, NA, NA)
> my_subset_criteria
>
> #subsetting using brackets returns the data where my_subset_criteria
> #equals TRUE, and also NA where my_subset_criteria is NA
> my_data[my_subset_criteria == T]
>
> #subsetting using subset returns the data where my_subset_criteria equals TRUE
> subset(my_data, my_subset_criteria == T)
>
> #END OF EXAMPLE
>
> This behavior is also mentioned here
> https://statisticaloddsandends.wordpress.com/2018/10/07/subsetting-in-the-presence-of-nas/
>
> Q. Is this the intended behavior when subsetting with brackets?
>
> Thank you!
>


