[R] Coding style question

Tue Feb 17 19:42:19 CET 2015

On 17/02/2015 11:19 AM, John Posner wrote:
> In the course of slicing-and-dicing some data, I had occasion to create a list like this:
>
> list(
>      subset(my_dataframe, GR1=="XX1"),
>      subset(my_dataframe, GR1=="XX2"),
>      subset(my_dataframe, GR1=="YY"),
>      subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
>      subset(my_dataframe, GR2=="Remission"),
>      subset(my_dataframe, GR2=="Relapse"))
>
> I used %in% only once, because there was only one "compound value" (XX1 or XX2) for subsetting. But then it occurred to me to use %in% everywhere, taking advantage of the fact that a scalar value is the same as a length-1 vector:
>
> list(
>      subset(my_dataframe, GR1 %in% "XX1"),
>      subset(my_dataframe, GR1 %in% "XX2"),
>      subset(my_dataframe, GR1 %in% "YY"),
>      subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
>      subset(my_dataframe, GR2 %in% "Remission"),
>      subset(my_dataframe, GR2 %in% "Relapse"))
>
> It works just fine.  Are there any problems with this style, from the standpoints of correctness, aesthetics, etc.?

If GR1 or GR2 has a missing value, you get NA from the equality tests,
but FALSE from the %in% tests.  That won't affect subset() (where NA and
FALSE both result in the omission of the observation), but it might
affect other code that relies on the same tests.  For example, if you
had selected rows using a logical index instead of using subset(), each
NA entry in the index would produce a row of NAs in the result.
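
A minimal sketch of the difference, assuming a small made-up data frame (the id column and the values below are hypothetical; only the names my_dataframe and GR1 come from the original post):

```r
# Hypothetical data: one missing value in GR1.
my_dataframe <- data.frame(
  id  = 1:4,
  GR1 = c("XX1", NA, "YY", "XX1"),
  stringsAsFactors = FALSE
)

# The equality test propagates the NA; %in% turns it into FALSE.
eq_test <- my_dataframe$GR1 == "XX1"    # TRUE    NA FALSE  TRUE
in_test <- my_dataframe$GR1 %in% "XX1"  # TRUE FALSE FALSE  TRUE

# subset() drops rows whose condition is NA, so both styles agree here.
by_eq_subset <- subset(my_dataframe, GR1 == "XX1")
by_in_subset <- subset(my_dataframe, GR1 %in% "XX1")

# Logical indexing keeps the NA entry as an extra row of NAs.
by_eq_index <- my_dataframe[eq_test, ]  # 3 rows, the second is all NA
by_in_index <- my_dataframe[in_test, ]  # 2 rows
```

With subset() the two styles return the same rows; with bracket indexing only the %in% version avoids the extra NA row.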

Duncan Murdoch