[R] any and all

Sat Apr 13 09:17:11 CEST 2024

Hi Avi,

As Dénes Tóth has rightly diagnosed, you are building an "all or 
nothing" filter. However, you do not need to explicitly spell out all 
columns that you want to filter for; the "tidy" way would be to use a 
helper function like `if_all()` or `if_any()`. Consider this example (I 
hope I understand your intentions correctly):

```

library(dplyr)

data <- tribble(
   ~first.a, ~first.b, ~first.c,
   1L,        1L,       0L,
   NA,       1L,       0L,
   1L,        0L,       NA,
   NA,       NA,       1L
)

```

Let's say we only want to keep rows that have a non-missing value for 
either `first.a` or `first.b` (or hypothetical later generations like 
`second.a` and `second.b` etc.):

```

data |>
   filter(if_any(ends_with(c(".a", ".b")), \(x) !is.na(x)))

```

So: `filter()` (keep observations) `if_any` of the columns ending with 
.a or .b is not `NA` (we have to wrap `!is.na` into an anonymous 
function for it to be a valid argument type). This would yield

```

# A tibble: 3 × 3
   first.a first.b first.c
     <int>   <int>   <int>
1       1       1       0
2      NA       1       0
3       1       0      NA

```

Discarding only the row where both of them are missing. Another way of 
writing this would be

```

data |>
   filter(!if_all(ends_with(c(".a", ".b")), is.na))

```

i.e. don't keep rows where all columns ending in .a or .b are `NA`, 
which returns the same result. Hope this helps,

Lennart Kasserra

Am 12.04.24 um 21:52 schrieb avi.e.gross using gmail.com:
> Base R has generic functions called any() and all() that I am having trouble
> using.
>   
> It works fine when I play with it in a base R context as in:
>   
>> all(any(TRUE, TRUE), any(TRUE, FALSE))
> [1] TRUE
>> all(any(TRUE, TRUE), any(FALSE, FALSE))
> [1] FALSE
>   
> But in a tidyverse/dplyr environment, it returns wrong answers.
>   
> Consider this example. I have data I have joined together with pairs of
> columns representing a first generation and several other pairs representing
> additional generations. I want to consider any pair where at least one of
> the pair is not NA as a success. But in order to keep the entire row, I want
> all three pairs to have some valid data. This seems like a fairly common
> reasonable thing often needed when evaluating data.
>   
> So to make it very general, I chose to do something a bit like this:
>   
> result <- filter(mydata,
>                   all(
>                     any(!is.na(first.a), !is.na(first.b)),
>                     any(!is.na(second.a), !is.na(second.b)),
>                     any(!is.na(third.a), !is.na(third.b))))
>   
> I apologize if the formatting is not seen properly. The above logically
> should work. And it should be extendable to scenarios where you want at
> least one of M columns to contain data as a group with N such groups of any
> size.
>   
> But since it did not work, I tried a plan that did work and feels silly. I
> used mutate() to make new columns such as:
>   
> result <-
>    mydata |>
>    mutate(
>      usable.1 = (!is.na(first.a) | !is.na(first.b)),
>      usable.2 = (!is.na(second.a) | !is.na(second.b)),
>      usable.3 = (!is.na(third.a) | !is.na(third.b)),
>      usable = (usable.1 & usable.2 & usable.3)
>    ) |>
>    filter(usable == TRUE)
>   
> The above wastes time and effort making new columns so I can check the
> calculations then uses the combined columns to make a Boolean that can be
> used to filter the result.
>   
> I know this is not the place to discuss dplyr. I want to check first if I am
> doing anything wrong in how I use any/all. One guess is that the generic is
> messed with by dplyr or other packages I libraried.
>   
> And, of course, some aspects of delayed evaluation can interfere in subtle
> ways.
>   
> I note I have had other problems with these base R functions before and
> generally solved them by not using them, as shown above. I would much rather
> use them, or something similar.
>   
>   
> Avi
>   
>   
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.