[R] any and all

@vi@e@gross m@iii@g oii gm@ii@com
Fri Apr 12 21:52:28 CEST 2024


Base R has generic functions called any() and all() that I am having trouble
using.
 
They work fine when I try them in a plain base R session, as in:
 
> all(any(TRUE, TRUE), any(TRUE, FALSE))
[1] TRUE
> all(any(TRUE, TRUE), any(FALSE, FALSE))
[1] FALSE
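
For comparison, here is a minimal sketch of the same functions applied to vectors longer than one element. The vectors `x` and `y` are made up for illustration and are not from the data discussed here. Both any() and all() reduce everything they are given to a single TRUE or FALSE, whereas `|` works position by position:

```r
# Hypothetical vectors, invented for illustration only
x <- c(TRUE, FALSE, FALSE)
y <- c(FALSE, FALSE, TRUE)

any_xy <- any(x, y)        # a single value covering all six elements
all_xy <- all(x, y)        # likewise a single value
elementwise_or <- x | y    # one value per position

print(any_xy)
print(all_xy)
print(elementwise_or)
```

The length of each result is what matters when the same expressions are later applied to whole columns.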
 
But inside a tidyverse/dplyr pipeline, they do not return the answers I expect.
 
Consider this example. I have joined data so that one pair of columns
represents a first generation and further pairs represent additional
generations. I count a pair as a success when at least one of its two
columns is not NA. To keep a row, I want all three pairs to succeed.
This seems like a common, reasonable requirement when evaluating data.
 
So to make it very general, I chose to do something a bit like this:
 
result <- filter(mydata,
                 all(
                   any(!is.na(first.a), !is.na(first.b)),
                   any(!is.na(second.a), !is.na(second.b)),
                   any(!is.na(third.a), !is.na(third.b))))
 
I apologize if the formatting does not come through properly. Logically, the
above should work. It should also extend to N groups of columns, each group
of any size M, where at least one column in every group must contain data.
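
To make the intent concrete, here is a small base R sketch. The column names follow the example above, but the values, and the names `groups`, `group_ok`, and `keep`, are assumptions made up for illustration. It first evaluates the nested any()/all() condition on whole columns, which gives one value for the entire data frame, and then shows one possible row-wise formulation for N groups of M columns:

```r
# Hypothetical data: column names follow the post, values are invented
mydata <- data.frame(
  first.a  = c(1, NA, 3, NA),
  first.b  = c(NA, NA, 3, 4),
  second.a = c(1, 2, NA, 4),
  second.b = c(1, NA, NA, NA),
  third.a  = c(NA, 2, 3, 4),
  third.b  = c(1, 2, NA, NA)
)

# The nested condition, evaluated on whole columns, is a single logical
# value rather than one value per row
whole_column <- with(mydata, all(
  any(!is.na(first.a), !is.na(first.b)),
  any(!is.na(second.a), !is.na(second.b)),
  any(!is.na(third.a), !is.na(third.b))))
print(length(whole_column))

# Row-wise alternative: each element of `groups` names the columns of one group
groups <- list(
  c("first.a", "first.b"),
  c("second.a", "second.b"),
  c("third.a", "third.b")
)

# A group passes for a row when at least one of its columns is not NA
group_ok <- lapply(groups, function(cols) rowSums(!is.na(mydata[cols])) > 0)

# A row is kept only when every group passes
keep <- Reduce(`&`, group_ok)
result <- mydata[keep, ]
print(result)
```

Adding a group, or widening one, only means editing `groups`; the rest of the sketch stays the same.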
 
Since it did not work, I tried an approach that does work but feels silly: I
used mutate() to make new columns, like this:
 
result <-
  mydata |>
  mutate(
    usable.1 = (!is.na(first.a) | !is.na(first.b)),
    usable.2 = (!is.na(second.a) | !is.na(second.b)),
    usable.3 = (!is.na(third.a) | !is.na(third.b)),
    usable = (usable.1 & usable.2 & usable.3)
  ) |>
  filter(usable)
 
The above wastes time and effort: it creates new columns (which at least let
me check the calculations), combines them into one Boolean column, and then
filters on that column.
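
The same element-wise logic can be written without the intermediate columns. The sketch below assumes dplyr is installed (version 1.0.4 or later for if_any()) and uses invented values in a hypothetical `mydata`. It is one possible formulation, not necessarily the recommended one:

```r
library(dplyr)

# Hypothetical data: column names follow the post, values are invented
mydata <- data.frame(
  first.a  = c(1, NA, 3, NA),
  first.b  = c(NA, NA, 3, 4),
  second.a = c(1, 2, NA, 4),
  second.b = c(1, NA, NA, NA),
  third.a  = c(NA, 2, 3, 4),
  third.b  = c(1, 2, NA, NA)
)

# Element-wise | and & placed directly inside filter()
result <- mydata |>
  filter(
    (!is.na(first.a) | !is.na(first.b)) &
      (!is.na(second.a) | !is.na(second.b)) &
      (!is.na(third.a) | !is.na(third.b))
  )

# The same condition with if_any(): each call is evaluated per row, and
# filter() combines comma-separated conditions with a logical AND
result_if_any <- mydata |>
  filter(
    if_any(c(first.a, first.b), ~ !is.na(.x)),
    if_any(c(second.a, second.b), ~ !is.na(.x)),
    if_any(c(third.a, third.b), ~ !is.na(.x))
  )

print(result)
```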
 
I know this is not the place to discuss dplyr. I want to check first whether
I am doing anything wrong in how I use any/all. One guess is that the generics
are masked or altered by dplyr or other packages I have loaded with library().
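
One way to test that guess, sketched with base R and utils functions only: find() lists every attached environment that defines a name, and identical() compares the function found on the search path with the one in base. The variable names are made up for illustration, and the results depend on which packages are attached in the session where this is run:

```r
# Where on the search path are `any` and `all` defined?
where_any <- find("any")
where_all <- find("all")
print(where_any)
print(where_all)

# Is the function found by name the same object as the one in base?
same_any <- identical(any, base::any)
same_all <- identical(all, base::all)
print(same_any)
print(same_all)
```

If both comparisons are TRUE, the functions themselves have not been replaced, and the difference must come from how they are called.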
 
And, of course, some aspects of delayed evaluation can interfere in subtle
ways.
 
I note I have had other problems with these base R functions before and
generally solved them by not using them, as shown above. I would much rather
use them, or something similar.
 
 
Avi
 
 
