[R] any and all

avi.e.gross using gmail.com
Sat Apr 13 04:24:30 CEST 2024


Thanks everyone and any/all reading this. I think I got my answer. And, no, I suspected I did not need to provide a very specific example, at least not yet.

The answer is that my experiment was not vectorized, whereas dplyr verbs like mutate() do their work implicitly in a vectorized way.

This is in some ways similar to the difference between using an if/else statement and using the ifelse() function in base R, which works on all elements of a vector at once. Recent changes to R have moved toward disallowing a vector of length greater than 1 in contexts where formerly only the first element was read and the rest were ignored.
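To make the contrast concrete, here is a minimal base R sketch. The vector x and the labels are made up for illustration. As far as I know, passing a condition of length greater than 1 to if () became an error in R 4.2.0, so the last step is wrapped in tryCatch() to run on older and newer versions alike.

```
x <- c(-2, 0, 3)

# ifelse() is vectorized: it tests every element and returns a vector
# of the same length as the condition.
signs <- ifelse(x < 0, "negative", "non-negative")
print(signs)

# if () expects a single TRUE or FALSE. Collapsing the condition with
# any() or all() makes that explicit.
if (any(x < 0)) {
  message("at least one element is negative")
}

# Passing the whole vector to if () is an error in recent R versions;
# older versions used only the first element and gave a warning.
# tryCatch() lets the script keep running either way.
outcome <- tryCatch(
  if (x < 0) "first element decided" else "first element decided otherwise",
  error = function(e) "error: condition has length > 1",
  warning = function(w) "warning: only the first element was used"
)
print(outcome)
```

The point is that any() and all() belong with if (), where one answer is wanted, while ifelse() and the element-wise operators belong wherever one answer per element is wanted.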

Dénes asked some other questions about dplyr that I can answer in private (in Hungarian or another language we share, if he wishes), since this forum is mainly focused on base R rather than on packages, and apparently least of all on the tidyverse, which some see as closely tied to a company. Speaking for myself, I see no reason to be wedded to base R, and I use what I like.

Thanks again. I knew it was simple. And, if anyone cares, I can now look more carefully for functions that do what any/all do but are vectorized. That is basically what I did in my example code, where I primitively created new columns in vectorized fashion to affect all rows "at once", which is one major style of doing things in R.
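A rough base R sketch of what a vectorized any/all could look like follows. The helper names row_any() and row_all() are hypothetical (they are not base R or dplyr functions), and the data frame is made up, reusing the column names from the thread.

```
# Made-up data: three pairs of columns, with NA marking missing values.
mydata <- data.frame(
  first.a  = c(1, NA, NA, 4),
  first.b  = c(NA, 2, NA, 4),
  second.a = c(1, 2, 3, NA),
  second.b = c(NA, NA, 3, NA),
  third.a  = c(1, NA, 3, 4),
  third.b  = c(1, 2, NA, NA)
)

# Hypothetical helpers: element-wise analogues of any() and all().
# Each takes logical vectors of equal length and returns one value per row.
row_any <- function(...) Reduce(`|`, list(...))
row_all <- function(...) Reduce(`&`, list(...))

keep <- with(mydata, row_all(
  row_any(!is.na(first.a),  !is.na(first.b)),
  row_any(!is.na(second.a), !is.na(second.b)),
  row_any(!is.na(third.a),  !is.na(third.b))
))
print(keep)

# Keep only the rows where every pair has at least one non-NA value.
result <- mydata[keep, ]
print(result)

# For contrast, any() and all() collapse everything to a single value.
collapsed <- with(mydata, all(
  any(!is.na(first.a),  !is.na(first.b)),
  any(!is.na(second.a), !is.na(second.b)),
  any(!is.na(third.a),  !is.na(third.b))
))
print(collapsed)
```

With this data, keep is TRUE for the first two rows only, while collapsed is a single TRUE, which is why the any/all version inside filter() kept everything.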

Having said that, this is indeed something to be cautious about in R, as the vectors being combined may not be the same length and the shorter one may be automatically recycled to match. I also often program in Python, and we had a discussion there about what exactly some modules should do when given multiple vectors (or lists or other data structures, including generators) to zip into tuples and one or another runs out first.
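A small illustration of that automatic extension (recycling) in base R, with made-up vectors:

```
a <- 1:6
b <- c(10, 20)

# b is recycled three times to match the length of a. There is no
# warning, because 6 is a multiple of 2.
total <- a + b
print(total)

# When the longer length is not a multiple of the shorter, R still
# recycles but issues a warning. It is captured here instead of shown.
msg <- tryCatch(1:5 + b, warning = function(w) conditionMessage(w))
print(msg)

# data.frame() is stricter: columns of incompatible length are an error.
df_try <- tryCatch(
  data.frame(x = 1:5, y = 1:2),
  error = function(e) "error: differing number of rows"
)
print(df_try)
```

The silent case is the dangerous one: a logical vector of the wrong length used as a filter can be recycled without any complaint.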

I note that confusing | with ||, and similarly & with &&, often messes up programs. A vectorized any/all and other such verbs, such as at_least_n(), can be very useful, but only when used carefully.
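A short sketch of the difference. The at_least_n() below is a hypothetical helper written for this illustration, not an existing base R or dplyr function, and its argument order is my own assumption. As far as I know, || and && with an operand of length greater than 1 became an error in R 4.3.0, so the example feeds them single values only.

```
p <- c(TRUE, FALSE, TRUE)
q <- c(FALSE, FALSE, TRUE)

# | and & are element-wise and return one value per position.
either <- p | q
both   <- p & q
print(either)
print(both)

# || and && are meant for single values, as in if () conditions, and
# they short-circuit: the right-hand side is not evaluated when the
# left-hand side already decides the result.
short <- TRUE || stop("never evaluated")
print(short)

# Hypothetical helper: TRUE at each position where at least n of the
# supplied logical vectors are TRUE. An NA in any input yields NA there.
at_least_n <- function(n, ...) {
  Reduce(`+`, list(...)) >= n
}
r <- c(TRUE, TRUE, FALSE)
two_or_more <- at_least_n(2, p, q, r)
print(two_or_more)
```

With n = 1 this helper behaves like a vectorized any(), and with n equal to the number of vectors it behaves like a vectorized all().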



-----Original Message-----
From: Dénes Tóth <toth.denes using kogentum.hu> 
Sent: Friday, April 12, 2024 6:43 PM
To: Duncan Murdoch <murdoch.duncan using gmail.com>; avi.e.gross using gmail.com; r-help using r-project.org
Subject: Re: [R] any and all

Hi Avi,

As Duncan already mentioned, a reproducible example would be helpful to 
assist you better. Having said that, I think you misunderstand how 
`dplyr::filter` works: it performs row-wise filtering, so the filtering 
expression shall return a logical vector of the same length as the 
data.frame, or must be a single boolean value meaning "keep all" (TRUE) 
or "drop all" (FALSE). If you use `any()` or `all()`, they return a 
single boolean value, so you have an all-or-nothing filter in the end, 
which is probably not what you want.

Note also that you do not need to use `mutate` to use `filter` (read 
?dplyr::filter carefully):
```
filter(
   .data = mydata,
   !is.na(first.a) | !is.na(first.b),
   !is.na(second.a) | !is.na(second.b),
   !is.na(third.a) | !is.na(third.b)
)
```

Or you can use `base::subset()`:
```
subset(
   mydata,
   (!is.na(first.a) | !is.na(first.b))
   & (!is.na(second.a) | !is.na(second.b))
   & (!is.na(third.a) | !is.na(third.b))
)
```

Regards,
Denes

On 4/12/24 23:59, Duncan Murdoch wrote:
> On 12/04/2024 3:52 p.m., avi.e.gross using gmail.com wrote:
>> Base R has generic functions called any() and all() that I am having 
>> trouble
>> using.
>> It works fine when I play with it in a base R context as in:
>>> all(any(TRUE, TRUE), any(TRUE, FALSE))
>> [1] TRUE
>>> all(any(TRUE, TRUE), any(FALSE, FALSE))
>> [1] FALSE
>> But in a tidyverse/dplyr environment, it returns wrong answers.
>> Consider this example. I have data I have joined together with pairs of
>> columns representing a first generation and several other pairs 
>> representing
>> additional generations. I want to consider any pair where at least one of
>> the pair is not NA as a success. But in order to keep the entire row, 
>> I want
>> all three pairs to have some valid data. This seems like a fairly common
>> reasonable thing often needed when evaluating data.
>> So to make it very general, I chose to do something a bit like this:
> 
> We can't really help you without a reproducible example.  It's not 
> enough to show us something that doesn't run but is a bit like the real 
> code.
> 
> Duncan Murdoch
> 
>> result <- filter(mydata,
>>                   all(
>>                     any(!is.na(first.a), !is.na(first.b)),
>>                     any(!is.na(second.a), !is.na(second.b)),
>>                     any(!is.na(third.a), !is.na(third.b))))
>> I apologize if the formatting is not seen properly. The above logically
>> should work. And it should be extendable to scenarios where you want at
>> least one of M columns to contain data as a group with N such groups 
>> of any
>> size.
>> But since it did not work, I tried a plan that did work and feels 
>> silly. I
>> used mutate() to make new columns such as:
>> result <-
>>    mydata |>
>>    mutate(
>>      usable.1 = (!is.na(first.a) | !is.na(first.b)),
>>      usable.2 = (!is.na(second.a) | !is.na(second.b)),
>>      usable.3 = (!is.na(third.a) | !is.na(third.b)),
>>      usable = (usable.1 & usable.2 & usable.3)
>>    ) |>
>>    filter(usable == TRUE)
>> The above wastes time and effort making new columns so I can check the
>> calculations then uses the combined columns to make a Boolean that can be
>> used to filter the result.
>> I know this is not the place to discuss dplyr. I want to check first 
>> if I am
>> doing anything wrong in how I use any/all. One guess is that the 
>> generic is
>> messed with by dplyr or other packages I libraried.
>> And, of course, some aspects of delayed evaluation can interfere in 
>> subtle
>> ways.
>> I note I have had other problems with these base R functions before and
>> generally solved them by not using them, as shown above. I would much 
>> rather
>> use them, or something similar.
>> Avi
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


