[R] combine filter() and select()

Thu Aug 20 12:46:33 CEST 2020

A kind of hybrid answer is to use base::subset(), which supports non-standard evaluation: it looks up unquoted symbols, like 'files' in the code line below, in the object that is its first argument (%>% puts 'mytbl' in that first position). It handles both row (filter) and column (select) subsetting in a single call:

> mytbl %>% subset(files %in% "a", files)
# A tibble: 1 x 1
  files
  <chr>
1 a

Or subset(grepl("a", files), files) if that was what you meant.
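
To make the non-standard evaluation concrete, here is a rough sketch that writes the same request with ordinary indexing. It assumes a plain data.frame 'df' as a stand-in for 'mytbl' so that it runs without loading any package; 'via_subset' and 'via_index' are just illustrative names.

```r
# Stand-in for 'mytbl' as a base data.frame (assumption: tidyverse not loaded)
df <- data.frame(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6,
                 stringsAsFactors = FALSE)

# Non-standard evaluation: the unquoted 'files' is looked up inside df,
# both in the row condition and in the column selection
via_subset <- subset(df, files %in% "a", files)

# The same request with standard evaluation: columns are referenced
# explicitly, and drop = FALSE keeps the result a data.frame
via_index <- df[df$files %in% "a", "files", drop = FALSE]

# Compare the two results
all.equal(via_subset, via_index)
```

The subset() form saves repeating the object name, which is most of its appeal at the prompt or in a pipe.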

One important idea that the tidyverse implements is, in my opinion, 'endomorphism' -- you get back the same type of object as you put in -- so I wouldn't use a base R idiom that returned a vector unless that were somehow essential for the next step in the analysis. 
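
A small sketch of what I mean by that, again assuming a plain data.frame 'df' in place of 'mytbl' (the names 'kept' and 'vec' are only for illustration):

```r
# Stand-in for 'mytbl' as a base data.frame (assumption: tidyverse not loaded)
df <- data.frame(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6,
                 stringsAsFactors = FALSE)

# data.frame in, data.frame out
kept <- subset(df, grepl("a", files), files)
class(kept)

# data.frame in, character vector out
vec <- grep("a", df$files, value = TRUE)
class(vec)

# Because 'kept' is still a data.frame, the next step can be
# another data.frame operation without any conversion
subset(kept, nchar(files) == 1)
```

Whether the vector form is the better choice depends on what the next step in the analysis expects.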

There is value in having separate functions for filter() and select(), and there are probably edge cases where filter(), select(), and subset() behave differently, but for what it's worth subset() can also perform each of these operations on its own:

> mytbl %>% subset(, files)
# A tibble: 6 x 1
  files
  <chr>
1 a
2 b
3 c
4 d
5 e
6 f
> mytbl %>% subset(grepl("a", files), )
# A tibble: 1 x 2
  files  prop
  <chr> <int>
1 a         1

Martin Morgan

On 8/20/20, 2:48 AM, "R-help on behalf of Ivan Calandra" <r-help-bounces using r-project.org on behalf of calandra using rgzm.de> wrote:

    Hi Jeff,

    The code you show is exactly what I usually do, in base R; but I wanted
    to play with tidyverse to learn it (and also understand when it makes
    sense and when it doesn't).

    And yes, of course, in the example I gave, I end up with a 1-cell
    tibble, which could be better extracted as a length-1 vector. But my
    real goal is not to end up with a single value or even a single column.
    I just thought that simplifying my example was the best approach to ask
    for advice.

    But thank you for letting me know that what I'm doing is pointless!

    Ivan

    --
    Dr. Ivan Calandra
    TraCEr, laboratory for Traceology and Controlled Experiments
    MONREPOS Archaeological Research Centre and
    Museum for Human Behavioural Evolution
    Schloss Monrepos
    56567 Neuwied, Germany
    +49 (0) 2631 9772-243
    https://www.researchgate.net/profile/Ivan_Calandra

    On 19/08/2020 19:27, Jeff Newmiller wrote:
    > The whole point of dplyr primitives is to support data frames... that is, lists of columns. When you pare your data frame down to one column you are almost certainly using the wrong tool for the job.
    >
    > So, sure, your code works... and it even does what you wanted in the dplyr style, but what a pointless exercise.
    >
    > grep( "a", mytbl$files, value=TRUE )
    >
    > On August 19, 2020 7:56:32 AM PDT, Ivan Calandra <calandra using rgzm.de> wrote:
    >> Dear useRs,
    >>
    >> I'm new to the tidyverse world and I need some help on basic things.
    >>
    >> I have the following tibble:
    >> mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop =
    >> 1:6), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
    >>
    >> I want to subset the rows with "a" in the column "files", and keep only
    >> that column.
    >>
    >> So I did:
    >> myfile <- mytbl %>%
    >>   filter(grepl("a", files)) %>%
    >>   select(files)
    >>
    >> It works, but I believe there must be an easier way to combine filter()
    >> and select(), right?
    >>
    >> Thank you!
    >> Ivan
