[R] dplyr: summarise across using variable names and a condition

Fri Mar 26 14:47:45 CET 2021

Hello All,

I would like to summarize across columns in dplyr using variable names and a condition. Below is an example "have" data set, followed by an example "need" data set, a vector of numeric variable names, and the very humble beginnings of a dplyr-based solution.

What I think I need to do is pass my variable names to dplyr and then apply a conditional function: if a variable is in my list of names, calculate the mean and the standard deviation (std); if not, calculate the mean but label it as a proportion. The question is how to do that. It appears that using variable names might involve !!, enquo, or quo, but I haven't had much success with these. I imagine I might have come very close without quite getting it. The conditional part seems less difficult, but I'm not quite sure how to do that either.

Help with this would be greatly appreciated.

Thanks,

Paul

have <- structure(list(
        ptno = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
                 "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"),
        age1 = c(74, 70, 78, 79, 72, 81, 76, 58, 53, 74, 72, 74, 75,
                 73, 80, 62, 67, 65, 83, 67, 72, 90, 73, 84, 90, 51),
        age2 = c(71, 67, 72, 74, 65, 79, 70, 49, 45, 68, 70, 71, 74,
                 71, 69, 58, 65, 59, 80, 60, 68, 87, 71, 82, 80, 49),
        gender_male = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L,
                        1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L),
        gender_female = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L,
                          0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L),
        race_white = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L,
                       1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
        race_black = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                       0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
        race_other = c(1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
                       0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
        row.names = c(NA, -26L), class = c("tbl_df", "tbl", "data.frame"))

need <- structure(list(
       age1_mean = 72.8076923076923, age1_std = 9.72838827666425,
       age2_mean = 68.2307692307692, age2_std = 10.2227498934785,
       gender_male_prop = 0.576923076923077, gender_female_prop = 0.423076923076923,
       race_white_prop = 0.769230769230769, race_black_prop = 0.0384615384615385,
       race_other_prop = 0.192307692307692),
       row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

vars_num <- c("age1", "age2")

library(magrittr)
library(dplyr)

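# Attempt so far: computes mean and sd for every column except ptno,
# so it does not yet treat the 0/1 indicator columns as proportions.
have %>%
  summarise(across(
    .cols = !contains("ptno"),
    .fns = list(mean = mean, std = sd),
    .names = "{col}_{fn}"
  ))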
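
One way the conditional idea described above might be sketched is with two across() calls, one per group of variables, using all_of() to pass the names as character vectors rather than !! or enquo. This is only a sketch under assumptions: it needs dplyr 1.0.0 or later, it uses a small made-up data set (toy) in place of "have", and vars_prop is a helper name introduced here for illustration, not something from the data above.

```r
library(dplyr)

# Toy stand-in for `have` (made-up values, much smaller than the real data)
toy <- tibble(
  ptno        = c("A", "B", "C", "D"),
  age1        = c(74, 70, 78, 79),
  gender_male = c(1L, 1L, 0L, 1L)
)

# Names of the numeric variables, as in the post
vars_num <- c("age1")

# Hypothetical helper: everything that is neither the id nor numeric
# is treated as a 0/1 indicator to be reported as a proportion
vars_prop <- setdiff(names(toy), c("ptno", vars_num))

res <- toy %>%
  summarise(
    # named numeric variables: mean and standard deviation
    across(all_of(vars_num), list(mean = mean, std = sd)),
    # remaining indicator variables: mean, labelled as a proportion
    across(all_of(vars_prop), list(prop = mean))
  )

# Resulting columns: age1_mean, age1_std, gender_male_prop
print(res)
```

Both selections are spelled out by name so that the second across() cannot pick up the age1_mean and age1_std columns created by the first one. With a named list of functions, the default output names have the form column_function, so .names is left out here.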