[R] getting summary statistics easily with dplyr

Christopher W Ryan cry@n @end|ng |rom b|ngh@mton@edu
Tue Nov 5 16:39:27 CET 2019


I'm trying to modernize my way of thinking, and my coding, into the
dplyr/tidyverse way of doing things.

To get basic summary statistics on a variable in a dataframe, with the
output also being a dataframe, I previously would do something like this,
using other packages:

library(doBy)
library(Hmisc)   ## for latex()
doBy.output <- summaryBy(mpg ~ am, data = mtcars, FUN = fivenum)
str(doBy.output)   ## yes, it's a dataframe
## which I would then incorporate into my report via Sweave and latex
latex(doBy.output, file = "")

## Or this:

library(mosaic)
mosaic.output <- favstats(mpg ~ am, data = mtcars)
str(mosaic.output)  ## yes, it's a dataframe
latex(mosaic.output, file = "")


## What would be the "dplyr way" of doing this?  I know I could specify
## each summary statistic individually:

library(dplyr)
dplyr.output <- mtcars %>% group_by(am) %>% summarise(min = min(mpg),
     p25 = quantile(mpg, probs = 0.25),
     p50 = median(mpg),
     p75 = quantile(mpg, probs = 0.75),
     max = max(mpg) )
str(dplyr.output)  ## yes, it's a dataframe
latex(dplyr.output, file = "")

## Is there a way to use a single function like fivenum instead of
## specifying each desired summary statistic?  dplyr summarise() wants a
## result of length 1, not 5:

## this does not give one row per group, because fivenum() returns 5 values
dplyr.output.2 <- mtcars %>% group_by(am) %>% summarise(fivenum(mpg) )
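
One way to see the length-1 constraint at work is to wrap the five values in a list, so that each group contributes a single list element. This is only a sketch of the idea, not a finished answer: the names fn and fn.list are made up for illustration, and the result is a list-column rather than five separate columns.

```r
library(dplyr)

## Wrap the length-5 result in list() so each group yields one (list) value.
## "fn" and "fn.list" are illustrative names only.
fn.list <- mtcars %>%
  group_by(am) %>%
  summarise(fn = list(fivenum(mpg)))

str(fn.list)      ## one row per level of am, with a list-column "fn"
fn.list$fn[[1]]   ## the five numbers for the first group
```

The list-column would still need to be spread into separate columns before passing it to latex().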

group_map or group_modify seem like they might do the job, but I could
use some guidance on the syntax.
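
For reference, a minimal sketch of what the group_modify() syntax might look like, assuming dplyr 0.8.1 or later (where group_modify() is available). The function passed in receives each group's rows as .x and must return a data frame; the column names below (min, lower.hinge, and so on) are chosen here for illustration. Note that fivenum() returns Tukey's hinges, which are not always identical to the 25th and 75th percentiles from quantile().

```r
library(dplyr)

## Apply fivenum() once per group and return a one-row data frame per group.
## Column names are illustrative; the hinges are Tukey's hinges, not quantiles.
dplyr.output.3 <- mtcars %>%
  group_by(am) %>%
  group_modify(~ {
    fn <- fivenum(.x$mpg)
    data.frame(min = fn[1],
               lower.hinge = fn[2],
               median = fn[3],
               upper.hinge = fn[4],
               max = fn[5])
  }) %>%
  ungroup()

str(dplyr.output.3)  ## a tibble: one row per level of am, five summary columns
```

Because a tibble is also a data frame, the result should be usable with latex() in the same way as the earlier outputs.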


Thanks.

--Chris Ryan
