[R] by group

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Mon Nov 1 23:44:25 CET 2021


This is a fairly simple request and well covered by introductory reading
material.

A decent example was given, and I see Andrew provided a base R reply that
should be sufficient. But I do not think he realized you wanted the result
laid out differently, so his answer is not in the format you asked for:

> tapply(dat$wt, dat$Year, mean)  # mean by Year
    2001     2002     2003 
13.50000 14.83333 13.50000 
> tapply(dat$wt, dat$Sex, mean)  # mean by Sex
       F        M
12.44444 15.44444
> tapply(dat$wt, list(dat$Year, dat$Sex), mean)  # mean by Year and Sex

I personally often prefer the tidyverse approach, which optionally uses
pipes and lets you group a data frame any way you want and follow that with
further commands. It is easier to get your result this way by grouping by
BOTH Year and Sex at once, which gives one line of output per combination.
Note that the code below requires running something like
install.packages("tidyverse") once.

library(tidyverse)
dat <- read.table(
  text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11  ",
  header = TRUE
)

dat %>%
  group_by(Year, Sex) %>%
  summarize( M = mean(wt, na.rm=TRUE))

The output of the above is:

> dat %>%
+   group_by(Year, Sex) %>%
+   summarize( M = mean(wt, na.rm=TRUE))
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
# A tibble: 6 x 3
# Groups:   Year [3]
   Year Sex       M
  <int> <chr> <dbl>
1  2001 F      12  
2  2001 M      15  
3  2002 F      13.3
4  2002 M      16.3
5  2003 F      12  
6  2003 M      15  
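If the summarise() message about grouped output is a distraction, a minimal sketch (assuming dplyr 1.0.0 or later, where summarize() has the `.groups` argument) is to drop the grouping explicitly. The data are rebuilt with data.frame() instead of read.table() but hold the same 18 rows, and `by_year_sex` is only an illustrative name:

```r
library(dplyr)

# Same 18 rows as the read.table() call above, built compactly.
dat <- data.frame(
  Year = rep(2001:2003, each = 6),
  Sex  = rep(rep(c("M", "F"), each = 3), times = 3),
  wt   = c(15, 14, 16, 12, 11, 13,
           14, 18, 17, 11, 15, 14,
           18, 13, 14, 15, 10, 11)
)

# .groups = "drop" returns an ungrouped tibble, so summarize()
# does not print the "has grouped output by 'Year'" message.
by_year_sex <- dat %>%
  group_by(Year, Sex) %>%
  summarize(M = mean(wt, na.rm = TRUE), .groups = "drop")

print(by_year_sex)
```

The result holds the same six Year/Sex rows as above, just without the leftover grouping by Year.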

Note that males and females have their own rows. It is not hard to switch
to your format by reshaping the intermediate result with pivot_wider() in
the pipeline, asking it to create one new column per value of Sex and to
fill those columns from the computed variable M. The complete pipeline is
now:

dat %>%
  group_by(Year, Sex) %>%
  summarize( M = mean(wt, na.rm=TRUE)) %>%
  pivot_wider(names_from = Sex, values_from = M)

The output as a tibble is:

   Year     F     M
  <int> <dbl> <dbl>
1  2001  12    15  
2  2002  13.3  16.3
3  2003  12    15  

Or, converted to a data.frame, which prints more decimal places (padding
whole numbers with zeroes):

> dat %>%
+   group_by(Year, Sex) %>%
+   summarize( M = mean(wt, na.rm=TRUE)) %>%
+   pivot_wider(names_from = Sex, values_from = M) %>%
+   as.data.frame
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
  Year        F        M
1 2001 12.00000 15.00000
2 2002 13.33333 16.33333
3 2003 12.00000 15.00000
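Since the question lists M before F, one way to match that column order is a sketch like the following (assuming dplyr and tidyr are installed; `mean_wt` and `wide` are illustrative names, chosen so the summary column is not confused with the new M column that pivot_wider() creates):

```r
library(dplyr)
library(tidyr)

# Same 18 rows as the read.table() call above, built compactly.
dat <- data.frame(
  Year = rep(2001:2003, each = 6),
  Sex  = rep(rep(c("M", "F"), each = 3), times = 3),
  wt   = c(15, 14, 16, 12, 11, 13,
           14, 18, 17, 11, 15, 14,
           18, 13, 14, 15, 10, 11)
)

# pivot_wider() makes one column per value of Sex, filled from mean_wt;
# select() then puts M before F as in the requested layout.
wide <- dat %>%
  group_by(Year, Sex) %>%
  summarize(mean_wt = mean(wt, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Sex, values_from = mean_wt) %>%
  select(Year, all_of(c("M", "F")))

print(wide)
```

By default pivot_wider() orders the new columns by first appearance, which is F before M here because the summarized rows are sorted by Sex within Year; that is why the explicit select() is needed.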

Your expected output is rounded (it shows 16.33 and 13.33 rather than the
repeating decimals), so if you want a fixed number of decimal places, ask
for the result to be rounded. For example, to one decimal place:

> dat %>%
+   group_by(Year, Sex) %>%
+   summarize( M = mean(wt, na.rm=TRUE)) %>%
+   pivot_wider(names_from = Sex, values_from = M) %>%
+   as.data.frame %>%
+   round(1)
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
  Year    F    M
1 2001 12.0 15.0
2 2002 13.3 16.3
3 2003 12.0 15.0
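Calling round() on the whole data frame also rounds Year, which is harmless here but not always. A sketch that rounds only the two mean columns, to two decimals as in the question, uses mutate(across()) (assuming dplyr 1.0.0 or later for across(); `mean_wt` and `wide` are illustrative names):

```r
library(dplyr)
library(tidyr)

# Same 18 rows as the read.table() call above, built compactly.
dat <- data.frame(
  Year = rep(2001:2003, each = 6),
  Sex  = rep(rep(c("M", "F"), each = 3), times = 3),
  wt   = c(15, 14, 16, 12, 11, 13,
           14, 18, 17, 11, 15, 14,
           18, 13, 14, 15, 10, 11)
)

# Round only the per-sex mean columns; Year is left untouched.
wide <- dat %>%
  group_by(Year, Sex) %>%
  summarize(mean_wt = mean(wt, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Sex, values_from = mean_wt) %>%
  mutate(across(all_of(c("F", "M")), ~ round(.x, 2))) %>%
  as.data.frame()

print(wide)
```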

And, yes, any of the above can be done in various ways using plain old R,
especially in recent versions (R 4.1.0 and later), which added a native
pipe operator, |>.
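As one plain-R sketch (assuming R 4.1.0 or later for the native |> pipe), tapply() with a list of two grouping variables returns a Year-by-Sex matrix of means, which can then be reordered and rounded to match the requested layout. The names `means` and `means_wide` are illustrative:

```r
# Same 18 rows as the read.table() call above, built compactly.
dat <- data.frame(
  Year = rep(2001:2003, each = 6),
  Sex  = rep(rep(c("M", "F"), each = 3), times = 3),
  wt   = c(15, 14, 16, 12, 11, 13,
           14, 18, 17, 11, 15, 14,
           18, 13, 14, 15, 10, 11)
)

# |> passes dat as the first argument of with(); tapply() then returns
# a matrix with years as row names and F, M as column names.
means <- dat |>
  with(tapply(wt, list(Year, Sex), mean))

# Put M before F, as in the question, and round to two decimals.
means_wide <- round(means[, c("M", "F")], 2)
print(means_wide)
```

No packages are needed; the row names of the matrix play the role of the Year column.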





-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Val
Sent: Monday, November 1, 2021 5:08 PM
To: r-help using R-project.org (r-help using r-project.org) <r-help using r-project.org>
Subject: [R] by group

Hi All,

How can I generate the mean by group? The sample data look as follows:
dat<-read.table(text="Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11  ",header=TRUE)

The desired output is:
        M      F
2001 15     12
2002 16.33  13.33
2003 15     12

Thank you,



