[R] Reply: Re: dplyr : row total for all groups in dplyr summarise

David Winsemius dwinsemius at comcast.net
Tue Jul 5 18:47:29 CEST 2016


> On Jul 5, 2016, at 2:27 AM, G.Maubach at weinwolf.de wrote:
> 
> Hi guys,
> 
> I checked out your example but I can't follow the results:
> 
>> mtcars %>%
> +   group_by (am, gear) %>%
> +   summarise (n=n()) %>%
> +   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
> +   ungroup() %>%
> +   mutate(row.tot = sum(n))
> Source: local data frame [4 x 5]
> 
>     am  gear     n rel.freq row.tot
>  (dbl) (dbl) (int)    (chr)   (int)
> 1     0     3    15      79%      32
> 2     0     4     4      21%      32
> 3     1     4     8      62%      32
> 4     1     5     5      38%      32
> 
> We have a total of 32 cases, and 15 * 100 / 32 = 46.9 %, not 79 %. 
> The same goes for the other rows. How is 79 % calculated?
> 

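The denominator is apparently the number of items within each level of the first grouping variable (`am`). summarise() drops only the last grouping variable (`gear`), so the result is still grouped by `am` and sum(n) is computed per `am` level. For am == 0 that gives 15 + 4 = 19, and 15/19 is 79%: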

> mtcars %>%
+    group_by (am, gear) %>%
+    summarise (n=n()) %>%
+    mutate(sum = sum(n)) %>%
+    ungroup()
Source: local data frame [4 x 4]

     am  gear     n   sum
  (dbl) (dbl) (int) (int)
1     0     3    15    19
2     0     4     4    19
3     1     4     8    13
4     1     5     5    13
> ?n
> with(mtcars,table(am,gear))
   gear
am   3  4  5
  0 15  4  0
  1  0  8  5
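
To spell that out, here is a minimal sketch. It assumes a dplyr version where summarise() peels off only the last grouping variable (the default) and where group_vars() is available; the column names pct.within.am and pct.overall are my own, invented for illustration.

```r
library(dplyr)

counts <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())   # one row per (am, gear); result is still grouped by am

# Which grouping variables survive summarise()?
group_vars(counts)

res <- counts %>%
  mutate(pct.within.am = 100 * n / sum(n)) %>%  # sum(n) is per am level: 19 or 13
  ungroup() %>%
  mutate(pct.overall = 100 * n / sum(n))        # sum(n) is now the grand total: 32

res
```

So if the percentages are meant to be relative to all 32 cars, the ungroup() has to come before the mutate() that divides by sum(n), not after it.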

The documentation for the `n` function is particularly unhelpful in letting one know what to expect from it:

"Description

This function is implemented special for each data source and can only be used from within summarise, mutate and filter"
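
As far as I can tell from its behaviour, n() simply returns the number of rows in the current group, or in the whole data frame when there is no grouping. A small sketch, with the variable names total and by_am made up for the example:

```r
library(dplyr)

# No grouping: n() is the number of rows in the data frame
total <- mtcars %>%
  summarise(n = n())

# Grouped by am: n() is evaluated once per group
by_am <- mtcars %>%
  group_by(am) %>%
  summarise(n = n())

total
by_am
```

The per-group counts should agree with table(mtcars$am), and they add up to the ungrouped total.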
-- 

David.


> When searching the web I saw this example:
> 
> -- cut --
> 
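> #-- not run --
> library(httr)  # provides GET() and content()
> url <- "http://www.lock5stat.com/datasets/HollywoodMovies2011.csv"
> response <- GET(url)
> # parse the CSV body into a data frame (httr uses readr for text/csv)
> Hollywoodmovies2011 <- content(response, as = "parsed", type = "text/csv")
> #-- end not run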
> 
> Hollywoodmovies2011 %>% 
>  group_by(genre) %>%
>  summarize(count = n()) %>%
>  mutate(rf = count / sum(count))
> 
> -- cut --
> 
> which gives
> 
> Source: local data frame [9 x 3]
> 
>      Genre count           %
>     (fctr) (int)       (dbl)
> 1    Action    32 0.235294118
> 2 Adventure     1 0.007352941
> 3 Animation    12 0.088235294
> 4    Comedy    27 0.198529412
> 5     Drama    21 0.154411765
> 6   Fantasy     2 0.014705882
> 7    Horror    17 0.125000000
> 8   Romance    11 0.080882353
> 9  Thriller    13 0.095588235
> 
> Here the % corresponds to the count divided by the sum of count, e.g. 
> sum = 136 and 32 / 136 = 0.2352941.
> 
> What is the difference when counting? What do the relative counts in the 
> first example mean?
> 
> Kind regards
> 
> Georg
> 
> 
> 
> 
> 
> From:    Ulrik Stervbo <ulrik.stervbo at gmail.com>
> To:      David Winsemius <dwinsemius at comcast.net>, 
> Cc:      r-help at r-project.org, maicel at infomed.sld.cu
> Date:    5 July 2016 06:06
> Subject: Re: [R] dplyr : row total for all groups in dplyr summarise
> Sent by: "R-help" <r-help-bounces at r-project.org>
> 
> 
> 
> That will give you the wrong result when used on summarised data
> 
> David Winsemius <dwinsemius at comcast.net> wrote on Tue, 5 July 2016 
> 02:10:
> 
>> I thought there was an nrow() function?
>> 
>> Sent from my iPhone
>> 
>> On Jul 4, 2016, at 9:59 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> 
> wrote:
>> 
>> If you want the total number of rows in the original data.frame after
>> counting the rows in each group, you can ungroup and sum the row counts,
>> like:
>> 
>> library("dplyr")
>> 
>> 
>> mtcars %>%
>>   group_by (am, gear) %>%
>>   summarise (n=n()) %>%
>>   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>>   ungroup() %>%
>>   mutate(row.tot = sum(n))
>> 
>> HTH
>> Ulrik
>> 
>> On Mon, 4 Jul 2016 at 18:23 David Winsemius <dwinsemius at comcast.net>
>> wrote:
>> 
>>> 
>>>> On Jul 4, 2016, at 6:56 AM, maicel at infomed.sld.cu wrote:
>>>> 
>>>> Hello,
>>>> How can I aggregate row total for all groups in dplyr summarise ?
>>> 
>>> Row total … of what? Aggregate … how? What is the desired answer?
>>> 
>>> 
>>> 
>>>> library(dplyr)
>>>> mtcars %>%
>>>> group_by (am, gear) %>%
>>>> summarise (n=n()) %>%
>>>> mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
>>>> 
>>>> Best regards
>>>> Maicel Monzon
>>>> 
>>>> 
>>>> 
>>>> ----------------------------------------------------------------
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> This message has reached you through the e-mail service that Infomed
>>>> provides to support the fulfilment of the missions of the National
>>>> Health System. The person sending this e-mail undertakes to use the
>>>> service for those purposes and to comply with the established
>>>> regulations.
>>>> 
>>>> Infomed: http://www.sld.cu/
>>>> 
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> 
> 
> 


