[R] Creating dataframe with subtotals by all fields and totals of subtotals

David Winsemius dwinsemius at comcast.net
Thu Sep 15 02:09:30 CEST 2016


> On Sep 14, 2016, at 12:33 PM, Peter Lomas <peter.br.lomas at gmail.com> wrote:
> 
> Hello R-Helpers,
> 
> I'm trying to to create a subtotal category for each column in a
> dataset, and totals of subtotals, resulting in one data frame.  I
> figure I could do this by a whack of aggregate() and rbind(), but I'm
> hoping there is a simpler way.
> 
> Below is a sample dataset. Underneath I create an "All" salesmen
> subtotal and rbind it with the original dataset.  I could do that for
> "Drink" and "Region", then also do combinations of salesmen, drink,
> and region subtotals.  However, I'm hoping somebody out there is more
> clever than I am.

I'm pretty sure that Hadley (who is rather smart) already built that into the plyr package where he lets people specify marginal subtotals. I'd be slightly surprised if that feature wasn't carried over to dplyr, although I have not see it illustrated yet. But my memory may be failing in htis area. I'm not able to put any substance to that notion after searching.

I've answered a couple of questions over the years on StackOverflow that deal with marginal calculations and you might find the addmargin function less of a "whack" that the route you were imagining:

http://stackoverflow.com/questions/5863456/r-calculating-margins-or-row-col-sums-for-a-data-frame
http://stackoverflow.com/questions/5982546/r-calculating-column-sums-row-sums-as-an-aggregation-from-a-dataframe/5982943#5982943


> 
> Thanks!
> Peter
> 
> 
> dat <- structure(list(Date = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
>                                         2L, 3L, 1L, 2L, 3L, 1L, 2L,
> 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
>                                         3L, 1L, 2L, 3L, 1L, 2L, 3L,
> 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("2012-01",
> 
>                                  "2012-02", "2012-03"), class =
> "factor"), Region = structure(c(1L,
> 
> 
>                          1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L,
> 2L, 2L, 2L, 3L, 3L,
> 
> 
>                          3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L,
> 1L, 1L, 2L, 2L, 2L,
> 
> 
>                          3L, 3L, 3L), .Label = c("Zone1", "Zone2",
> "Zone3"), class = "factor"),
>                      Drink = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L, 2L, 1L,
>                                          1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>                                          2L, 2L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L), .Label = c("Cola",
> 
>                           "Orange Juice"), class = "factor"),
> Salesman = structure(c(1L,
> 
> 
>               1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L,
> 
> 
>               1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L,
> 
> 
>               2L, 2L, 2L, 2L, 2L), .Label = c("Joe", "Marty"), class
> = "factor"),
>                      Sales = c(10L, 36L, 9L, 39L, 12L, 61L, 62L, 28L, 82L, 1L,
>                                38L, 14L, 55L, 50L, 62L, 64L, 69L,
> 65L, 28L, 85L, 66L, 66L,
>                                75L, 59L, 31L, 14L, 93L, 35L, 24L,
> 11L, 4L, 30L, 2L, 17L,
>                                36L, 47L)), .Names = c("Date",
> "Region", "Drink", "Salesman",
>                                                       "Sales"), class
> = "data.frame", row.names = c(NA, -36L))
> 
> 
> 
> all.salesman <- aggregate(Sales~Date+Region+Drink, data=dat, FUN=sum)
> all.salesman$Salesman <- "All"
> dat <- rbind(dat, all.salesman)
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA



More information about the R-help mailing list