[R] how to calculate another vector based on the data from a combination of two factors

Marc Schwartz marc_schwartz at comcast.net
Mon Nov 17 04:39:21 CET 2008


on 11/16/2008 08:12 PM jeffc wrote:
> Hi, 
> 
> I have a data set similar to the following
> 
> State	Gender	Quantity
> TX	Male	1
> NY	Female	2
> TX	Male	3
> NY	Female	4
> 
> 
> I need to calculate cumulative sum of the quantity by State and Gender. The
> expected output is 
> State	Gender	Quantity	CumQuantity
> TX	Male	1	1
> TX	Male	3	4
> NY	Female	2	2
> NY	Female	4	6
> 
> I highly appreciate if someone can give me some hints on solving that in R.
> 
> Hao

I would verify this, but something along the lines of the following:

> DF
  State Gender Quantity
1    TX   Male        1
2    NY Female        2
3    TX   Male        3
4    NY Female        4


do.call(rbind,
        lapply(split(DF, list(DF$State, DF$Gender), drop = TRUE),
               function(x) cbind(x, cumsum(x$Quantity))))

which yields:

            State Gender Quantity cumsum(x$Quantity)
NY.Female.2    NY Female        2                  2
NY.Female.4    NY Female        4                  6
TX.Male.1      TX   Male        1                  1
TX.Male.3      TX   Male        3                  4



To take this step by step:

First, split() DF by the two factors:

> split(DF, list(DF$State, DF$Gender), drop = TRUE)
$NY.Female
  State Gender Quantity
2    NY Female        2
4    NY Female        4

$TX.Male
  State Gender Quantity
1    TX   Male        1
3    TX   Male        3



Pass that to lapply(), in which we do the cumsum() and cbind():

> lapply(split(DF, list(DF$State, DF$Gender), drop = TRUE),
         function(x) cbind(x, cumsum(x$Quantity)))
$NY.Female
  State Gender Quantity cumsum(x$Quantity)
2    NY Female        2                  2
4    NY Female        4                  6

$TX.Male
  State Gender Quantity cumsum(x$Quantity)
1    TX   Male        1                  1
3    TX   Male        3                  4



Pass that to do.call() to rbind() the results together:

> do.call(rbind,
          lapply(split(DF, list(DF$State, DF$Gender), drop = TRUE),
                 function(x) cbind(x, cumsum(x$Quantity))))
            State Gender Quantity cumsum(x$Quantity)
NY.Female.2    NY Female        2                  2
NY.Female.4    NY Female        4                  6
TX.Male.1      TX   Male        1                  1
TX.Male.3      TX   Male        3                  4


See ?split, ?do.call, ?rbind and ?cumsum.

If you want the exact row ordering as you had it in your post, you can
alter the factor levels, otherwise they will be sorted by alpha (eg. NY
before TX and Female before Male).

HTH,

Marc Schwartz



More information about the R-help mailing list