[R] Summarize by two-column factor, retaining original factors

Marc Schwartz (via MN) mschwartz at mn.rr.com
Fri Feb 24 17:29:11 CET 2006


On Fri, 2006-02-24 at 08:18 -0800, Matt Crawford wrote:
> I am having trouble doing the following.  I have a data.frame like
> this, where x and y are variables that I want to do calculations on:
> 
> Name Year  x  y
> ab   2001 15  3
> ab   2001 10  2
> ab   2002 12  8
> ab   2003  7 10
> dv   2002 10 15
> dv   2002  3  2
> dv   2003  1 15
> 
> Before I do all the other things I need to do with this data, I need
> to summarize or collapse the data by name and year.  I've found that I
> can do things like
> nameyear<-interaction(dataframe$Name,dataframe$Year)
> dataframe$nameyear<-nameyear
> tapply(dataframe$x,dataframe$nameyear,sum)
> tapply(dataframe$y,dataframe$nameyear,sum)
> and then bind those together.
> 
> But my problem is that I need to somehow retain the original Names in
> my collapsed dataset, so that later I can do analyses with the Name
> factors.  All I can think of is something like
> tapply(dataframe$Name,dataframe$nameyear, somefunction?)
> but nothing seems to work.
> 
> I'm actually trying to convert a SAS program, and I can't get out of
> that mindset.  There, it's a simple Proc Means, By Name Year.
> 
> Thanks for any help or suggestions on the right way to go about this.
> 
> Matt Crawford

Matt,

Just use aggregate():

> aggregate(MyDF[, 3:4], list(Name = MyDF$Name, Year = MyDF$Year), sum)
  Name Year  x  y
1   ab 2001 25  5
2   ab 2002 12  8
3   dv 2002 13 17
4   ab 2003  7 10
5   dv 2003  1 15


See ?aggregate for more information.
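
For anyone who wants to reproduce the call above, here is a minimal self-contained sketch. It assumes the sample data from the question is read into a data frame named MyDF via read.table(); that name and the way the data is loaded are assumptions for illustration, not something stated in the original question. The last call uses aggregate()'s formula interface, which exists in later R releases and should give the same grouped sums.

```r
# Rebuild the sample data from the question; the name 'MyDF' and the use of
# read.table(text = ...) are assumptions made for this illustration.
MyDF <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
Name Year x y
ab   2001 15 3
ab   2001 10 2
ab   2002 12 8
ab   2003 7 10
dv   2002 10 15
dv   2002 3 2
dv   2003 1 15
")

# Sum x and y within each Name/Year combination. The grouping variables come
# back as ordinary columns, so Name stays available for later analyses.
res <- aggregate(MyDF[, c("x", "y")],
                 by = list(Name = MyDF$Name, Year = MyDF$Year),
                 FUN = sum)
print(res)

# Same summary written with the formula interface (later R versions only).
res2 <- aggregate(cbind(x, y) ~ Name + Year, data = MyDF, FUN = sum)
print(res2)
```

Selecting the columns by name rather than by position (3:4) is just a design choice here: it keeps the call working if the column order of the data frame changes.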

HTH,

Marc Schwartz



