[R] Calculation of group summaries

Francisco J. Zagmutt gerifalte28 at hotmail.com
Tue Jul 12 20:34:39 CEST 2005

Take a look at ?aggregate, ?ave, and ?tapply.

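For the three-key summary in the question, aggregate() with a list of grouping variables is probably the closest match to the SQL GROUP BY. A rough sketch, assuming a data frame with columns named as in the SQL (year, site_id, visit_no, undercut). The toy data frame cm and its values are made up for illustration, and the real CSV may use different column names:

```r
# Toy data standing in for channelMorphology (hypothetical values)
cm <- data.frame(
  year     = c(2004, 2004, 2004, 2005, 2005, 2005),
  site_id  = c("A", "A", "A", "A", "B", "B"),
  visit_no = c(1, 1, 1, 1, 1, 1),
  undercut = c(0.10, 0.30, NA, 0.50, 0.00, 0.40)
)

# Grouping variables, one list element per id column
by_visit <- list(year = cm$year, site_id = cm$site_id, visit_no = cm$visit_no)

# One summary per call; each result is a data frame keyed by the id columns
m <- aggregate(list(meanUndercut = cm$undercut), by_visit, mean, na.rm = TRUE)
n <- aggregate(list(nUndercut = cm$undercut), by_visit,
               function(x) sum(!is.na(x)))  # count of non-missing values
s <- aggregate(list(stdUndercut = cm$undercut), by_visit, sd, na.rm = TRUE)

# Join the three pieces on the shared id columns
out <- merge(merge(m, n), s)
out
```

Groups with a single non-missing value get NA for stdUndercut, since sd() needs at least two values.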

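For a single grouping factor, tapply() and ave() may be enough. A small sketch on the PlantGrowth data from the question; the names grp_mean, grp_sd, grp_n and row_mean are only illustrative:

```r
data(PlantGrowth)

# tapply: one value per group, returned as a named array
grp_mean <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
grp_sd   <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
grp_n    <- tapply(PlantGrowth$weight, PlantGrowth$group, length)

# ave: the group mean repeated on every row (default FUN is mean),
# so the result has the same length as the input vector
row_mean <- ave(PlantGrowth$weight, PlantGrowth$group)
```

tapply() suits a summary table; ave() suits adding a per-group value back onto each row.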

>From: Seeliger.Curt at epamail.epa.gov
>To: R-Help <r-help at stat.math.ethz.ch>
>Subject: [R] Calculation of group summaries
>Date: Tue, 12 Jul 2005 10:51:03 -0700
>I know R has a steep learning curve, but from where I stand the slope
>looks like a sheer cliff.  I'm pawing through the available docs and
>have come across examples which come close to what I want but are
>proving difficult for me to modify for my use.
>Calculating simple group means is fairly straightforward:
>   data(PlantGrowth)
>   attach(PlantGrowth)
>   stack(colMeans(unstack(PlantGrowth)))
>I'd like to do something slightly more complex, using a data frame and
>groups identified by unique combinations of three id variables.  There
>may be thousands of such combinations in the data.  This is easy in SQL:
>   select year,
>          site_id,
>          visit_no,
>          mean(undercut) AS meanUndercut,
>          count(undercut) AS nUndercut,
>          std(undercut) AS stdUndercut
>   from channelMorphology
>   group by year, site_id, visit_no
>       ;
>Reading a CSV written by SAS and selecting only records expected to have
>values is also straightforward in R, but getting those summary values
>for each site visit is currently beyond me:
>   sub<-read.csv('c:/data/channelMorphology.csv'
>                ,header=TRUE
>                ,na.strings='.'
>                ,sep=','
>                ,strip.white=TRUE
>                )
>   undercut<-subset(sub
>                   ,TRANSDIR %in% c('LF','RT')
>                   ,select=c('UNDERCUT'
>                            )
>                   ,drop=TRUE
>                   )
>Thanks all for your help.
>Curt Seeliger, Data Ranger
>CSC, EPA/WED contractor
>seeliger.curt at epa.gov