[R] Summing data based on certain conditions

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Fri Apr 2 16:53:07 CEST 2010


Dear Steve,

Multiplying the mean by the number of observations is essentially the same as summing the numbers.
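
A quick check of that identity, as a sketch using the four March 1988 rammday values from the data quoted below. The variable names are only illustrative. Note that the original question asks for the mean multiplied by the number of days in the month (31 for March), which is a different quantity from the plain sum, so the last line shows that variant as well.

```r
# rammday values for March 1988, copied from the quoted data
x <- c(1.43, 2.86, 5.06, 18.76)

# Mean times the number of observations ...
mean_times_n <- mean(x) * length(x)

# ... equals the plain sum (up to floating-point rounding)
total <- sum(x)
same <- isTRUE(all.equal(mean_times_n, total))

# Variant from the original question: mean scaled to the 31 days of March
scaled <- mean(x) * 31
```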

Have a look at the plyr package.

library(plyr)
# One row per month/year combination; both columns give the same total
ddply(data, c("month", "year"), function(x) {
	c(MeanMultiplied = mean(x$rammday) * nrow(x), Sum = sum(x$rammday))
})
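
If you prefer to stay in base R, the same totals can be obtained with aggregate(). The sketch below uses the first ten rows of your data, typed in by hand, so the small data frame is an assumption standing in for the full dataset. It also shows why your ave() call warned: c(data$month, data$year) glues the two columns into one vector twice as long as the data, and ave() has no 'by' argument, so that long vector was taken as a single grouping variable. ave() expects the grouping variables as separate arguments and the function passed by name as FUN.

```r
# First ten rows of the quoted data (year, month, rammday only)
data <- data.frame(
  year    = rep(1988, 10),
  month   = c(3, 3, 3, 3, 4, 4, 4, 4, 4, 5),
  rammday = c(1.43, 2.86, 5.06, 18.76, 4.49, 8.57, 31.18, 19.67, 3.14, 11.51)
)

# Sum of rammday for each month/year combination
monthly <- aggregate(rammday ~ month + year, data = data, FUN = sum)

# Group means repeated on every row: pass each grouping variable
# separately and name FUN explicitly
data$group_mean <- ave(data$rammday, data$month, data$year, FUN = mean)
```

Unlike aggregate(), ave() returns a vector as long as the input, so it is useful when you want the group value attached to every original row rather than one row per group.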


----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
  

> -----Original message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On behalf of Steve Murray
> Sent: Friday 2 April 2010 16:37
> To: stephan.kolassa at gmx.de; gunter.berton at gene.com
> CC: r-help at r-project.org
> Subject: Re: [R] Summing data based on certain conditions
> 
> 
> Dear all,
> 
> Thanks for the contributions so far. I've had a look at these 
> and the closest I've come to solving it is the following:
> 
> > data_ave <- ave(data$rammday, by=c(data$month, data$year))
> Warning messages:
> 1: In split.default(x, g) :
>   data length is not a multiple of split variable
> 2: In split.default(seq_along(x), f, drop = drop, ...) :
>   data length is not a multiple of split variable
> 
> 
> I'm slightly confused by the warning message, as the data 
> lengths do appear the same:
> 
> > dim(data)
> [1] 1073    6
> > length(data$year)
> [1] 1073
> > length(data$month)
> [1] 1073
> 
> 
> Maybe the approach I'm taking is wrong. Any suggestions would 
> be gratefully received.
> 
> Many thanks,
> 
> Steve
> 
> 
> ----------------------------------------
> > Date: Wed, 31 Mar 2010 23:31:25 +0200
> > From: Stephan.Kolassa at gmx.de
> > To: smurray444 at hotmail.com
> > CC: r-help at r-project.org
> > Subject: Re: [R] Summing data based on certain conditions
> >
> > ?by may also be helpful.
> >
> > Stephan
> >
> >
> > Steve Murray wrote:
> >> Dear all,
> >>
> >> I have a dataset of 1073 rows, the first 15 of which look as follows:
> >>
> >>> data[1:15,]
> >>         date year month day rammday thmmday
> >> 1   3/8/1988 1988     3   8    1.43    0.94
> >> 2  3/15/1988 1988     3  15    2.86    0.66
> >> 3  3/22/1988 1988     3  22    5.06    3.43
> >> 4  3/29/1988 1988     3  29   18.76   10.93
> >> 5   4/5/1988 1988     4   5    4.49    2.70
> >> 6  4/12/1988 1988     4  12    8.57    4.59
> >> 7  4/16/1988 1988     4  16   31.18   22.18
> >> 8  4/19/1988 1988     4  19   19.67   12.33
> >> 9  4/26/1988 1988     4  26    3.14    1.79
> >> 10  5/3/1988 1988     5   3   11.51    6.33
> >> 11 5/10/1988 1988     5  10    5.64    2.89
> >> 12 5/17/1988 1988     5  17   37.46   20.89
> >> 13 5/24/1988 1988     5  24    9.86    9.81
> >> 14 5/31/1988 1988     5  31   13.00    8.63
> >> 15  6/7/1988 1988     6   7    0.43    0.00
> >>
> >>
> >> I am looking for a way to create monthly totals of rammday (rainfall
> >> in mm/day; column 5) by doing the following:
> >>
> >> For each case where the month value and the year are the same (e.g. 3
> >> and 1988, in the first four rows), find the mean of the corresponding
> >> rammday values and then multiply it by the number of days in that
> >> month (i.e. 31 in this case).
> >>
> >> Note however that the number of values per month isn't always the
> >> same (e.g. in this subset of data, there are 4 values for month 3, 5
> >> for month 4 and 5 for month 5). Also the months will of course recur
> >> in the following years, so it's not simply a case of finding a monthly
> >> total for *all* the 3s in the whole dataset, just those associated
> >> with each year in turn.
> >>
> >> How would I go about doing this in R?
> >>
> >> Any help will be gratefully received.
> >>
> >> Many thanks,
> >>
> >> Steve
> >>
> >>
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide 
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
