[R] Calculate daily means from 5-minute interval data

Mon Aug 30 04:09:01 CEST 2021

Why would you need a package for this?
> samples.per.day <- 12*24

That's 12 5-minute intervals per hour and 24 hours per day.
Generate some fake data.

> x <- rnorm(samples.per.day * 365)
> length(x)
[1] 105120

Reshape the fake data into a matrix where each row represents one
24-hour period.

> m <- matrix(x, ncol=samples.per.day, byrow=TRUE)
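
If you want to convince yourself that byrow=TRUE really puts one day in each row, a small check along these lines should do. It is only a sketch: the integer sequence stands in for real readings, and day2 is a name I made up for the check.

```r
samples.per.day <- 12 * 24

# Three days of stand-in "readings": just the sample index,
# so it is easy to see which samples ended up in which row.
x <- seq_len(samples.per.day * 3)
m <- matrix(x, ncol = samples.per.day, byrow = TRUE)

dim(m)   # rows are days, columns are 5-minute slots within the day

# Row 2 should hold exactly the second day's samples, in order.
day2 <- x[(samples.per.day + 1):(2 * samples.per.day)]
identical(m[2, ], day2)
```

Without byrow=TRUE the matrix is filled column by column, and each row would mix samples from different days.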

Now we can summarise the rows any way we want.
The basic tool here is ?apply.
?rowMeans is said to be faster than using apply to calculate means,
so we'll use that. Base R has no rowSds, so we have to use apply
for the standard deviation. I use ?head because I don't want to
post hundreds of meaningless numbers.

> head(rowMeans(m))
[1] -0.03510177  0.11817337  0.06725203 -0.03578195 -0.02448077 -0.03033692
> head(apply(m, MARGIN=1, FUN=sd))
[1] 1.0017718 0.9922920 1.0100550 0.9956810 1.0077477 0.9833144
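
One caveat: the reshape assumes every day has all 288 samples. Files of 90K-93K lines are short of the 105120 a complete year would have, so there are presumably gaps, and the rows would no longer line up with calendar days. A sketch of grouping by date instead, using ?tapply. The vectors time and flow are hypothetical stand-ins for whatever the real columns are called, and the gaps are simulated.

```r
set.seed(1)

# Hypothetical layout: one timestamp and one reading per 5-minute sample,
# here a simulated week in UTC.
time <- seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
            by = "5 min", length.out = 12 * 24 * 7)
flow <- rlnorm(length(time))

# Drop some samples at random to mimic gaps in the record.
keep <- sort(sample(length(time), 1900))
time <- time[keep]
flow <- flow[keep]

# Group by calendar date rather than by position in the vector.
day <- as.Date(time, tz = "UTC")
daily.mean <- tapply(flow, day, mean)
daily.sd   <- tapply(flow, day, sd)
daily.n    <- tapply(flow, day, length)   # how many samples each day really has

cbind(n = daily.n, mean = daily.mean, sd = daily.sd)
```

Keeping the per-day count next to the mean makes it easy to spot days where the summary rests on very few readings.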

Now whether this is a *sensible* way to summarise your flow data is a question
that a hydrologist would be better placed to answer.  I would have started with
> plot(density(x))
which I just did with some real river data, call it r (only a month of it, sigh).
Very long tail.
Even
> plot(density(log(r)))
shows a very long tail.  Time to plot the data against time.  Oh my!
All of the long tail came from a single event.
There's a period of low flow, then there's a big rainstorm and the
flow goes WAY up, then over about two days the flow subsides to a new
somewhat higher level.

None of this is reflected in means or standard deviations.
This is *time series* data, and time series data of a fairly special kind.
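
I can't post the river data, but a simulated month with the same general shape shows the idea. Everything here is made up for illustration: the base flow, the size of the spike, the recession rate, and the threshold of 5 are all assumptions, not anything measured.

```r
set.seed(42)
samples.per.day <- 12 * 24
n <- samples.per.day * 30                    # one simulated month

# Storm begins at 15:00 on day 15 (an arbitrary choice).
storm.start <- samples.per.day * 14 + 180
k <- seq_len(n) - storm.start                # samples since the storm began

# Sharp rise, then an exponential recession that settles at a
# somewhat higher level than before the storm.
tau <- 0.4 * samples.per.day
event <- ifelse(k >= 0, 50 * exp(-k / tau) + 0.5, 0)

# Low base flow of 1 (arbitrary units) with a little multiplicative noise.
r <- (1 + event) * exp(rnorm(n, sd = 0.05))

# Every "long tail" value belongs to the one event.
range(which(r > 5))

if (interactive()) {
  plot(density(log(r)))                      # still a long tail
  plot(r, type = "l", log = "y",
       xlab = "5-minute sample", ylab = "flow")
}
```

The density plots only say that some large values exist. The plot against time says when they happened and that they happened together.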

One thing that might be helpful with your data would simply be
> image(log(m))
For my one month sample, the spike showed up very clearly that way.
Right now, your first task is to get an idea of what the data
look like, and means and standard deviations won't really do that.

Oh heck, here's another reason to go with image(log(m)).
With image(m) I just see the one big spike.
With image(log(m)), I can see that little spikes often start in the
afternoon of one day and continue into the morning of the next.
From daily means, it looks like two unusual, but not very
unusual, days.  From the image, it's clearly ONE rainfall event
that just happens to straddle a day boundary.
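
Here is a toy version of that situation, as a sketch only. The event timing, the flow levels 1 and 3, and the threshold of 2 are all invented for the example.

```r
samples.per.day <- 12 * 24
n.days <- 10
flow <- rep(1, samples.per.day * n.days)     # flat base flow

# A hypothetical event: starts at 16:00 on day 4 and lasts 16 hours,
# so it ends at 08:00 on day 5.
start <- 3 * samples.per.day + 16 * 12 + 1
len <- 16 * 12
flow[start:(start + len - 1)] <- 3

m <- matrix(flow, ncol = samples.per.day, byrow = TRUE)

# Daily means: days 4 and 5 both come out at about 1.67, the rest at 1.
round(rowMeans(m), 2)

# Counting runs above the threshold finds a single event, not two.
runs <- rle(flow > 2)
sum(runs$values)

# image() puts rows (days) on the x axis and columns (time of day) on the y axis.
if (interactive()) image(log(m), xlab = "day", ylab = "time of day")
```

In the image the event shows up as a band at the top of one day's column that carries on at the bottom of the next.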

This is all very basic stuff, which is really the point.  You want to use
elementary tools to look at the data before you reach for fancy ones.

On Mon, 30 Aug 2021 at 03:09, Rich Shepard <rshepard using appl-ecosys.com> wrote:
>
> I have a year's hydraulic data (discharge, stage height, velocity, etc.)
> from a USGS monitoring gauge recording values every 5 minutes. The data
> files contain 90K-93K lines and plotting all these data would produce a
> solid block of color.
>
> What I want are the daily means and standard deviation from these data.
>
> As an occasional R user (depending on project needs) I've no idea what
> packages could be applied to these data frames. There likely are multiple
> paths to extracting these daily values so summary statistics can be
> calculated and plotted. I'd appreciate suggestions on where to start to
> learn how I can do this.
>
> TIA,
>
> Rich