[R] Calculate daily means from 5-minute interval data

Bert Gunter bgunter.4567 at gmail.com
Tue Aug 31 01:53:43 CEST 2021


I do not wish to express any opinion on what should be done or how. But...

1. I assume that when data are missing, they are missing -- i.e.,
simply not present in the data. So there may be several, possibly
many, consecutive rows absent for the times when nothing was recorded,
right? (Apologies for being a bit dumb about this, but I always need
to check that what I think is blindingly obvious really is.)
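
To make point 1 concrete, here is a minimal sketch of one way to check
for absent rows: build the complete 5-minute grid and join the
observations onto it, so unrecorded times show up as NA. The data frame
`obs` and its columns `stamp` and `value` are made up for illustration;
they are not the poster's data, and the real files would first need
their date, time and time zone columns combined into one POSIXct column.

```r
# Hypothetical example data: 5-minute readings with two rows simply absent.
stamp <- seq(as.POSIXct("2021-08-30 00:00:00", tz = "UTC"),
             by = "5 min", length.out = 12)
obs <- data.frame(stamp = stamp, value = seq_along(stamp))
obs <- obs[-c(4, 5), ]   # drop 00:15 and 00:20 to mimic an outage

# Complete 5-minute grid spanning the observed range.
grid <- data.frame(stamp = seq(min(obs$stamp), max(obs$stamp), by = "5 min"))

# Left join: times with no reading become rows with value = NA.
full <- merge(grid, obs, by = "stamp", all.x = TRUE)

n_missing <- sum(is.na(full$value))      # how many slots have no reading
gaps <- full$stamp[is.na(full$value)]    # which time stamps they are
```

Note that a gap before the first or after the last recorded time is
invisible to this check, because the grid only spans the observed range.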

2. Do note that daily averages/sd's/whatever summaries may, because of
missingness, be calculated from quite different numbers of data
points -- are whole days sometimes missing?? -- so the summaries
(e.g. means) are not all created equal: those computed from more data
are more "trustworthy" and should receive "appropriately" greater
weight than those computed from fewer. Makes sense, right?
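
As a rough illustration of point 2, the sketch below keeps the number of
readings behind each daily mean and compares a plain average of the
daily means with one weighted by those counts. The data are invented,
and weighting by the raw count is only one simple choice, not
necessarily the "appropriate" weighting for real gauge data.

```r
# Hypothetical data: two days of 5-minute readings; day 2 is mostly missing.
stamp <- seq(as.POSIXct("2021-08-29 00:00:00", tz = "UTC"),
             by = "5 min", length.out = 2 * 288)
obs <- data.frame(stamp = stamp, value = rep(c(10, 20), each = 288))
obs <- obs[1:300, ]   # all 288 readings of day 1, only 12 of day 2

# Calendar day of each reading (as text, in the same time zone as the stamps).
day <- format(obs$stamp, "%Y-%m-%d", tz = "UTC")

# Daily mean together with the number of readings it is based on.
n <- tapply(obs$value, day, length)
m <- tapply(obs$value, day, mean)
daily <- data.frame(day = names(n), n = as.vector(n), mean = as.vector(m))
daily$coverage <- daily$n / 288   # 288 five-minute slots per day

# Treating both days equally vs. weighting by the number of readings.
unweighted <- mean(daily$mean)
weighted <- weighted.mean(daily$mean, w = daily$n)
```

Here the unweighted average of the two daily means is 15, while the
count-weighted one is 10.4, because day 2 contributes only 12 of the 300
readings. Whether either number is the right summary is exactly the kind
of question the data alone do not answer.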

So I suspect that this may not be as straightforward as you think --
you may wish to find a local statistician with some experience in
these sorts of things to help you deal with them. Up to you, of
course.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Mon, Aug 30, 2021 at 4:34 PM Rich Shepard <rshepard using appl-ecosys.com> wrote:
>
> On Tue, 31 Aug 2021, Richard O'Keefe wrote:
>
> > I made up fake data in order to avoid showing untested code. It's not part
> > of the process I was recommending. I expect data recorded every N minutes
> > to use NA when something is missing, not to simply not be recorded. Well
> > and good, all that means is that reshaping the data is not a trivial call
> > to matrix(). It does not mean that any additional package is needed or
> > appropriate and it does not affect the rest of the process.
>
> Richard,
>
> The instruments in the gauge pipe don't know to write NA when they're not
> measuring. :-) The outage period varies greatly by location, constituent
> measured, and other unknown factors.
>
> > You will want the POSIXct class, see ?DateTimeClasses. Do you know whether
> > the time stamps are in universal time or in local time?
>
> The data values are not timestamps. There's one column for date, a second
> column for time, and a third column for time zone (P in the case of the
> west coast).
>
> > Above all, it doesn't affect the point that you probably should not
> > be doing any of this.
>
> ? (Doesn't require an explanation.)
>
> Rich


