[R] Results of applying na.omit on zoo object

Rich Shepard rshepard at appl-ecosys.com
Tue Sep 20 00:18:14 CEST 2011


On Mon, 19 Sep 2011, Marc Schwartz wrote:

> Let me start by acknowledging that I have little practical experience in
> time series analyses, much less proficiency with the zoo package. I just
> don't come across them much in clinical trials/studies, at least the ones
> that I have been involved with over the past 25+ years.

Marc,

   A lot of folks on the mail list here seem to be in the medical side of
biology. I'm a stream ecologist/fluvial geomorphologist with 30 or so years
of professional experience and each project seems to need new software and
increases in my knowledge to address. I now have two projects involving
water quality that will involve integration of time series analyses,
regression analyses, and spatial modeling of terrain and hydrology. The last
component I've used frequently over the past decade or so; the advanced data
analyses and statistical modeling have come up only now.

> I do know from prior posts on the matter, that the zoo package seems to
> have some of its own approaches to dealing with dates, as compared to base
> R. So you may need to be clear on the differentiation in code/functions
> required to use some of the package functionality.

   Yes, I can specify the start and end dates using as.Date.

> So from an analytic perspective, I would encourage others to chime in with
> guidance. Missing data generally has an impact at some level, the extent
> of which is going to be specific to the context of the particular analysis
> being performed and any assumptions one may be willing to make.

   Missing data between the first sample and the most recent one means, in
my contexts, that access to the site was blocked by high water or deep snow
(some sites are at > 7,000 feet amsl), or that the channel was dry in the
late summer. It's ignoring the NAs prior to the first collected samples that
I'm hoping can easily be specified.
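
   A minimal sketch of one way this might be done, assuming the series is
a zoo object indexed by Date: zoo's na.trim() drops NAs only at the ends of
a series, so sides = "left" removes the leading run and keeps interior gaps.
The dates and concentrations below are made up purely for illustration.

```r
library(zoo)

# Hypothetical monthly sampling dates and concentrations (made-up values).
dates <- as.Date(c("2010-01-15", "2010-02-15", "2010-03-15",
                   "2010-04-15", "2010-05-15", "2010-06-15"))
conc  <- c(NA, NA, 1.2, NA, 1.5, 1.1)
z <- zoo(conc, order.by = dates)

# Drop only the NAs before the first collected sample;
# the interior NA (2010-04-15) is kept.
z_trim <- na.trim(z, sides = "left")

# For comparison: na.omit() drops every NA, interior ones included.
z_omit <- na.omit(z)
```

   Whether the interior NAs should then be kept, dropped, or interpolated
depends on the analysis that follows.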

> There is also the r-sig-finance list:
>
>  https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>
> to which this query may be better suited in terms of gaining a focused
> audience in a domain where time series analyses are prevalent.

   I've read some finance/economics-focused time series documents and I
haven't seen the relevance. For example, in the natural environment can we
assume that water samples collected 1 month apart and analyzed for specific
chemical concentrations are autocorrelated? If events such as rain-on-snow
or wildland fires cause a large increase in discharge, or clear riparian
vegetation and add soot and charred debris to the stream channel, are
chemical concentrations associated with prior ones or with the external
influences at the collection site? Perhaps these data are independent and
identically distributed (iid).
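
   One rough way to probe the autocorrelation question, sketched here with
base R only: compute the sample autocorrelation function and a Ljung-Box
test. The series below is simulated (not real water-quality data), and the
sketch assumes equally spaced samples with no gaps, which irregular field
sampling will not satisfy without further work.

```r
# Simulated stand-in for 60 monthly concentrations (NOT real data);
# the draws are independent, so little autocorrelation is expected.
set.seed(1)
conc <- rnorm(60, mean = 5, sd = 1)

# Sample autocorrelation at lags 0 through 12.
a <- acf(conc, lag.max = 12, plot = FALSE)

# Ljung-Box test; the null hypothesis is no autocorrelation up to lag 12.
lb <- Box.test(conc, lag = 12, type = "Ljung-Box")
print(lb)
```

   A small p-value would be evidence against independence at those lags; a
large one is merely consistent with iid data and does not prove it.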

   One of the more interesting (to me, at least) aspects of one of these
projects is to explore the value of the time domain approach for predicting
future values versus the frequency approach to explore periodic and/or
systematic variations in values over time. Regulators tend to focus on the
first and be unaware of the second. At this very early exploratory stage I'm
not sure which approach is more beneficial to my client and the regulators.

> There are also some books on using R for time series analyses, some of
> which are listed on the "Books" link from the R homepage. It would seem
> logical that one or more of them might cover the use of the zoo package,
> but that is a guess on my part.

   I am plowing my way through Shumway and Stoffer's "Time Series Analysis
and Its Applications with R Examples" and have read Cowpertwait and
Metcalfe's "Introductory Time Series with R." I need to look again at Zuur et
al. in both "Analyzing Ecological Data" and "Numerical Ecology with R"
specifically for discussions of time series. The books and other documents
I've read (with the exception of an article on sandbars in the Colorado
River) are in situations where data are associated with fixed and regular
periods. In the messy real world not only do weather and other conditions
mean irregular data collection dates, but sometimes the regulators decide
that monthly or quarterly samples are no longer required so semi-annual
samples are the norm thereafter. Biotic data are even worse. :-)

   Zoo seems to be ideal for the irregular, messy data with which I work.
Since I'm quite new to R it will take me time to get up to full speed with
it and zoo. I greatly appreciate the patience and understanding of all of
you who've helped.

> I hope that the above is helpful Rich.

   Yep. If you know of references to time series analyses of real-world, messy
data, please share them with me.

> I also presume that you got my "final" version of the two functions, with
> the corrected data frame based approach. Sorry for the confusion on that
> earlier.

   Yes, I did, and there was no confusion as I read them all in the same
session.

Again, many thanks,

Rich


