[R] Results of applying na.omit on zoo object

Marc Schwartz marc_schwartz at me.com
Tue Sep 20 01:52:08 CEST 2011


Hi Rich,

On Sep 19, 2011, at 5:18 PM, Rich Shepard wrote:

> On Mon, 19 Sep 2011, Marc Schwartz wrote:
> 
>> Let me start by acknowledging that I have little practical experience in
>> time series analyses, much less proficiency with the zoo package. I just
>> don't come across them much in clinical trials/studies, at least the ones
>> that I have been involved with over the past 25+ years.
> 
> Marc,
> 
>  A lot of folks on the mail list here seem to be in the medical side of
> biology. I'm a stream ecologist/fluvial geomorphologist with 30 or so years
> of professional experience and each project seems to need new software and
> increases in my knowledge to address. I now have two projects involving
> water quality that will involve integration of time series analyses,
> regression analyses, and spatial modeling of terrain and hydrology. The last
> component I've used frequently over the past decade or so; the advanced data
> analyses and statistical modeling have come up only now.
> 
>> I do know from prior posts on the matter, that the zoo package seems to
>> have some of its own approaches to dealing with dates, as compared to base
>> R. So you may need to be clear on the differentiation in code/functions
>> required to use some of the package functionality.
> 
>  Yes, I can specify the start and end dates using as.Date.

As I noted in my reply to Gabor just now, be aware of the difference between zoo's as.Date() function and base R's function of the same name, which zoo's version masks. Both take an 'origin' argument.
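
A minimal sketch of one way to stay clear of the masking issue, assuming numeric day counts relative to a made-up origin (the values and the origin below are purely illustrative): pass 'origin' explicitly, or call the base function by its namespace-qualified name.

```r
# Illustrative day counts since a hypothetical origin date.
days <- c(0, 31, 59)

# Supplying 'origin' explicitly makes the reference date unambiguous,
# whichever as.Date() happens to be first on the search path.
d <- as.Date(days, origin = "2011-01-01")

# The namespace-qualified call always uses base R's as.Date(),
# even when zoo is attached and masks it.
d_base <- base::as.Date(days, origin = "2011-01-01")

print(d)
```

If the two results ever differ in your session, that is a sign the masked and unmasked functions are treating the input differently.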


> 
>> So from an analytic perspective, I would encourage others to chime in with
>> guidance. Missing data generally has an impact at some level, the extent
>> of which is going to be specific to the context of the particular analysis
>> being performed and any assumptions one may be willing to make.
> 
>  Missing data between the first sample and the most recent one means, in
> my contexts, that access to the site was not possible because of high water,
> deep snow (some sites at > 7,000 feet amsl), or a dry channel in the late
> summer.
> It's ignoring the NAs prior to the first collected samples that I'm hoping
> can easily be specified.
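
On dropping only the NAs that precede the first collected sample: a minimal sketch, assuming a univariate zoo series and using zoo's na.trim() with sides = "left". The dates and concentrations are invented for illustration, not taken from your data.

```r
library(zoo)

# Hypothetical sampling dates and concentrations (illustrative values only).
dates <- as.Date(c("2011-01-15", "2011-02-15", "2011-03-15",
                   "2011-04-15", "2011-05-15", "2011-06-15"))
conc  <- c(NA, NA, 1.2, NA, 1.5, 1.1)
z <- zoo(conc, order.by = dates)

# Remove only the leading NAs; the interior NA on 2011-04-15 is kept.
z_trim <- na.trim(z, sides = "left")

# For comparison, na.omit() drops every NA, including the interior one.
z_omit <- na.omit(z)

print(z_trim)
```

With a multi-column zoo object the behavior depends on na.trim()'s 'is.na' argument, so it may be simpler to trim each series separately.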
> 
>> There is also the r-sig-finance list:
>> 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> 
>> to which this query may be better suited in terms of gaining a focused
>> audience in a domain where time series analyses are prevalent.

See comment below…

> 
>  I've read some finance/economics-focused time series documents and I
> haven't seen the relevance. For example, in the natural environment can we
> assume that water samples collected 1 month apart and analyzed for specific
> chemical concentrations are autocorrelated? If events such as rain-on-snow
> or wildland fires cause a large increase in discharge or clear riparian
> vegetation and add soot and charred debris to the stream channel, are chemical
> concentrations associated with prior ones or with the external influences at
> the collection site? Perhaps these data are independent and identically
> distributed (iid).
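
Whether such samples are autocorrelated can be checked empirically rather than assumed. A rough sketch, assuming the samples are close enough to evenly spaced for acf() to be meaningful; the series below is simulated purely for illustration, and the 0.6 AR coefficient is an arbitrary choice:

```r
set.seed(1)

# Simulated stand-in for 10 years of monthly concentrations (not real data).
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 120))

# Mimic a few missed site visits.
x[c(10, 11, 40)] <- NA

# Sample autocorrelations up to lag 12, allowing missing values.
a <- acf(x, lag.max = 12, na.action = na.pass, plot = FALSE)

# Approximate 95% bounds under the hypothesis of independent samples.
ci <- qnorm(0.975) / sqrt(sum(!is.na(x)))

print(round(drop(a$acf), 2))
print(round(ci, 2))
```

Autocorrelations that stay inside +/- ci at every nonzero lag would be consistent with independent samples, although with strongly irregular spacing this check is only a first look.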
> 
>  One of the more interesting (to me, at least) aspects of one of these
> projects is to explore the value of the time domain approach for predicting
> future values versus the frequency approach to explore periodic and/or
> systematic variations in values over time. Regulators tend to focus on the
> first and be unaware of the second. At this very early exploratory stage I'm
> not sure which approach is more beneficial to my client and the regulators.
> 
>> There are also some books on using R for time series analyses, some of
>> which are listed on the "Books" link from the R homepage. It would seem
>> logical that one or more of them might cover the use of the zoo package,
>> but that is a guess on my part.
> 
>  I am plowing my way through Shumway and Stoffer's "Time Series Analysis
> and Its Applications with R Examples" and have read Cowpertwait and
> Metcalfe's "Introductory Time Series with R." I need to look again at Zuur et
> al. in both "Analyzing Ecological Data" and "Numerical Ecology with R"
> specifically for discussions of time series. The books and other documents
> I've read (with the exception of an article on sandbars in the Colorado
> River) are in situations where data are associated with fixed and regular
> periods. In the messy real world not only do weather and other conditions
> mean irregular data collection dates, but sometimes the regulators decide
> that monthly or quarterly samples are no longer required so semi-annual
> samples are the norm thereafter. Biotic data are even worse. :-)
> 
>  Zoo seems to be ideal for the irregular, messy data with which I work.
> Since I'm quite new to R it will take me time to get up to full speed with
> it and zoo. I greatly appreciate the patience and understanding of all of
> you who've helped.
> 
>> I hope that the above is helpful Rich.
> 
>  Yep. If you know of references to time series analyses of real-world, messy
> data, please share them with me.
> 
>> I also presume that you got my "final" version of the two functions, with
>> the corrected data frame based approach. Sorry for the confusion on that
>> earlier.
> 
>  Yes, I did, and there was no confusion as I read them all in the same
> session.
> 
> Again, many thanks,
> 
> Rich

Now that I have a better idea of your domain, you might want to consider the r-sig-ecology list as a supplement to R-help:

  https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

There appears to be reasonable traffic there and you might find others with similar issues and perhaps possible solutions or at least recommendations.

Regards,

Marc


