[R] very basic series correlation question

David Winsemius dwinsemius at comcast.net
Sun Jan 5 18:59:12 CET 2014


On Jan 5, 2014, at 9:30 AM, Robert Baer wrote:

> 
> On 1/4/2014 7:42 PM, Peter Turner wrote:
>> Hi, I hope the following question is appropriate for the list; it reflects
>> that I have yet to use R and have limited statistical sensibility.
>> 
>> I've two metal ion concentration data sets, one each for two nearby
>> watercourses recorded over the same period (2008 to 2012), for which the
>> sampling dates differ across that period.
>> 
>> Would R's cor (stats) function be suitable to obtain a correlation measure
>> for the Y data sets?
> Probably, if it is paired on years. Start by plotting your data as a scatter plot to check linearity. To read the help:
> ?plot
> Then read ?cor and note the method argument.  By default cor() gives you Pearson correlation coefficients, but depending on the nature of your data, non-parametric Spearman or Kendall coefficients might be more appropriate.

I would have answered differently. The cor() function is not appropriate for computing correlations on serially correlated data. Furthermore, it was not clear that the data are sufficiently paired to allow cor() to deliver results at all, however flawed those results might be from a statistical viewpoint. Specifying a non-parametric correlation will not cure the problems of auto-correlation.
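
A rough simulation may make the auto-correlation point concrete. The sketch below uses made-up data only: pairs of independent random walks (serially correlated) and pairs of independent white-noise series. The series length, the number of simulated pairs, and the 0.5 cutoff are arbitrary choices for illustration and are not taken from the original question.

```r
# Simulated data only: how often do two INDEPENDENT series show |r| > 0.5?
set.seed(1)
n_obs  <- 200   # length of each simulated series (arbitrary)
n_sims <- 500   # number of simulated pairs (arbitrary)

walk  <- function() cumsum(rnorm(n_obs))  # random walk: strong serial correlation
noise <- function() rnorm(n_obs)          # white noise: no serial correlation

# Absolute correlation for n_sims independent pairs from the same generator
abs_cor <- function(make_series, method) {
  replicate(n_sims, abs(cor(make_series(), make_series(), method = method)))
}

frac_large <- c(
  walk_pearson  = mean(abs_cor(walk,  "pearson")  > 0.5),
  walk_spearman = mean(abs_cor(walk,  "spearman") > 0.5),
  noise_pearson = mean(abs_cor(noise, "pearson")  > 0.5)
)
print(frac_large)
```

The random-walk pairs are unrelated by construction, yet a sizeable fraction should show large correlations under both methods, while the white-noise pairs essentially never do. Switching to method = "spearman" changes the coefficient, not the dependence structure of the data.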

My reading of the original question suggested that the dates of the two series were different, despite being in the same period of years. I would have thought a more complete description of the data was needed before offering any specific advice.
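
One way to describe the data more completely is to check how many sampling dates the two series actually share. The following is a minimal sketch with invented values; the data frame names (creek_a, creek_b), column names, dates, and concentrations are all hypothetical stand-ins for the real records.

```r
# Hypothetical sampling records for two watercourses (made-up values)
creek_a <- data.frame(
  date   = as.Date(c("2008-01-15", "2008-02-12", "2008-03-18", "2008-04-15")),
  conc_a = c(1.2, 1.5, 1.1, 1.8)
)
creek_b <- data.frame(
  date   = as.Date(c("2008-01-22", "2008-02-12", "2008-03-04", "2008-04-29")),
  conc_b = c(0.9, 1.4, 1.0, 1.6)
)

# Only rows with an identical date in both series survive the merge
paired <- merge(creek_a, creek_b, by = "date")
print(nrow(paired))

# For each sample in A, the gap in days to the nearest sample in B
nearest_gap <- sapply(creek_a$date, function(d) {
  min(abs(as.numeric(difftime(creek_b$date, d, units = "days"))))
})
print(nearest_gap)
```

If the merge leaves only a handful of rows, cor() would be working with very little data, and the size of the nearest-date gaps gives a first impression of how far the two sampling schedules are from being paired.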

There are methods of doing cross-correlation of distinct time series, but I am not sufficiently skilled to know what degree of sameness for the times of observations would be needed. Peter, I would think you should be searching with terms such as 'time series' and 'cross-correlation'. Generally these issues are sufficiently complex that cookbook methods are flawed (often deeply flawed) and that a real statistician is needed.
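
For orientation only, here is what a search on 'cross-correlation' is likely to turn up in base R: the ccf() function in the stats package. The sketch below runs on simulated data and makes several assumptions that may not suit real measurements: each series is reduced to one value per calendar month so the two share a regular time grid, every month has at least one sample, and first differences are used as a crude way to reduce serial correlation. It is a cookbook example of exactly the kind cautioned against above, not a recommended analysis.

```r
# Simulated data only: irregular sampling dates, 2008 to 2012
set.seed(42)
months  <- seq(as.Date("2008-01-01"), as.Date("2012-12-01"), by = "month")
n       <- length(months)

# One sample per month on a random day, with different days at each site
dates_a <- months + sample(0:27, n, replace = TRUE)
dates_b <- months + sample(0:27, n, replace = TRUE)

# A shared autocorrelated signal plus independent noise at each site
shared  <- as.numeric(arima.sim(list(ar = 0.6), n = n))
conc_a  <- shared + rnorm(n, sd = 0.5)
conc_b  <- shared + rnorm(n, sd = 0.5)

# Put both series on a common monthly grid (assumes no empty months)
monthly_mean <- function(dates, values) {
  m <- tapply(values, format(dates, "%Y-%m"), mean)
  ts(as.numeric(m), start = c(2008, 1), frequency = 12)
}
ts_a <- monthly_mean(dates_a, conc_a)
ts_b <- monthly_mean(dates_b, conc_b)

# Cross-correlation of the differenced series at lags of up to 12 months
cc <- ccf(diff(ts_a), diff(ts_b), lag.max = 12, plot = FALSE)
print(cc)
```

Because the series have frequency 12, ccf() reports lags in fractions of a year. Whether monthly aggregation and differencing are defensible for a given data set is the sort of judgement that calls for a statistician.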

-- 

David Winsemius
Alameda, CA, USA



