[R] calculate within-day correlations

Fri Sep 14 03:57:54 CEST 2012

On Thu, Sep 13, 2012 at 7:35 PM, emorway <emorway at usgs.gov> wrote:
> useRs,
>
> Here is some R-ready data for my question to follow.  Of course this data is
> a small snippet from a much larger dataset that is about a decade long.
>
<snip data>
>
> Q_use <- data.frame(
>   date = as.POSIXct(paste(Q[,1], "-", Q[,2], "-", Q[,3], " ",
>                           floor(Q[,4]/60), ":", Q[,4] - (floor(Q[,4]/60)*60),
>                           ":00", sep=''),
>                     "%Y-%m-%d %H:%M:%S", tz=""),
>   Q = Q$Q)
> SC_use <- data.frame(
>   date = as.POSIXct(paste(SC[,1], "-", SC[,2], "-", SC[,3], " ",
>                           floor(SC[,4]/60), ":", SC[,4] - (floor(SC[,4]/60)*60),
>                           ":00", sep=''),
>                     "%Y-%m-%d %H:%M:%S", tz=""),
>   SC = SC$SC)
>
> Using the data provided, I’m trying to calculate each day’s correlation
> between Q_use$Q and SC_use$SC and store the values in a data.frame.  An
> example result I’d like to produce is
>
> #Day 1
> cor(Q_use$Q[1:95],SC_use$SC[1:95])
> #[1] -0.4916499
>
> #Day 2
> cor(Q_use$Q[96:191],SC_use$SC[96:191])
> #[1] -0.6085098
>
> edm<-data.frame(Correl=t(t(c(cor(Q_use$Q[1:95],SC_use$SC[1:95]),
> cor(Q_use$Q[96:191],SC_use$SC[96:191])))))
>
> But of course I want R to figure out the appropriate indexes (i.e. 1:95, 96:191,
> and so on in the larger dataset) for me.  In other words, I'm seeking some help
> with R code that will ‘pass’ through the two datasets calculating each day’s
> correlation and doesn’t rely on the user supplying the index ranges for
> where each day's values reside.
>
> There are, as always, a couple of wrinkles.  On day 3, for example,
>
> cor(Q_use$Q[192:287],SC_use$SC[192:287])
> [1] NA
>
> This is because SC_use$SC[275] = NA.  Is there a way to direct R to continue
> calculating that day's correlation using the data that is available for that
> day?  It is also necessary to check and make sure that
> Q_use[i,1]==SC_use[i,1] for each i in that day because in the larger dataset
> the row indices don’t necessarily match up (I have made sure that they do
> for this simple example).  It would be handy to know how many values were
> missing on incomplete days, perhaps in a column appended to the resulting
> data frame.  I would appreciate any R code that could help get me started
> toward this end; I’m stuck.  I tried looking at ?aggregate, had a look in the
> reshape library, and ‘rollapply’ in the zoo library, but I wasn’t seeing a
> way to do the error checking I just described.
> Thanks, Eric
>
>
Thanks for the reproducible example.  This is pretty simple with xts:
library(xts)
xQ <- xts(Q_use["Q"], Q_use$date)
xSC <- xts(SC_use["SC"], SC_use$date)
x <- merge(xQ,xSC)

Now all the dates for both data sets are aligned in 'x', so you can
use apply.daily() to run a function over each day:
apply.daily(x, function(y) cor(y[,1],y[,2],use="pairwise.complete.obs"))
                          [,1]
2002-03-28 23:45:00 -0.4916499
2002-03-29 23:45:00 -0.6085098
2002-03-30 23:45:00 -0.1489898
2002-03-31 00:00:00         NA

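Note that apply.daily() hands each day's rows to the function as a single
object, so I had to create a small anonymous wrapper function to pass the
two columns to cor() separately.  Setting use="pairwise.complete.obs" makes
cor() drop any pair that contains an NA, so day 3 is computed from the
data that is available for that day.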
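
To also report how many values were missing on each day (the last part of
the question), the wrapper can return more than one number per day.  The
following is only a sketch under stated assumptions: Q_toy, SC_toy and
daily_stats are hypothetical names with made-up values standing in for
Q_use and SC_use, and the toy timestamps are set to UTC so the day
boundaries are unambiguous.

```r
library(xts)

# Toy stand-ins for Q_use and SC_use (hypothetical values, 6-hourly, UTC)
times <- seq(as.POSIXct("2002-03-28 00:00:00", tz = "UTC"),
             by = "6 hours", length.out = 8)
Q_toy  <- data.frame(date = times, Q  = c(1, 2, 3, 4, 4, 3, 2, 1))
SC_toy <- data.frame(date = times, SC = c(4, 3, 2, 1, 1, 2, NA, 4))
SC_toy <- SC_toy[-2, ]  # drop a row so the two sets no longer line up

# merge() aligns on the timestamps; a timestamp found in only one
# data set gets NA in the other column
x <- merge(xts(Q_toy["Q"], Q_toy$date), xts(SC_toy["SC"], SC_toy$date))

# Per-day summary: correlation over complete pairs plus bookkeeping
daily_stats <- function(y) {
  q  <- as.numeric(y[, 1])
  sc <- as.numeric(y[, 2])
  ok <- complete.cases(q, sc)
  c(Correl    = cor(q, sc, use = "pairwise.complete.obs"),
    n_used    = sum(ok),    # pairs that entered the correlation
    n_missing = sum(!ok))   # timestamps where either value is NA
}

res <- apply.daily(x, daily_stats)

# Plain data.frame version, if that is the preferred container
edm <- data.frame(date = index(res), coredata(res))
```

Here n_missing counts a timestamp as missing when either series has no
value for it, whether because of an explicit NA or because the row is
absent from one of the data sets.  If the two cases need to be told
apart, the counts would have to be taken before the merge.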

Hope that helps.

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com