[R] efficient writing of calculation involving each element of 2 data frames.

Uwe Ligges ligges at statistik.tu-dortmund.de
Mon Feb 25 12:03:39 CET 2008



Vikas N Kumar wrote:
> Hi
> 
> I have 2 data.frames each of the same number of rows (approximately 30000 or
> more entries).
> They also have the same number of columns, let's say 2.
> One column has the date, the other column has a double precision number. Let
> the column names be V1, V2.
> 
> Now I want to calculate the correlation of the 2 sets of data, for the last
> 100 days for every day available in the data.frames.
> 
> My code looks like this :
> # Let df1, and df2 be the 2 data frames with the required data
> ## begin code snippet
> 
> my_corr <- c();
> for ( i_end in 100:nrow(df1)) {
>        i_start <- i_end  - 99;
>        my_corr[i_start] <-
> cor(x=df1[i_start:i_end,"V2"],y=df2[i_start:i_end,"V2"])
> }


I'd rather do it this way:

n <- nrow(df1) - 99
my_corr <- numeric(n)
dat1 <- df1[,"V2"]      # extract each column once, outside the loop
dat2 <- df2[,"V2"]
for (i in seq_len(n)) {
        sq <- i:(i+99)  # the 100-row window starting at row i
        my_corr[i] <- cor(x=dat1[sq], y=dat2[sq])
}


because most of your time has been consumed by the indexing function
  [.data.frame
as profiling shows. Type ?Rprof to learn how to do the profiling yourself.
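
To make the point about indexing concrete, here is a minimal sketch that runs both versions on simulated data, checks that they agree, and profiles the data-frame version. The simulated df1 and df2, the row count, and the helper names roll_cor_df and roll_cor_vec are assumptions made up for this illustration; they are not from the original post.

```r
# Simulated stand-ins for df1 and df2 (assumption: the real data are not shown).
set.seed(1)
n_rows <- 10000  # fewer rows than the ~30000 in the question, to keep this quick
dates <- as.Date("2000-01-01") + seq_len(n_rows) - 1
df1 <- data.frame(V1 = dates, V2 = rnorm(n_rows))
df2 <- data.frame(V1 = dates, V2 = rnorm(n_rows))

# Hypothetical helper: indexes the data frames inside the loop, as in the question.
roll_cor_df <- function(df1, df2, width = 100) {
  n <- nrow(df1) - width + 1
  out <- numeric(n)
  for (i in seq_len(n)) {
    sq <- i:(i + width - 1)
    out[i] <- cor(x = df1[sq, "V2"], y = df2[sq, "V2"])
  }
  out
}

# Hypothetical helper: extracts the columns once, then indexes plain vectors.
roll_cor_vec <- function(df1, df2, width = 100) {
  dat1 <- df1[, "V2"]
  dat2 <- df2[, "V2"]
  n <- length(dat1) - width + 1
  out <- numeric(n)
  for (i in seq_len(n)) {
    sq <- i:(i + width - 1)
    out[i] <- cor(x = dat1[sq], y = dat2[sq])
  }
  out
}

res_df  <- roll_cor_df(df1, df2)
res_vec <- roll_cor_vec(df1, df2)

# Both versions should return the same correlations.
stopifnot(isTRUE(all.equal(res_df, res_vec)))

# Compare elapsed times; the actual numbers depend on the machine.
print(system.time(roll_cor_df(df1, df2)))
print(system.time(roll_cor_vec(df1, df2)))

# Profile the data-frame version and list the functions with the most total time.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
invisible(roll_cor_df(df1, df2))
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.total, 10))
```

If the indexing really dominates, [.data.frame should appear near the top of the by.total table. The exact timings and ordering will vary with the R version and the hardware.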

Uwe Ligges




> ## end of code snippet
> 
> This runs very slowly, and takes more than an hour to run if I have to
> calculate correlation between 10 data sets leaving me with 45 runs of this
> snippet or taking more than 30 minutes to run.
> 
> Is there an efficient  way to write  this piece of code where I can get it
> to run faster ?
> 
> If I do something similar in Excel, it is much faster. But I have to use R,
> since this is a part of a bigger program.
> 
> Any help will be appreciated.
> 
> Thanks and Regards
> Vikas


