[R] Correlation for no of variables

Peter Langfelder peter.langfelder at gmail.com
Mon Mar 21 17:50:38 CET 2011


On Mon, Mar 21, 2011 at 8:34 AM, Vincy Pyne <vincy_pyne at yahoo.ca> wrote:
> Dear R helpers,
>
> Suppose I have stock return data for, say, 1500 companies over the last 4 years. Thus I have a matrix of dimension, say, 1000 * 1500, i.e. 1500 columns representing companies and 1000 rows of their returns.
>
> I need to find the correlation matrix of these 1500 companies.
>
> So I can find out the correlation as
>
> cor(returns) and expect to get a 1500 * 1500 matrix. However, the process takes a tremendous amount of time. Is there any way to expedite it? In reality, I may be dealing with as many as 5000 stocks and may simulate as many as 100000 stock returns.


How long is "tremendous time"?

What platform are you on? If you can compile R against a tuned BLAS
library, stats::cor will run faster IF you do not have any missing
data.
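To put a number on "tremendous time", and to see what a matrix-product route looks like when there are no missing values, something along these lines may help. It is only a sketch: the data are simulated, the column count is reduced from 1500 to 300 to keep it quick, and cor_blas is a hypothetical helper name, not a function from any package. How much faster (if at all) the second route runs depends on which BLAS your R is linked against.

```r
set.seed(1)
n_obs    <- 1000  # rows: return observations (size taken from the question)
n_stocks <- 300   # columns: companies (assumed smaller than 1500 for a quick demo)
returns  <- matrix(rnorm(n_obs * n_stocks), nrow = n_obs, ncol = n_stocks)

# Baseline: built-in Pearson correlation, timed
t_cor <- system.time(c1 <- cor(returns))

# Matrix-product route (hypothetical helper): standardize each column,
# then a single crossprod, which is handed to the BLAS
cor_blas <- function(x) {
  z <- scale(x)                  # (x - column mean) / column sd
  crossprod(z) / (nrow(x) - 1)   # t(z) %*% z, rescaled to correlations
}
t_blas <- system.time(c2 <- cor_blas(returns))

# Both routes should agree up to floating-point rounding
max_diff <- max(abs(c1 - c2))
print(rbind(cor = t_cor[1:3], blas = t_blas[1:3]))
print(max_diff)
```

The helper is only valid for complete data; with NAs in the matrix the product would propagate them.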

If you do have missing data, you may want to try the package WGCNA
(where we work with bigger correlation matrices), which implements a
correlation calculation that is faster, particularly when only a few
values are missing. It will also run faster if you have a tuned BLAS
installed.
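A rough sketch of the missing-data case, again on simulated data with assumed sizes. The WGCNA call is guarded so the snippet still runs without the package, and the arguments shown are my understanding of its interface; please check ?WGCNA::cor for your installed version.

```r
set.seed(2)
returns <- matrix(rnorm(500 * 50), nrow = 500, ncol = 50)
returns[sample(length(returns), 25)] <- NA  # knock out a few values (assumed pattern)

# Baseline: base R, using all pairwise-complete observations
c_stats <- stats::cor(returns, use = "pairwise.complete.obs")

# Same calculation with WGCNA, only if the package is installed
if (requireNamespace("WGCNA", quietly = TRUE)) {
  c_wgcna <- WGCNA::cor(returns, use = "pairwise.complete.obs")
  print(max(abs(c_stats - c_wgcna)))  # difference between the two results
}
```

Wrapping each call in system.time() on a matrix of realistic size would show whether the switch pays off on your machine.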

HTH,

Peter

> Kindly guide.
>
> Vincy