[R] Pearson Correlation Speed

Nathan S. Watson-Haigh nathan.watson-haigh at csiro.au
Tue Dec 16 03:23:35 CET 2008


Charles C. Berry wrote:
> On Mon, 15 Dec 2008, Nathan S. Watson-Haigh wrote:
> 
>> Nathan S. Watson-Haigh wrote:
>>> I'm trying to calculate Pearson correlation coefficients for a large
>>> matrix of size 18563 x 18563. The following function takes about XX
>>> minutes to complete, and I'd like to do this calculation about 15 times,
>>> so speed is somewhat of an issue.
> 
> I think you are on the wrong track, Nathan.
> 
> The matrix you are starting with is 18563 x 18563 and the result of 
> finding the correlations amongst the columns of that matrix is also 18563 
> x 18563. It will require more than 5 Gigabytes of memory to store the 
> result and the original matrix.

Yes, the memory usage is somewhat large - luckily I have the use of a
cluster with lots of shared memory! However, I'm interested to learn how
you arrived at the calculation of the memory requirements.
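
One plausible way to arrive at a figure like that, assuming both matrices
are stored as dense double-precision values at 8 bytes per element (an
assumption on my part, not something stated in the reply), is the
following back-of-the-envelope sketch:

```r
# Rough memory estimate for a dense numeric matrix in R.
# Assumption: every element is a double (8 bytes); object headers and
# any temporary copies made during the computation are ignored.
n <- 18563
bytes_per_double <- 8

one_matrix_bytes <- n^2 * bytes_per_double   # input OR result
total_bytes <- 2 * one_matrix_bytes          # input AND result

total_bytes / 1e9    # decimal gigabytes, roughly 5.5
total_bytes / 2^30   # binary gibibytes, roughly 5.1
```

Either way the total comes out above 5 gigabytes, before counting any
intermediate copies.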

> 
> The time needed to do the calculation is likely inflated by caching
> issues and, if your machine has too little memory to hold the result
> and all the intermediate pieces, by swapping as well.
> 
> You can finesse these by breaking your problem into smaller pieces, say
> computing the correlations between each pair of 19 blocks of 977 columns
> (columns 1:977, 977+1:977, ... 18*977+1:977), then assembling the
> results.

This is possible; however, why is something like this not implemented
internally in the cor() function if it scales poorly due to the large
memory requirements?
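
For reference, the blocked approach suggested above might look roughly
like the sketch below. The function name block_cor and its block_size
argument are hypothetical names introduced here for illustration; only
cor() itself is from base R. Note that this version still assembles the
full result in memory, so it only reduces the size of each cor() call,
not the final storage.

```r
# Blockwise correlation: a minimal sketch, not a tuned implementation.
# x          : numeric matrix, variables in columns
# block_size : number of columns per block (hypothetical parameter)
block_cor <- function(x, block_size) {
  p <- ncol(x)
  starts <- seq(1, p, by = block_size)
  out <- matrix(NA_real_, p, p)
  for (i in starts) {
    ii <- i:min(i + block_size - 1, p)
    # Only visit blocks on or above the diagonal; fill the mirror by symmetry.
    for (j in starts[starts >= i]) {
      jj <- j:min(j + block_size - 1, p)
      r <- cor(x[, ii, drop = FALSE], x[, jj, drop = FALSE])
      out[ii, jj] <- r
      out[jj, ii] <- t(r)
    }
  }
  out
}

# Small usage example on random data; agrees with cor() up to rounding.
set.seed(1)
x <- matrix(rnorm(200 * 23), nrow = 200, ncol = 23)
all.equal(block_cor(x, block_size = 5), cor(x))
```

For the full 18563-column problem one would presumably call this with
block_size = 977 and write each block out to disk rather than keeping
the assembled matrix in memory.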

> 
> ---
> 
> BTW, R already has the necessary machinery to calculate the crossproduct 
> matrix (etc) needed to find the correlations. You can access the low level 
> linear algebra that R uses. You can marry R to an optimized BLAS if you 
> like.
> 
> So pulling in some other code to do this will not save you anything. If 
> you ever do decide to import C[++] code there is excellent documentation 
> in the Writing R Extensions manual, which you should review before 
> attempting to import C++ code into R.

Thanks, I have seen this and it seemed quite technical to use as a
starting point for someone unfamiliar with both C++ and incorporating
C++ code into R.
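
On the earlier point about R's own crossproduct machinery, here is a
minimal sketch of how the correlations could be computed with
crossprod(), which is handled by whatever BLAS R is linked against. The
function name cor_crossprod is a hypothetical name used only for this
illustration, and the sketch assumes complete data with no missing
values and no constant columns.

```r
# Pearson correlation via a single cross product: a minimal sketch.
# Assumes x is a numeric matrix with no NAs and no constant columns.
cor_crossprod <- function(x) {
  xc <- sweep(x, 2, colMeans(x))                 # center each column
  xs <- sweep(xc, 2, sqrt(colSums(xc^2)), "/")   # scale columns to unit length
  crossprod(xs)                                  # t(xs) %*% xs
}

# Small usage example on random data; agrees with cor() up to rounding.
set.seed(1)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
all.equal(cor_crossprod(x), cor(x))
```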

Cheers,
Nathan
