[R] SVD on very large data matrix
    Stefan Evert 
    stefanML at collocations.de
       
    Tue Apr  9 02:43:23 CEST 2013
    
    
  
On 8 Apr 2013, at 23:21, Andy Cooper <andy_cooper83 at yahoo.co.uk> wrote:
> So, no one has direct experience running irlba on a data matrix as large as 500,000 x 1,000 or larger?
I haven't used irlba in production code, but I have run a few benchmarks on much smaller matrices.  My impression (also from the documentation, I think) was that irlba is designed for use cases where only a few singular values are needed, up to 10 or so.  With 50 singular values, I found randomized SVD to be faster than irlba.
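For what it's worth, this is roughly the kind of code I benchmarked (a minimal sketch, not production code; A stands for your data matrix, and rsvd() below is a toy implementation of the standard randomized SVD of Halko, Martinsson & Tropp, not a function from any package):

library(irlba)

## truncated SVD: compute only the top k singular triplets
k <- 10
fit <- irlba(A, nv = k)    # fit$d, fit$u, fit$v

## toy randomized SVD (Halko et al. 2011), base R only;
## p = oversampling, q = power iterations (sketch omits the
## re-orthonormalization you'd want between iterations)
rsvd <- function(A, k, p = 10, q = 2) {
  Omega <- matrix(rnorm(ncol(A) * (k + p)), ncol(A))  # random test matrix
  Y <- A %*% Omega
  for (i in seq_len(q)) Y <- A %*% crossprod(A, Y)    # power iterations
  Q <- qr.Q(qr(Y))               # orthonormal basis for the range of A
  s <- svd(crossprod(Q, A), nu = k, nv = k)  # SVD of small (k+p) x n matrix
  list(d = s$d[1:k], u = Q %*% s$u, v = s$v)
}

rsvd(A, 50)$d should agree with svd(A)$d[1:50] up to the randomized approximation error.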
If you're working with a dense 500,000 x 1000 matrix, you'll need a lot of RAM: about 4 GB for the matrix alone in double precision.  Have you tried the svd() function? It calls LAPACK, and optimised BLAS/LAPACK builds (OpenBLAS, MKL, etc.) include highly tuned, multithreaded code; if your machine has enough CPU cores, even a high-dimensional SVD might be fast enough.
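To put numbers on that (just the arithmetic for a dense double-precision matrix, plus how to avoid computing singular vectors you don't need):

## a dense 500,000 x 1,000 double matrix occupies about 4 GB
500000 * 1000 * 8 / 1024^3    # ~3.73 GiB, before any copies R makes

## base svd() calls LAPACK; restricting nu/nv limits the singular
## vectors returned (s$d still holds all 1000 singular values)
s <- svd(A, nu = 50, nv = 50)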
Best,
Stefan
    
    