svd() very slow for wide matrix X but not for t(X) (PR#1761)

gregory_r_warnes@groton.pfizer.com gregory_r_warnes@groton.pfizer.com
Tue, 9 Jul 2002 17:40:08 +0200 (MET DST)


Hi All,

A colleague has just pointed out to me prcomp is *much* slower and memory
intensive when applied to very wide matrixes than when applied to the
transpose of the same matrix.  In looking for the reason, I discovered that
that the singular value decomposition function svd is the primary culprit:

On my Solaris workstation, under R 1.5.1:

	>  nr <- 50
	>  nc <- 1000
	>  x  <- matrix( rnorm( nr*nc), nrow=nr, ncol=nc )
	>  tx <- t(x)
	>
	> # SVD directly on matrix is SLOW
	>  system.time( val.x <- svd(x)$u )
	[1] 5.41 0.00 5.62 0.00 0.00
	>
	> # SVD directly on t(matrix) is FAST
	>  system.time( val.tx <- svd(tx)$v )
	[1] 0.60 0.00 0.61 0.00 0.00
	>
	> # but the results are (essentially) identical
	> max( abs(val.x) - abs(val.tx) )
	[1] 4.201361e-13

The speed difference gets worse as the number of columns grows.  

Here is a simple patch to $R_SRC/src/library/base/R/svd.R that works around
the problem.  It simply checks whether the matrix is wider than it is long,
and if so, calls itself on the transpose of the matrix, and then exchanges
the returned u and v matrixes.


*** svd.R.orig Tue Jul  9 11:06:46 2002
--- svd.R Tue Jul  9 11:06:49 2002
***************
*** 5,10 ****
--- 5,17 ----
      n <- dx[1]
      p <- dx[2]
      if(!n || !p) stop("0 extent dimensions")
+ 
+     if( p > n )  # compute on the transpose if wide
+       {
+         s <- svd( t(x), nv=nu, nu=nv)
+         return( d=s$d, u=s$v, v=s$u )
+       }
+     
      if (is.complex(x)) {
          res <- La.svd(x, nu, nv)
          return(list(d = res$d, u = if(nu) res$u, v = if(nv)
Conj(t(res$vt))))


With this patch the speed is almost the same whether called on x or tx:

	> system.time( val.x <- svd(x)$u )
	[1] 0.62 0.00 0.66 0.00 0.00

	> system.time( val.tx <- svd(tx)$v )
	[1] 0.60 0.00 0.86 0.00 0.00

	> identical( val.x, val.tx )
	[1] TRUE

This is fine for my purposes, but perhaps someone familiar with the FORTRAN
code should see why its performance is so unbalanced.

Extra details:

> version
         _                   
platform sparc-sun-solaris2.8
arch     sparc               
os       solaris2.8          
system   sparc, solaris2.8   
status   Patched             
major    1                   
minor    5.1                 
year     2002                
month    06                  
day      18                  
language R                   
>

-Greg Warnes








LEGAL NOTICE
Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this E-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents of this E-mail or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._