[Rd] R-2.15.2 changes in computation speed. Numerical precision?

Thu Dec 13 10:33:46 CET 2012

Long message, but as far as I can see, this is not about base R but the 
contributed package Amelia: Please discuss possible improvements with 
its maintainer.

Best,
Uwe Ligges

On 12.12.2012 19:14, Paul Johnson wrote:
> Speaking of optimization and speeding up R calculations...
>
> I mentioned last week I want to speed up calculation of generalized
> inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a
> souped up generalized inverse algorithm published by
>
> V. N. Katsikis, D. Pappas, Fast computing of theMoore-Penrose inverse
> matrix, Electronic Journal of Linear Algebra,
> 17(2008), 637-650.
>
> I was so delighted to see the computation time drop on my Debian
> system that I boasted to the WIndows users and gave them a test case.
> They answered back "there's no benefits, plus Windows is faster than
> Linux".
>
> That sent me off on a bit of a goose chase, but I think I'm beginning
> to understand the situation.  I believe R-2.15.2 introduced a tighter
> requirement for precision, thus triggering longer-lasting calculations
> in many example scripts. Better algorithms can avoid some of that
> slowdown, as you see in this test case.
>
> Here is the test code you can run to see:
>
> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R
>
> It downloads a data file from that same directory and then runs some
> multiple imputations with the Amelia package.
>
> Here's the output from my computer
>
> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout
>
> That includes the profile of the calculations that depend on the
> ordinary generalized inverse algorithm based on svd and the new one.
>
> See? The KP algorithm is faster.  And just as accurate as
> Amelia:::mpinv or MASS::ginv (for details on that, please review my
> notes in http://pj.freefaculty.org/scraps/profile/qrginv.R).
>
> So I asked WIndows users for more detailed feedback, including
> sessionInfo(), and I noticed that my proposed algorithm is not faster
> on Windows--WITH OLD R!
>
> Here's the script output with R-2.15.0, shows no speedup from the
> KPginv algorithm
>
> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout
>
> On the same machine, I updated to R-2.15.2, and we see the same
> speedup from the KPginv algorithm
>
> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout
>
> After that, I realized it is an R version change, not an OS
> difference, I was a bit relieved.
>
> What causes the difference in this case?  In the Amelia code, they try
> to avoid doing the generalized inverse by using the ordinary solve(),
> and if that fails, then they do the generalized inverse. In R 2.15.0,
> the near singularity of the matrix is ignored, but not in R 2.15.2.
> The ordinary solve is failing almost all the time, thus triggering the
> use of the svd based generalized inverse.  Which is slower.
>
> The Katsikis and Pappas 2008 algorithm is the fastest one I've found
> after translating from Matlab to R.  It is not so universally
> applicable as svd based methods, it will fail if there are linearly
> dependent columns. However, it does tolerate columns of all zeros,
> which seems to be the problem case in the particular application I am
> testing.
>
> I tried very hard to get the newer algorithm described here to go as
> fast, but it is way way slower, at least in the implementations I
> tried:
> ##  KPP
> ## Vasilios N. Katsikis, Dimitrios Pappas, Athanassios Petralias.  "An
> improved method for
> ## the computation of the Moore Penrose inverse matrix," Applied
> ## Mathematics and Computation, 2011
>
> The notes on that are in the qrginv.R file linked above.
>
> The fact that I can't make that newer KPP algorithm go faster,
> although the authors show it can go faster in Matlab, leads me to a
> bunch of other questions and possibly the need to implement all of
> this in C with LAPACK or EIGEN or something like that, but at this
> point, I've got to return to my normal job.  If somebody is good at
> R's .Call interface and can make a pure C implementation of KPP.
>
> I think the key thing is that with R-2.15.2, there is an svd-related
> bottleneck in the multiple imputation algorithms in Amelia. The
> replacement version of the function Amelia:::mpinv does reclaim a 30%
> time saving, while generating imputations that are identical, so far
> as i can tell.
>
> pj
>
>
>