[Rd] R-2.15.2 changes in computation speed. Numerical precision?

Martin Maechler maechler at stat.math.ethz.ch
Fri Dec 14 10:59:42 CET 2012

>>>>> "PJ" == Paul Johnson <pauljohn32 at gmail.com>
>>>>>     on Fri, 14 Dec 2012 01:01:19 -0600 writes:

    PJ> On Thu, Dec 13, 2012 at 9:01 PM, Yi (Alice) Wang <yi.wang at unsw.edu.au> wrote:
    >> I have also encountered a similar problem. My mvabund package runs much
    >> faster on linux/OSX than on windows with both R/2.15.1 and R/2.15.2. For
    >> example, with mvabund_3.6.3 and R/2.15.2,
    >> system.time(example(anova.manyglm))

    PJ> Hi, Alice

    PJ> You have a different problem than I do.

    PJ> The change from R-2.15.1 to R-2.15.2 makes the program slower on all
    PJ> platforms.  The slowdown that emerges in R-2.15.2 on all types of
    PJ> hardware concerns me.

    PJ> It only seemed like a "Windows is better" issue when all the Windows
    PJ> users who tested my program were using R-2.15.0 or R-2.15.1. As soon
    PJ> as they update R, then they have the slowdown as well.

Paul, I'm pretty sure you are right that it is not just your package.
Rather, the NEWS for R 2.15.2 contains

    • The included LAPACK has been updated to 3.4.1, with some patches
      from the current SVN sources.  (_Inter alia_, this resolves

and as I gathered from your e-mails --- yes, a reproducible example
(without package Amelia) would have been (and would still be)
really enlightening ---
"the default tolerance" (in a vague sense) for detecting
(near-)singularity may indeed have been tightened in the newer LAPACK.
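
A minimal sketch of how such a package-free example could start, under
stated assumptions: the matrix below is hypothetical (it is not taken
from Amelia or from Paul's data), and the snippet only illustrates how
the `tol` argument of solve() interacts with the reciprocal condition
number estimate. It does not by itself show any difference between
R 2.15.1 and R 2.15.2.

```r
## Hypothetical test matrix (not from Amelia or the original report):
## a diagonal matrix whose reciprocal condition number is about 1e-20.
A <- diag(c(1, 1e-8, 1e-20))

## LAPACK-based estimate of the reciprocal condition number
rc <- rcond(A)

## With the default tol = .Machine$double.eps, solve() is expected to
## stop with a "computationally singular" error for this matrix.
res_default <- tryCatch(solve(A), error = function(e) conditionMessage(e))

## With tol = 0 the rcond check can no longer trigger, so an inverse
## is returned (whether it is numerically useful is another matter).
res_loose <- solve(A, tol = 0)

print(rc)
print(res_default)
```

Running the same script under both R versions with a matrix taken
from the real application would show whether the singularity check
fires in one version and not in the other.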


    >> on OSX returns
    >> user  system elapsed
    >> 3.351   0.006   3.381
    >> but on windows 7 it returns
    >> user  system elapsed
    >> 13.13   0.00   13.14
    >> I also use svd frequently in my C code, though only by calling the GSL
    >> functions. As far as I remember, the computation time difference was not
    >> that significant with earlier R versions. So maybe it is worth an investigation?
    >> Many thanks,
    >> Yi Wang
    >> On Thu, Dec 13, 2012 at 5:33 PM, Uwe Ligges
    >> <ligges at statistik.tu-dortmund.de> wrote:
    >>> Long message, but as far as I can see, this is not about base R but the
    >>> contributed package Amelia: Please discuss possible improvements with its
    >>> maintainer.
    >>> Best,
    >>> Uwe Ligges
    >>> On 12.12.2012 19:14, Paul Johnson wrote:
    >>>> Speaking of optimization and speeding up R calculations...
    >>>> I mentioned last week I want to speed up calculation of generalized
    >>>> inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a
    >>>> souped up generalized inverse algorithm published by
    >>>> V. N. Katsikis, D. Pappas, Fast computing of the Moore-Penrose inverse
    >>>> matrix, Electronic Journal of Linear Algebra,
    >>>> 17(2008), 637-650.
    >>>> I was so delighted to see the computation time drop on my Debian
    >>>> system that I boasted to the Windows users and gave them a test case.
    >>>> They answered back "there's no benefits, plus Windows is faster than
    >>>> Linux".
    >>>> That sent me off on a bit of a goose chase, but I think I'm beginning
    >>>> to understand the situation.  I believe R-2.15.2 introduced a tighter
    >>>> requirement for precision, thus triggering longer-lasting calculations
    >>>> in many example scripts. Better algorithms can avoid some of that
    >>>> slowdown, as you see in this test case.
    >>>> Here is the test code you can run to see:
    >>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R
    >>>> It downloads a data file from that same directory and then runs some
    >>>> multiple imputations with the Amelia package.
    >>>> Here's the output from my computer
    >>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout
    >>>> That includes the profile of the calculations that depend on the
    >>>> ordinary generalized inverse algorithm based on svd and the new one.
    >>>> See? The KP algorithm is faster.  And just as accurate as
    >>>> Amelia:::mpinv or MASS::ginv (for details on that, please review my
    >>>> notes in http://pj.freefaculty.org/scraps/profile/qrginv.R).
    >>>> So I asked Windows users for more detailed feedback, including
    >>>> sessionInfo(), and I noticed that my proposed algorithm is not faster
    >>>> on Windows--WITH OLD R!
    >>>> Here's the script output with R-2.15.0, which shows no speedup from
    >>>> the KPginv algorithm
    >>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout
    >>>> On the same machine, I updated to R-2.15.2, and we see the same
    >>>> speedup from the KPginv algorithm
    >>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout
    >>>> After I realized that it is an R version change, not an OS
    >>>> difference, I was a bit relieved.
    >>>> What causes the difference in this case?  In the Amelia code, they try
    >>>> to avoid doing the generalized inverse by using the ordinary solve(),
    >>>> and if that fails, then they do the generalized inverse. In R 2.15.0,
    >>>> the near singularity of the matrix is ignored, but not in R 2.15.2.
    >>>> The ordinary solve is failing almost all the time, thus triggering the
    >>>> use of the svd based generalized inverse.  Which is slower.
    >>>> The Katsikis and Pappas 2008 algorithm is the fastest one I've found
    >>>> after translating from Matlab to R.  It is not as universally
    >>>> applicable as svd-based methods: it will fail if there are linearly
    >>>> dependent columns. However, it does tolerate columns of all zeros,
    >>>> which seems to be the problem case in the particular application I am
    >>>> testing.
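
For readers without access to qrginv.R, here is a rough sketch of a
Cholesky-type full-rank factorization approach in the spirit of the
Katsikis and Pappas method. It is an assumption-laden illustration: the
function name `kp_ginv_sketch` is made up, the code is not the KPginv
function from the linked notes, and it has not been checked against the
listing in the paper. It is exercised here only on a matrix with an
all-zero column, the case mentioned above.

```r
## Hypothetical sketch, not the code from qrginv.R.
kp_ginv_sketch <- function(G) {
  transposed <- nrow(G) < ncol(G)
  ## Work with the smaller of G'G and GG'
  A <- if (transposed) tcrossprod(G) else crossprod(G)
  n <- nrow(A)
  dA <- diag(A)
  if (!any(dA > 0)) return(matrix(0, ncol(G), nrow(G)))
  tol <- min(dA[dA > 0]) * 1e-9        # pivot threshold (an assumption)
  L <- matrix(0, n, n)
  r <- 0
  for (k in seq_len(n)) {
    r <- r + 1
    col <- A[k:n, k]
    if (r > 1) {
      prev <- seq_len(r - 1)
      col <- col - L[k:n, prev, drop = FALSE] %*% L[k, prev]
    }
    L[k:n, r] <- col
    if (L[k, r] > tol) {
      L[k, r] <- sqrt(L[k, r])
      if (k < n) L[(k + 1):n, r] <- L[(k + 1):n, r] / L[k, r]
    } else {
      L[k:n, r] <- 0                   # pivot too small: discard this column
      r <- r - 1
    }
  }
  L <- L[, seq_len(r), drop = FALSE]   # n x r factor with A ~ L L'
  M <- solve(crossprod(L))
  if (transposed) {
    t(G) %*% L %*% M %*% M %*% t(L)
  } else {
    L %*% M %*% M %*% t(L) %*% t(G)
  }
}

## Small made-up example with a column of zeros
G <- matrix(c(1, 2, 3, 4,
              0, 0, 0, 0,
              2, 1, 0, 1), nrow = 4, ncol = 3)
Y <- kp_ginv_sketch(G)
```

A quick plausibility check is to compare `G %*% Y %*% G` with `G`, and
the result with MASS::ginv(G), before timing anything.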
    >>>> I tried very hard to get the newer algorithm described here to go as
    >>>> fast, but it is way way slower, at least in the implementations I
    >>>> tried:
    >>>> ##  KPP
    >>>> ## Vasilios N. Katsikis, Dimitrios Pappas, Athanassios Petralias.
    >>>> ## "An improved method for the computation of the Moore-Penrose
    >>>> ## inverse matrix," Applied Mathematics and Computation, 2011
    >>>> The notes on that are in the qrginv.R file linked above.
    >>>> The fact that I can't make that newer KPP algorithm go faster,
    >>>> although the authors show it can go faster in Matlab, leads me to a
    >>>> bunch of other questions and possibly the need to implement all of
    >>>> this in C with LAPACK or EIGEN or something like that, but at this
    >>>> point, I've got to return to my normal job.  Perhaps somebody who is
    >>>> good at R's .Call interface can make a pure C implementation of KPP.
    >>>> I think the key thing is that with R-2.15.2, there is an svd-related
    >>>> bottleneck in the multiple imputation algorithms in Amelia. The
    >>>> replacement version of the function Amelia:::mpinv does reclaim a 30%
    >>>> time saving, while generating imputations that are identical, so far
    >>>> as I can tell.
    >>>> pj
    >>> ______________________________________________
    >>> R-devel at r-project.org mailing list
    >>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >> --
    >> Dr. Wang, Yi (Alice)
    >> Research Assistant Professor
    >> Institute of Computational and Theoretical Studies
    >> Department of Computer Science
    >> Faculty of Science
    >> Hong Kong Baptist University
    >> Kowloon Tong, Hong Kong
    >> Email: yiwang at comp.hkbu.edu.hk
    >> Tel: +852-3411-2789
    >> Web: http://www.icts.hkbu.edu.hk/yiwang/public/

    PJ> -- 
    PJ> Paul E. Johnson
    PJ> Professor, Political Science      Assoc. Director
    PJ> 1541 Lilac Lane, Room 504      Center for Research Methods
    PJ> University of Kansas                 University of Kansas
    PJ> http://pj.freefaculty.org               http://quant.ku.edu

