[Rd] R-2.15.2 changes in computation speed. Numerical precision?
Martin Maechler
maechler at stat.math.ethz.ch
Fri Dec 14 10:59:42 CET 2012
>>>>> "PJ" == Paul Johnson <pauljohn32 at gmail.com>
>>>>> on Fri, 14 Dec 2012 01:01:19 -0600 writes:
PJ> On Thu, Dec 13, 2012 at 9:01 PM, Yi (Alice) Wang <yi.wang at unsw.edu.au> wrote:
>> I have also encountered a similar problem. My mvabund package runs much
>> faster on linux/OSX than on windows with both R/2.15.1 and R/2.15.2. For
>> example, with mvabund_3.6.3 and R/2.15.2,
>> system.time(example(anova.manyglm))
>>
PJ> Hi, Alice
PJ> You have a different problem than I do.
PJ> The change from R-2.15.1 to R-2.15.2 makes the program slower on all
PJ> platforms. The slowdown that emerges in R-2.15.2 on all types of
PJ> hardware concerns me.
PJ> It only seemed like a "Windows is better" issue when all the Windows
PJ> users who tested my program were using R-2.15.0 or R-2.15.1. As soon
PJ> as they update R, then they have the slowdown as well.
Paul, I'm pretty sure you are right that it is not just your package.
Rather, the NEWS for R 2.15.2 contains
• The included LAPACK has been updated to 3.4.1, with some patches
from the current SVN sources. (_Inter alia_, this resolves
PR#14692.)
and as I got from your e-mails --- yes, a reproducible example
(without package Amelia) would have been (and would still be)
really enlightening ---
indeed, "the default tolerance" (in a vague sense) of detecting
(near)singularity may well have been tightened in the newer LAPACK.
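
A minimal sketch of how such a tolerance enters solve(), using a contrived diagonal matrix that is not from Paul's data. It only illustrates the mechanism in current R, where solve() rejects a matrix whose estimated reciprocal condition number is below `tol`. It does not reproduce the 2.15.1 vs 2.15.2 difference itself.

```r
# Contrived near-singular matrix (an assumption for illustration only).
A <- diag(c(1, 1e-20))

# Estimated reciprocal condition number, computed through LAPACK.
rc <- rcond(A)

# With the default tolerance (.Machine$double.eps) solve() gives up,
# because rc is far below that tolerance.
default_fails <- inherits(try(solve(A), silent = TRUE), "try-error")

# With tol = 0 the same matrix is accepted and inverted.
loose <- try(solve(A, tol = 0), silent = TRUE)
loose_ok <- !inherits(loose, "try-error")
```

Code that calls solve() inside try() and falls back to something slower is therefore sensitive to where this threshold sits relative to the condition estimate.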
Martin
>> on OSX returns
>>
>> user system elapsed
>> 3.351 0.006 3.381
>>
>> but on windows 7 it returns
>>
>> user system elapsed
>> 13.13 0.00 13.14
>>
>> I also used svd frequently in my C code, though only by calling the gsl
>> functions. As far as I remember, the computation time difference was not
>> that significant with earlier R versions. So maybe it is worth an investigation?
>>
>> Many thanks,
>> Yi Wang
>>
>>
>> On Thu, Dec 13, 2012 at 5:33 PM, Uwe Ligges
>> <ligges at statistik.tu-dortmund.de> wrote:
>>>
>>> Long message, but as far as I can see, this is not about base R but the
>>> contributed package Amelia: Please discuss possible improvements with its
>>> maintainer.
>>>
>>> Best,
>>> Uwe Ligges
>>>
>>>
>>> On 12.12.2012 19:14, Paul Johnson wrote:
>>>>
>>>> Speaking of optimization and speeding up R calculations...
>>>>
>>>> I mentioned last week I want to speed up calculation of generalized
>>>> inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a
>>>> souped up generalized inverse algorithm published by
>>>>
>>>> V. N. Katsikis, D. Pappas, Fast computing of the Moore-Penrose inverse
>>>> matrix, Electronic Journal of Linear Algebra,
>>>> 17(2008), 637-650.
>>>>
>>>> I was so delighted to see the computation time drop on my Debian
>>>> system that I boasted to the Windows users and gave them a test case.
>>>> They answered back "there's no benefits, plus Windows is faster than
>>>> Linux".
>>>>
>>>> That sent me off on a bit of a goose chase, but I think I'm beginning
>>>> to understand the situation. I believe R-2.15.2 introduced a tighter
>>>> requirement for precision, thus triggering longer-lasting calculations
>>>> in many example scripts. Better algorithms can avoid some of that
>>>> slowdown, as you see in this test case.
>>>>
>>>> Here is the test code you can run to see:
>>>>
>>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R
>>>>
>>>> It downloads a data file from that same directory and then runs some
>>>> multiple imputations with the Amelia package.
>>>>
>>>> Here's the output from my computer
>>>>
>>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout
>>>>
>>>> That includes the profile of the calculations that depend on the
>>>> ordinary generalized inverse algorithm based on svd and the new one.
>>>>
>>>> See? The KP algorithm is faster. And just as accurate as
>>>> Amelia:::mpinv or MASS::ginv (for details on that, please review my
>>>> notes in http://pj.freefaculty.org/scraps/profile/qrginv.R).
>>>>
>>>> So I asked Windows users for more detailed feedback, including
>>>> sessionInfo(), and I noticed that my proposed algorithm is not faster
>>>> on Windows--WITH OLD R!
>>>>
>>>> Here's the script output with R-2.15.0, shows no speedup from the
>>>> KPginv algorithm
>>>>
>>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout
>>>>
>>>> On the same machine, I updated to R-2.15.2, and we see the same
>>>> speedup from the KPginv algorithm
>>>>
>>>>
>>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout
>>>>
>>>> Once I realized it is an R version change, not an OS
>>>> difference, I was a bit relieved.
>>>>
>>>> What causes the difference in this case? In the Amelia code, they try
>>>> to avoid doing the generalized inverse by using the ordinary solve(),
>>>> and if that fails, then they do the generalized inverse. In R 2.15.0,
>>>> the near singularity of the matrix is ignored, but not in R 2.15.2.
>>>> The ordinary solve is failing almost all the time, thus triggering the
>>>> use of the svd based generalized inverse. Which is slower.
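
The try-solve-then-fall-back pattern described in the quoted paragraph above can be sketched as follows. This is not Amelia's code: `svd_pinv` and `safe_inverse` are hypothetical names, and the pseudoinverse follows the usual SVD recipe (in the style of MASS::ginv) with an assumed tolerance.

```r
# Hypothetical SVD-based Moore-Penrose inverse (not Amelia:::mpinv).
svd_pinv <- function(X, tol = sqrt(.Machine$double.eps)) {
  s <- svd(X)
  # Keep only singular values that are large relative to the biggest one.
  keep <- s$d > max(tol * s$d[1], 0)
  if (!any(keep)) return(matrix(0, ncol(X), nrow(X)))
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# Hypothetical wrapper: ordinary inverse first, pseudoinverse only
# when solve() signals an error.
safe_inverse <- function(X) {
  out <- try(solve(X), silent = TRUE)
  if (inherits(out, "try-error")) {
    list(inverse = svd_pinv(X), method = "svd")
  } else {
    list(inverse = out, method = "solve")
  }
}

ok  <- safe_inverse(diag(c(2, 4)))  # well conditioned: solve() succeeds
bad <- safe_inverse(diag(c(2, 0)))  # column of zeros: falls back to svd
```

If solve() starts rejecting matrices it used to accept, every such call takes the slower svd branch, which would be consistent with the slowdown reported here.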
>>>>
>>>> The Katsikis and Pappas 2008 algorithm is the fastest one I've found
>>>> after translating from Matlab to R. It is not as universally
>>>> applicable as svd based methods: it will fail if there are linearly
>>>> dependent columns. However, it does tolerate columns of all zeros,
>>>> which seems to be the problem case in the particular application I am
>>>> testing.
>>>>
>>>> I tried very hard to get the newer algorithm described here to go as
>>>> fast, but it is way way slower, at least in the implementations I
>>>> tried:
>>>> ## KPP
>>>> ## Vasilios N. Katsikis, Dimitrios Pappas, Athanassios Petralias. "An
>>>> ## improved method for the computation of the Moore Penrose inverse
>>>> ## matrix," Applied Mathematics and Computation, 2011
>>>>
>>>> The notes on that are in the qrginv.R file linked above.
>>>>
>>>> The fact that I can't make that newer KPP algorithm go faster,
>>>> although the authors show it can go faster in Matlab, leads me to a
>>>> bunch of other questions and possibly the need to implement all of
>>>> this in C with LAPACK or EIGEN or something like that, but at this
>>>> point, I've got to return to my normal job. Perhaps somebody who is
>>>> good at R's .Call interface can make a pure C implementation of KPP.
>>>>
>>>> I think the key thing is that with R-2.15.2, there is an svd-related
>>>> bottleneck in the multiple imputation algorithms in Amelia. The
>>>> replacement version of the function Amelia:::mpinv does reclaim a 30%
>>>> time saving, while generating imputations that are identical, so far
>>>> as I can tell.
>>>>
>>>> pj
>>>>
>>>>
>>>>
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>>
>>
>> --
>>
>>
>> --
>> Dr. Wang, Yi (Alice)
>> Research Assistant Professor
>> Institute of Computational and Theoretical Studies
>> Department of Computer Science
>> Faculty of Science
>> Hong Kong Baptist University
>> Kowloon Tong, Hong Kong
>> Email: yiwang at comp.hkbu.edu.hk
>> Tel: +852-3411-2789
>> Web: http://www.icts.hkbu.edu.hk/yiwang/public/
>>
PJ> --
PJ> Paul E. Johnson
PJ> Professor, Political Science Assoc. Director
PJ> 1541 Lilac Lane, Room 504 Center for Research Methods
PJ> University of Kansas University of Kansas
PJ> http://pj.freefaculty.org http://quant.ku.edu