[Rd] Power (^) 10x slower in R since version 1.7.1... What next? [Windows platform]

Tue Nov 18 13:26:34 MET 2003

Why not use exp(y*log(x)) if it is adequate for your purposes?  It is 
faster under Windows.
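
As a minimal sketch of that substitution, assuming strictly positive x: the helper name pow_exp_log, the coefficients a and b, and the synthetic data below are all made up for illustration and are not taken from the thread. No timings are claimed, since they depend on the platform and runtime in use.

```r
# Hypothetical helper (not part of R): compute x^y as exp(y * log(x)).
# Only valid for x > 0; for zero or negative x it does not match ^.
pow_exp_log <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), all(x > 0, na.rm = TRUE))
  exp(y * log(x))
}

# Allometric form y = a * x^b with made-up coefficients.
a <- 0.5
b <- 2.3
set.seed(1)
x <- runif(1e5, min = 0.1, max = 10)  # synthetic positive measurements

y_pow <- a * x^b
y_exp <- a * pow_exp_log(x, b)

# The two results should agree up to floating-point rounding.
ok <- isTRUE(all.equal(y_pow, y_exp))

# Compare elapsed times on your own machine; results vary by runtime.
t_pow <- system.time(a * x^b)
t_exp <- system.time(a * pow_exp_log(x, b))
```

For data that may contain zeros or negative values, ^ remains the safer choice, because log(x) returns -Inf or NaN there.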

There really is no value in using millions of cases in LVQ or LDA or, I
suspect, random forests.  But a difference of a few minutes means that
this is well under 20% of the total time unless your statistical analysis
is very much speedier than mine.

On Tue, 18 Nov 2003, Philippe Grosjean wrote:

> Prof Brian Ripley wrote:
> >Your subject line is seriously misleading: this is not `in R' but rather
> >in the pre-compiled binary of R on one OS (Windows) against one particular
> >runtime (which was actually changed long before R 1.7.1).
> 
> OK, I have not tested on other platforms... However, this does affect R as a
> consequence, as soon as R is compiled against the slower routines [on
> Windows only]
> 
> >You could not do this by an `R package': that cannot change the runtime
> >code in use.  You (or someone else) could build R against an alternative
> >runtime library system, but it might be easier to use a better OS.
> 
> I compile my own version of R 1.8.0 against MinGW 2.0.1 for this reason...
> and I really agree with you: "it might be easier to use a better OS".
> However, you should first convince the hundreds of people I target with my R
> code. Those are biologists, ecologists, oceanographers,... and most of them
> use Windows, not Linux/Unix. So, I am forced to use Windows myself.
> 
> >I have yet to see any real statistics application in which this made any
> >noticeable difference.  With modern processors one is talking about 10x
> >faster than a few milliseconds unless the datasets are going to take many
> >seconds even to load into R.  If you have such an application (a real
> >example where ^ takes more than say 20% of the total time of the
> >statistical analysis, and the total time is minutes) please share it.
> 
> Here it is: I am working with very large datasets of zooplankton, containing
> among others, results from image analyses on each individual. It is very
> common in biology to transform/recode/calculate (or whatever you call it)
> raw data according to precalibrated allometric relationships. Those have the
> general form of Huxley's equation:
> 
> y = a.x^b
> 
> Now, you see what I mean: I have to transform about 17 measurements this way
> for each individual in my multi-million entries dataset (note that I do not
> compute the whole dataset at once), before using methods like LDA, learning
> vector quantization (actually, your code from the VR bundle), or random
> forest. In this case, especially with lda or lvq, which are pretty fast, it
> really makes a difference of minutes on my PIV 2.8 GHz with 1 GB of
> memory... and Windows XP.
> 
> OK, I can understand that the R-core team does not have time to waste on
> this problem, especially because they use a better OS. However, I know a lot
> of people (the ones that will use my code to analyze their own zooplankton
> series) that would benefit from my "own faster MinGW 2.0-compiled R 1.8.0 Windows
> version", or a better solution. So what? Do I have to distribute it myself?
> 
> Do I have to point out this problem in my benchmark test at
> http://www.sciviews.org/other/benchmark.htm (25.7 sec for the whole test
> with R 1.8.0 from CRAN against 11.9 sec for R 1.8.0 compiled with MinGW
> 2.0.1 under Windows on the same computer)? I have not updated it since R
> version 1.7.0 to avoid publishing such a bad result. And I have not posted
> my own compiled R version online, because it is neither a good practice, nor
> a good solution...
> 
> I am looking for a better solution.
> Best,
> 
> Philippe Grosjean
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595