[Rd] Power (^) 10x slower in R since version 1.7.1... What next? [Windows platform]

Philippe Grosjean phgrosjean at sciviews.org
Tue Nov 18 11:06:36 MET 2003


Prof Brian Ripley wrote:
>Your subject line is seriously misleading: this is not `in R' but rather
>in the pre-compiled binary of R on one OS (Windows) against one particular
>runtime (which was actually changed long before R 1.7.1).

OK, I have not tested on other platforms... However, as a consequence this is
also `in R', as soon as R is compiled against the slower runtime (on Windows
only).
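
For anyone who wants to check their own binary, here is a rough sketch of the
kind of timing I mean (the vector size and loop count are arbitrary; both
expressions compute the same quantity, but only the first should go through
the C runtime's pow()):

  x <- runif(1e6) + 1
  system.time(for (i in 1:20) y1 <- x^2.5)             # goes through pow()
  system.time(for (i in 1:20) y2 <- x * x * sqrt(x))   # avoids pow()

On the CRAN binary, it is the first line that shows the slowdown I am talking
about.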

>You could not do this by an `R package': that cannot change the runtime
>code in use.  You (or someone else) could build R against an alternative
>runtime library system, but it might be easier to use a better OS.

I compile my own version of R 1.8.0 against MinGW 2.0.1 for this reason...
and I really agree with you: "it might be easier to use a better OS".
However, you would first have to convince the hundreds of people I target with
my R code. They are biologists, ecologists, oceanographers,... and most of them
use Windows, not Linux/Unix. So, I am forced to use Windows myself.

>I have yet to see any real statistics application in which this made any
>noticeable difference.  With modern processors one is talking about 10x
>faster than a few milliseconds unless the datasets are going to take many
>seconds even to load into R.  If you have such an application (a real
>example where ^ takes more than say 20% of the total time of the
>statistical analysis, and the total time is minutes) please share it.

Here it is: I am working with very large datasets of zooplankton containing,
among other things, results from image analysis of each individual. It is very
common in biology to transform/recode/recalculate (or whatever you call it)
raw data according to pre-calibrated allometric relationships. These have the
general form of Huxley's equation:

y = a * x^b

Now you see what I mean: I have to transform about 17 measurements this way
for each individual in my multi-million-entry dataset (note that I do not
compute the whole dataset at once), before using methods like LDA, learning
vector quantization (actually, your code from the VR bundle), or random
forests. In this case, especially with lda or lvq, which are pretty fast, it
really makes a difference in terms of minutes on my Pentium IV 2.8 GHz with
1 GB of memory... and Windows XP.
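
To give an idea of the proportions, here is a minimal sketch of that kind of
run on invented data (the coefficients, the number of rows and the grouping
below are made up for illustration; lda() is the one from the VR bundle):

  library(MASS)                    # lda() from the VR bundle
  n <- 2e5                         # one chunk, not the whole dataset
  x <- matrix(runif(17 * n, 0.1, 5), ncol = 17)   # 17 raw measurements
  grp <- factor(sample(letters[1:5], n, replace = TRUE))
  a <- runif(17, 0.001, 0.01)      # invented calibration coefficients
  b <- runif(17, 2, 3)
  ## allometric transformation y = a * x^b, applied column by column
  system.time(y <- sweep(x ^ rep(b, each = n), 2, a, "*"))
  system.time(fit <- lda(y, grp))

The exact figures do not matter; the point is that, with the slow ^, the
transformation step is no longer negligible next to the lda() fit, which is
exactly my problem.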

OK, I can understand that the R-core team does not have time to waste on this
problem, especially since they use a better OS. However, I know a lot of
people (the ones who will use my code to analyze their own zooplankton series)
who would benefit from my "own faster MinGW 2.0-compiled R 1.8.0 Windows
version", or from a better solution. So what? Do I have to distribute it
myself?

Do I have to point out this problem in my benchmark test at
http://www.sciviews.org/other/benchmark.htm (25.7 sec for the whole test with
R 1.8.0 from CRAN against 11.9 sec for R 1.8.0 compiled with MinGW 2.0.1 under
Windows on the same computer)? I have not updated it since R version 1.7.0, to
avoid publishing such a bad result. And I have not posted my own compiled R
version online, because that is neither good practice nor a good solution...

I am looking for a better solution.
Best,

Philippe Grosjean


