[Rd] Power (^) 10x slower in R since version 1.7.1... What next? [Windows platform]

Tue Nov 18 14:55:26 MET 2003

Prof Brian Ripley wrote:

>Why not use exp(y*log(x)) if it is adequate for your purposes?  It is
>faster under Windows.

I will try... Thank you for your advice.
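
A minimal sketch of that suggestion in R, applied to the allometric form
y = a*x^b quoted further down. The coefficients a and b and the simulated
vector x are made up for illustration. The rewrite assumes x > 0, since
log(x) is NaN for negative x where x^b may still be defined. Whether it is
actually faster depends on the platform and runtime library, so the timings
should be checked locally.

```r
# Hypothetical allometric coefficients and data, for illustration only.
a <- 0.5
b <- 2.7
set.seed(1)
x <- runif(1e6, min = 0.1, max = 10)   # strictly positive measurements

y_pow    <- a * x^b                # direct power operator
y_explog <- a * exp(b * log(x))    # rewritten form; requires x > 0

# The two forms should agree to floating-point tolerance.
stopifnot(isTRUE(all.equal(y_pow, y_explog)))

# Rough timing comparison; results depend on platform and runtime library.
system.time(for (i in 1:10) a * x^b)
system.time(for (i in 1:10) a * exp(b * log(x)))
```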

>There really is no value in using millions of cases in LVQ or LDA or, I
>suspect, random forests.  But a difference of a few minutes means that
>this is well under 20% of the total time unless your statistical analysis
>is very much speedier than mine.

No, sorry, the millions of cases are predictions made from a training set
built around some 2,000 items. With a method that combines lvq, lda and
random forest, my prediction rate is about 5,000 items/sec, which is
roughly 3 to 4 minutes for 1,000,000 items. Adding the time to load the
dataset, the total is a little less than 9 minutes... but more than double
that with the slow ^. OK, what is 10 minutes in a lifetime ;-)

On Tue, 18 Nov 2003, Philippe Grosjean wrote:

> Prof Brian Ripley wrote:
> >Your subject line is seriously misleading: this is not `in R' but rather
> >in the pre-compiled binary of R on one OS (Windows) against one
> >particular runtime (which was actually changed long before R 1.7.1).
>
> OK, I have not tested on other platforms... However, this does affect R
> as a consequence, as soon as R is compiled against the slower routines
> [on Windows only]
>
> >You could not do this by an `R package': that cannot change the runtime
> >code in use.  You (or someone else) could build R against an alternative
> >runtime library system, but it might be easier to use a better OS.
>
> I compile my own version of R 1.8.0 against MinGW 2.0.1 for this reason...
> and I really agree with you: "it might be easier to use a better OS".
> However, you would first have to convince the hundreds of people I target
> with my R code. Those are biologists, ecologists, oceanographers,... and
> most of them use Windows, not Linux/Unix. So, I am forced to use Windows
> myself.
>
> >I have yet to see any real statistics application in which this made any
> >noticeable difference.  With modern processors one is talking about 10x
> >faster than a few milliseconds unless the datasets are going to take many
> >seconds even to load into R.  If you have such an application (a real
> >example where ^ takes more than say 20% of the total time of the
> >statistical analysis, and the total time is minutes) please share it.
>
> Here it is: I am working with very large datasets of zooplankton,
> containing, among other things, results from image analyses on each
> individual. It is very common in biology to transform/recode/calculate
> (or whatever you call it) raw data according to precalibrated allometric
> relationships. Those have the general form of Huxley's equation:
>
> y = a.x^b
>
> Now, you see what I mean: I have to transform about 17 measurements this
> way for each individual in my multi-million-entry dataset (note that I do
> not compute the whole dataset at once), before using methods like LDA,
> learning vector quantization (actually, your code from the VR bundle), or
> random forest. In this case, especially with lda or lvq, which are pretty
> fast, it really makes a difference in terms of minutes on my PIV 2.8 GHz
> with 1 GB memory... and Windows XP.
>
> OK, I can understand that the R-core team does not have time to waste on
> this problem, especially because they use a better OS. However, I know a
> lot of people (the ones that will use my code to analyze their own
> zooplankton series) who would benefit from my "own faster-MinGW
> 2.0-compiled R 1.8.0 Windows version", or a better solution. So what? Do
> I have to distribute it myself?
>
> Do I have to highlight this problem in my benchmark test at
> http://www.sciviews.org/other/benchmark.htm (25.7 sec for the whole test
> with R 1.8.0 from CRAN against 11.9 sec for R 1.8.0 compiled with MinGW
> 2.0.1 under Windows on the same computer)? I have not updated it since R
> version 1.7.0 to avoid publishing such a bad result. And I have not posted
> my own compiled R version online, because it is neither good practice
> nor a good solution...
>
> I am looking for a better solution.
> Best,
>
> Philippe Grosjean

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595