[R] benchmark dual amd opteron

Philippe Grosjean phgrosjean at sciviews.org
Wed Apr 21 12:02:53 CEST 2004


>Douglas Bates <bates at stat.wisc.edu> writes:

>> "Liaw, Andy" <andy_liaw at merck.com> writes:
>>
>> > It doesn't run as is in R-1.9.0.  R-1.9.0 doesn't like `eigen.default()',
>> > and after changing that to `eigen()', it can't find Matrix.class().
>>
>> I've changed the Matrix package since that benchmark was written.
>> I'll rewrite the benchmark and submit it to Philippe Grosjean.

>I have added calls to invisible(gc()) before the timings that involve
>large objects so the calculation doesn't get penalized for repeated
>garbage collections.  I have changed the linear least squares code to
>use a Cholesky decomposition and several other places to use the new
>Matrix package.  Eventually I will clean up the new Matrix package so
>things work as before, but I am still rewriting all of the Matrix
>package and all of the lme4 package.  (There are about 1500 lines of
>pretty dense C code, ssclme.c, in the Matrix package that provide
>functions called by the lme4 package but, for reasons related to
>loaders, have to sit in the Matrix package.)

>Anyway, here's a new version of R2.R and the results on this system
>(2.0 GHz Pentium-4, Debian GNU Linux, R-2.0.0 (20040420 snapshot),
>Goto's BLAS)
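
For readers not familiar with the two tricks Doug mentions, here is a
minimal R sketch of them (mine, not the actual code from R2.R): force a
garbage collection just before a timing so the timed code is not charged
for it, and solve a least squares problem through a Cholesky
decomposition of X'X.

set.seed(1)
n <- 1000; p <- 100
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

invisible(gc())               # collect garbage now, so the timing below
                              # is not penalized by a collection run
print(system.time({
    R <- chol(crossprod(X))   # X'X = R'R, with R upper triangular
    b <- backsolve(R, backsolve(R, crossprod(X, y), transpose = TRUE))
}))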

== The short answer: ==
Thank you, Doug. This benchmark was written for R 1.6.2... so it certainly
needs some reworking, and many other changes would be welcome (see below).
It would be a huge amount of work if the benchmark must still run in R,
S-PLUS, Matlab, Octave, Scilab, Ox, O-Matrix, etc. It is more realistic to
improve it for R only, or perhaps for R and S-PLUS. I don't have time for
that now, but I can plan to do at least some of this work by the time the
next R version (2.0) is available, and to put it in an R package
distributed on CRAN. Of course, I would appreciate help!

== The long answer: ==
The second version of the benchmark is now as old as R 1.6.2! It certainly
needs some reworking. More importantly, there is a major problem in my
benchmark script: it does not check the results! When I wrote it, I did
all the required checks manually to make sure that every program completed
and returned meaningful results. Now, with all these changes, I am not so
sure that is still true. For instance, if a program fails in the middle of
a calculation but returns cleanly (say, with NAs), the timing will be
recorded exactly as if it had succeeded... which is certainly not the same
thing!
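
To make the problem concrete, here is the kind of check I have in mind (a
sketch only; the names, tolerance, and test problem are mine, not taken
from the actual script): build a problem whose answer is known, time it,
and reject the timing if the result is wrong or contains NAs.

n <- 500
a <- matrix(rnorm(n * n), n, n)
truth <- rep(1, n)
invisible(gc())                     # collect garbage before the timing
timing <- system.time(b <- solve(a, a %*% truth))  # should recover 'truth'
if (any(is.na(b)) ||
    !isTRUE(all.equal(as.vector(b), truth, tolerance = 1e-6)))
    stop("result check failed: the timing is meaningless")
print(timing)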

I have received a lot of criticism about these benchmarks, but they are
still useful for some purposes (is it worth investing in a dual-processor
machine, for instance?). They have also helped to track down slow
functions in R: the slow sort() up to version 1.5 and, more recently, the
slow exponentiation operator (^) in version 1.8.x were spotted thanks to
this benchmark and subsequently fixed by the R developers...

It takes a lot of time to set up such a benchmark. However, in time, I
think the following would be worth considering:

1) To add code for checking the results returned by the tests (a sketch of
such a check is given above),

2) To test on small, medium, and large data sets instead of a single data
set size... but what exactly are 'small', 'medium', and 'large'? (See the
sketch after this list.)

3) To test more functions, giving more complete coverage and a better
overall estimate of speed,

4) To benchmark graphics,

5) To use real-world examples instead of artificial tests,

6) To package it all in an R package distributed on CRAN...
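
For point 2, a simple way to parameterize the tests by data set size could
look like the sketch below (the three sizes and the single test shown are
arbitrary choices of mine):

sizes <- c(small = 100, medium = 500, large = 2000)
timings <- sapply(sizes, function(n) {
    a <- matrix(rnorm(n * n), n, n)
    invisible(gc())                    # collect garbage before timing
    system.time(crossprod(a))[[3]]     # element 3 is the elapsed time
})
print(timings)                         # named vector: small, medium, large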

Best,

Philippe Grosjean

.......................................................<°}))><....
 ) ) ) ) )
( ( ( ( (   Prof. Philippe Grosjean
\  ___   )
 \/ECO\ (   Numerical Ecology of Aquatic Systems
 /\___/  )  Mons-Hainaut University, Pentagone
/ ___  /(   8, Av. du Champ de Mars, 7000 Mons, Belgium
 /NUM\/  )
 \___/\ (   phone: + 32.65.37.34.97, fax: + 32.65.37.33.12
       \ )  email: Philippe.Grosjean at umh.ac.be
 ) ) ) ) )  SciViews project coordinator (http://www.sciviews.org)
( ( ( ( (
...................................................................



