[Rd] Some timings for 64 bit Opteron (ATLAS, GOTO, std)

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sat Mar 6 13:10:54 MET 2004


Martin Maechler <maechler at stat.math.ethz.ch> writes:

> ## gives
> ##                  ATLAS  GOTO   std
> ## boot-Ex          73.38 73.71 73.62
> ## nlme-Ex          31.92 34.18 31.91
> ## mgcv-Ex          29.20 31.69 29.35
> ## MASS-Ex          21.54 20.49 20.29
> ## stats-Ex         17.80 17.69 17.91
> ## lattice-Ex       11.38 11.37 11.05
> ## methods-Ex        6.87  6.53  6.58
> ## base-Ex           5.48  5.28  5.26
> ## graphics-Ex       4.71  4.73  4.70
> ## tools-Ex          3.86  3.66  3.82
> ## cluster-Ex        3.78  3.74  3.65
> ## utils-Ex          2.73  2.60  2.60
> ## p-r-random-tests  2.60  2.58  2.55
> ## survival-Ex       2.48  2.49  2.30
> ## ...
> ## .........

OK, I got around to checking this on the Opteron 240 system and got
just about the same, plus roughly 50%, which is to be expected given
the relative CPU speeds:

                  ATLAS   GOTO    std
boot-Ex          107.63 115.68 105.55
nlme-Ex           55.00  55.28  48.73
mgcv-Ex           36.45  43.02  40.14
MASS-Ex           34.02  35.14  30.81
stats-Ex          27.44  28.12  27.76
lattice-Ex        18.16  19.06  19.05
methods-Ex         9.94   9.86  10.53
base-Ex            8.56   8.70   8.56
graphics-Ex        7.66   7.72   7.43
cluster-Ex         5.69   5.81   5.47
tools-Ex           4.76   4.57   4.81
utils-Ex           4.44   4.37   5.77
demos2             3.88   3.82   3.63
demos              3.71   3.73   3.46
survival-Ex        3.66   3.76   3.61
p-r-random-tests   3.47   3.50   3.47
...

(The system was supposedly idle, but KDE was running on the console so
maybe not quite... Also, the odd cron job may have passed by.)
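As a quick check of the "plus roughly 50%" remark, the std columns of the two tables can be compared directly. This is only a sketch: the vector names `quoted` and `opteron240` are made up for illustration, and only the five longest-running suites are copied in.

```r
# std-BLAS timings in seconds for the five longest suites,
# copied from the quoted table and from the Opteron 240 table.
quoted     <- c(boot = 73.62,  nlme = 31.91, mgcv = 29.35,
                MASS = 20.29,  stats = 17.91)
opteron240 <- c(boot = 105.55, nlme = 48.73, mgcv = 40.14,
                MASS = 30.81,  stats = 27.76)

# Slowdown factor of the Opteron 240 relative to the quoted machine.
ratio <- opteron240 / quoted
print(round(ratio, 2))
```

The factors differ somewhat from suite to suite, which is not surprising for single runs on a machine that was not entirely idle.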

So, basically, the threaded and optimized BLASes make no measurable
difference for these suites of standard tasks. Their real teeth do not
show until you get to tasks that need hardcore numerics:

Plain (std), ATLAS, and Goto builds, in that order, inverting a random
3000x3000 matrix:

pd@linux:~/r-devel> for i in BUILD* ; do (cd $i ; time echo 'set.seed(1234);m<-matrix(rnorm(9e6),3e3);system.time(solve(m))'|bin/R --vanilla -q) ; done
> set.seed(1234);m<-matrix(rnorm(9e6),3e3);system.time(solve(m))
[1] 251.90   1.14 253.08   0.00   0.00
>

real    4m20.967s
user    4m19.431s
sys     0m1.537s
> set.seed(1234);m<-matrix(rnorm(9e6),3e3);system.time(solve(m))
[1]  3.86  1.10 27.24  0.00  0.00
>

real    0m35.633s
user    0m53.442s
sys     0m1.711s
> set.seed(1234);m<-matrix(rnorm(9e6),3e3);system.time(solve(m))
[1] 30.06  1.15 31.76  0.00  0.00
>

real    0m39.804s
user    0m42.220s
sys     0m1.621s

(Notice how system.time gets the CPU usage wrong in the threaded
cases, worst of all for ATLAS. Presumably, it is only counting one
process and, in the ATLAS case, one that is mostly idle.)
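Since system.time misses the threaded CPU usage, a rough alternative is to take the figures reported by the shell's `time` and divide CPU time (user + sys) by wall-clock time (real). The sketch below does that for the three solve() runs above; the data frame `timings` and the column name `cpu_per_wall` are illustrative names, and it assumes that `time` accounts for all threads of the R process.

```r
# Shell `time` results for the three solve() runs above, in seconds.
timings <- data.frame(
  build = c("plain", "ATLAS", "Goto"),
  real  = c(4 * 60 + 20.967, 35.633, 39.804),
  user  = c(4 * 60 + 19.431, 53.442, 42.220),
  sys   = c(1.537, 1.711, 1.621)
)

# CPU seconds consumed per second of wall-clock time.
# A value near 1 means roughly one CPU was kept busy.
timings$cpu_per_wall <- (timings$user + timings$sys) / timings$real
print(timings[, c("build", "cpu_per_wall")], digits = 3)
```

Note that these totals include R startup and generating the random matrix, so they only approximate the utilization of solve() itself.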

So for matrix inversion, ATLAS seems to be a little faster than Goto
(at the expense of a higher CPU utilization, mind you: the Goto
version appears to be running nearly single-threaded). For matrix
multiply, we have Goto as the fastest:


pd@linux:~/r-devel> for i in BUILD* ; do (cd $i ; time echo 'set.seed(1234);m<-matrix(rnorm(9e6),3e3);system.time(m%*%m)'|bin/R --vanilla -q) ; done
> set.seed(1234);m<-matrix(rnorm(9e6),3e3);system.time(m%*%m)
[1] 230.20   0.10 230.36   0.00   0.00
>

real    3m58.639s
user    3m57.857s
sys     0m0.455s
> set.seed(1234);m<-matrix(rnorm(9e6),3e3);system.time(m%*%m)
[1]  0.34  0.01 16.49  0.00  0.00
>

real    0m25.253s
user    0m38.809s
sys     0m0.535s
> set.seed(1234);m<-matrix(rnorm(9e6),3e3);system.time(m%*%m)
[1] 12.94  0.08 13.06  0.00  0.00
>

real    0m21.629s
user    0m32.223s
sys     0m0.464s
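To put the two experiments side by side, the elapsed times (the third number printed by each system.time call above) can be turned into speedup factors relative to the plain build. This is just arithmetic on the figures already shown; the names `elapsed` and `speedup` are illustrative.

```r
# Elapsed seconds from the system.time() output of each run above.
elapsed <- rbind(
  solve    = c(plain = 253.08, ATLAS = 27.24, Goto = 31.76),
  multiply = c(plain = 230.36, ATLAS = 16.49, Goto = 13.06)
)

# Speedup of each build relative to the plain build, per task.
# The length-2 vector of plain timings is recycled down the columns,
# so each row is divided into its own plain timing.
speedup <- elapsed[, "plain"] / elapsed
print(round(speedup, 1))
```

Elapsed time is used here rather than the user time, because the latter is unreliable for the threaded builds.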




-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907


