[R] curiosity: next-gen x86 processors and FP32?

ivo welch ivo.welch at anderson.ucla.edu
Sun May 26 07:43:51 CEST 2013

dear R experts:

although my question may be better asked on the HPC R mailing list, it
is really about something that average R users who don't plan to write
clever HPC-optimized code would care about: is there a quantum
performance leap on the horizon with CPUs?

like most average non-HPC R users, I want to stick mostly to
mainstream R, often with library parallel but that's it.  I like R to
be fast and effortless.  I don't want to have to rewrite my code
greatly to take advantage of my CPU.  the back-and-forth copying
between host and GPU memory, which requires code rewrites, makes CUDA
not too useful for me.  in fact, I don't even like setting up computer
clusters.  I run code only on my single personal machine.

now, I am looking at the two upcoming processors---intel haswell (next
month) and amd kaveri (end of year).  does either of them have the
potential to be a quantum leap for R without complex code rewrites?
I presume that any quantum leaps would have to come from R using a
different numerical vector "engine".  (I tried different compiler
optimizations, such as AVX, when compiling R on the 1-year-old i7-27*,
but it did not really make a difference in basic R benchmarks, such as
simple OLS calculations.  I thought AVX would provide a faster vector
engine, but something didn't really compute here.  pun intended.)
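
for concreteness, a "simple OLS" benchmark of the kind meant here might
look like the sketch below.  the problem size, the repetition count, and
all variable names are made up for illustration; this is not the
original test.  it times the normal-equations solve, which base R hands
off to the BLAS/LAPACK it is linked against, and then checks the answer
against lm.fit.

```r
# hypothetical OLS micro-benchmark; sizes are arbitrary illustrations
set.seed(1)
n <- 20000   # observations (assumed)
k <- 50      # regressors including the intercept (assumed)

X    <- cbind(1, matrix(rnorm(n * (k - 1)), n, k - 1))
beta <- rnorm(k)
y    <- drop(X %*% beta) + rnorm(n)

# time 20 repetitions of solving the normal equations (X'X) b = X'y
timing <- system.time({
  for (i in 1:20) b <- solve(crossprod(X), crossprod(X, y))
})
print(timing["elapsed"])

# sanity check against the QR-based fit used by lm()
b_ref <- unname(lm.fit(X, y)$coefficients)
print(max(abs(drop(b) - b_ref)))
```

running the same script under differently compiled R builds gives a
like-for-like comparison of elapsed times.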

I would guess that haswell will be a nice small evolutionary step
forward.  5-20%, perhaps.  but nothing like a factor 2.

[tomshardware reports that intel FP32 math is 4 times as fast as double
math on the i7 architecture.  for most of my applications, a 4 times
speedup at a sacrifice in precision would be worth it.  R seems to use
only doubles---as.single does not even convert to single, much less
induce calculations to be single-precision.  so I guess this is a
no-go.  correct?? ]
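
the as.single observation can be checked in a few lines of base R.
this is a minimal sketch; per the R documentation for as.single, the
result is still stored as a double and merely carries a "Csingle"
attribute that the .C and .Fortran interfaces consult when passing data
to compiled code.

```r
x <- as.single(1/3)

typeof(x)             # "double": the storage type is unchanged
attr(x, "Csingle")    # TRUE: just a marker for .C()/.Fortran()

# no precision is dropped: stripping the attribute gives back 1/3 exactly
identical(as.double(x), 1/3)

# arithmetic on x is still carried out on doubles
typeof(x * 2)
```

so any FP32 computation would have to happen in compiled code called
from R, not in R's own vector arithmetic.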

kaveri's hUMA, on the other hand, could be a quantum leap.  kaveri
could have the GPU transparently handle the common built-in vector
operations that we use in R, i.e., improve the speed of many programs
without the need for a rewrite, perhaps by a factor of 5?  hard to
believe, but it would then seem that AMD has actually beaten intel for
R users.  a big turnaround, given their recent de-emphasis of FP on the
CPU.  (interestingly, the amd-built Xbox One and PS4 processors were
also reported to have hUMA.)

worth waiting for kaveri?   anything I can do to drastically speed up
R on intel i7 by going to FP32?


Ivo Welch (ivo.welch at gmail.com)
