[Rd] Getting SSE2 instructions to work in 32-bit builds on Windows
radford at cs.toronto.edu
Fri Aug 21 21:35:55 CEST 2015
When getting pqR to work on Windows, I've wanted it to be able to
use SSE2 instructions in 32-bit builds, on those 32-bit processors
that have SSE2 instructions (all of them from the Pentium 4 onwards).
It seems that R Core's 32-bit builds do not attempt this. They instead
use the 387 FPU for all floating-point arithmetic. This is sometimes
slower than using SSE2 instructions. It also produces results that are
not compliant with the IEEE floating point standard, because the 387
keeps intermediate results in 80-bit extended precision. Those results
are also not reproducible: they may change after trivial, unrelated
changes to R or to the C compiler used.
One can get the gcc in Rtools to use SSE2 instructions by
including the following compiler options:
-m32 -msse2 -mfpmath=sse
Unfortunately, some things then crash.
The problem is that by default gcc assumes that the stack is aligned
to a 16-byte boundary on entry to a procedure, which allows it to
easily ensure the 16-byte alignment needed for SSE2 instructions.
However, Windows does not guarantee that a 32-bit application's
stack is 16-byte aligned, so this assumption can fail. (There's no
problem for 64-bit builds, since the 64-bit Windows calling convention
does guarantee 16-byte stack alignment.)
A solution is to add one more option:
-m32 -msse2 -mfpmath=sse -mstackrealign
The -mstackrealign option forces gcc to generate code to align the
stack on procedure entry, rather than assuming it is already aligned.
It would probably be enough to compile only a few modules with this
option (those directly called from outside R), and hence avoid most
of the extra procedure-call overhead, but I haven't attempted this.