[Rd] Getting SSE2 instructions to work in 32-bit builds on Windows

Radford Neal radford at cs.toronto.edu
Fri Aug 21 21:35:55 CEST 2015

When getting pqR to work on Windows, I've wanted it to be able to
use SSE2 instructions in 32-bit builds, on those 32-bit processors
that have SSE2 instructions (all of them from the Pentium 4 onwards).

It seems that R Core 32-bit versions do not attempt this, instead
using the 387 FPU for all floating-point arithmetic.  This is
sometimes slower than using SSE2 instructions, and also produces
results that are not compliant with the IEEE floating point standard,
and that are not reproducible - possibly changing after trivial,
unrelated changes to R or to the C compiler used.
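
One rough way to see which arithmetic mode a given build ended up
with is to look at the standard C99 macro FLT_EVAL_METHOD from
<float.h>.  The sketch below is only illustrative; the function names
are my own invention, not anything in R or pqR.  With gcc, I'd expect
a 387 build to typically report 2 (intermediates kept in extended
precision) and an SSE2 build to report 0, but check on your own
compiler rather than taking that on trust.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: interpret a FLT_EVAL_METHOD value for double
   arithmetic.  Returns 1 if double intermediates may be held in a wider
   format, 0 if each double operation is rounded to double, and -1 if the
   value is indeterminate or implementation-defined. */
int doubles_have_excess_precision(int method)
{
    if (method == 2)
        return 1;               /* everything evaluated as long double */
    if (method == 0 || method == 1)
        return 0;               /* double operations are rounded to double */
    return -1;
}

/* Hypothetical helper: print what this particular build reports. */
void report_eval_method(void)
{
    int method = (int) FLT_EVAL_METHOD;
    printf("FLT_EVAL_METHOD = %d, excess precision for doubles: %d\n",
           method, doubles_have_excess_precision(method));
}
```

Calling report_eval_method() from a small test program built with and
without the SSE2 options should show whether the options took effect.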

One can get the gcc used in Rtools to use SSE2 instructions by
including the following compiler options:

  -m32 -msse2 -mfpmath=sse

Unfortunately, the result is that some things then crash.  

The problem is that by default gcc assumes that the stack is aligned
to a 16-byte boundary on entry to a procedure, which allows it to
easily ensure the 16-byte alignment needed for SSE2 instructions.
Unfortunately, Windows does not ensure that a 32-bit application's
stack is 16-byte aligned, so this doesn't work.  (There's no problem
for 64-bit builds, however.)
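
To get a feel for the alignment issue, something like the following
can be used to print where a local variable lands relative to a
16-byte boundary.  This is a crude diagnostic under my own
assumptions, with made-up function names: the compiler is free to
place locals wherever it likes, so the offset printed does not prove
what the stack pointer was on entry.  It can still be informative to
call it from code reached via a Windows callback versus code reached
from main.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: offset of an address from the previous 16-byte
   boundary.  A result of 0 means the address is 16-byte aligned. */
unsigned offset_from_16(const void *p)
{
    return (unsigned) ((uintptr_t) p & 15u);
}

/* Hypothetical helper: print where a local array of this call was
   placed on the stack.  The label just identifies the call site. */
void report_local_alignment(const char *label)
{
    double probe[2] = { 0.0, 0.0 };
    printf("%s: local at %p, offset %u from a 16-byte boundary\n",
           label, (void *) probe, offset_from_16(probe));
}
```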

A solution is to add one more option:

  -m32 -msse2 -mfpmath=sse -mstackrealign

The -mstackrealign option forces gcc to generate code to align the
stack on procedure entry, rather than assuming it is already aligned.

It would probably be enough to compile only a few modules with this
option (ones that are directly called from outside R), and hence avoid
most of the extra procedure call overhead, but I haven't attempted this.
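
A related approach, which is not what I did above and which I have
not tried in R or pqR, is gcc's x86 function attribute
force_align_arg_pointer.  It asks for the realigning prologue on
individual functions rather than on whole modules.  The sketch below
assumes a hypothetical entry point that is called from outside (where
the stack may be only 4-byte aligned) and an inner routine that is
only ever called from realigned code.  The function names and the
REALIGN_STACK macro are invented for illustration.

```c
#include <stddef.h>

/* Use the attribute only where it applies (gcc-compatible compilers on
   x86); elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Inner routine: compiled normally, so it assumes its caller kept the
   stack 16-byte aligned. */
static double sum_squares(const double *x, size_t n)
{
    double s = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}

/* Hypothetical entry point called from outside: the attribute makes gcc
   realign the stack here instead of assuming it is already aligned. */
REALIGN_STACK double entry_sum_squares(const double *x, size_t n)
{
    return sum_squares(x, n);
}
```

Only the entry points pay the realignment cost this way, but every
function reachable from outside has to be found and marked, which is
the hard part.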

   Radford Neal
