[R] x86 SSE* Pointer Favors

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Jun 13 09:30:16 CEST 2008


Let me pick up on

> Enabling SSE instructions in addition while building R (yes, you have to 
> enable them explicitly, see man gcc) is possible but does not help much 
> since all maths is mostly done in BLAS.

The final part is not true for my 'maths', only for those doing linear 
algebra.  Enabling use of SSE registers can help with CPU scheduling, and 
so can have a surprisingly large effect, so if you only run R on a single
CPU type it is worth tuning the code to that CPU (e.g. -mtune=core2) 
alongside turning up optimization levels.
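
To make the tuning point concrete, here is a minimal C sketch (an illustration, not code from R) of the kind of routine Ivo describes: a two-pass sample covariance over plain double arrays, which is the right data type for SSE2. The names mean() and covariance() and the gcc flags in the comment are assumptions chosen for illustration; when building R itself, flags like these would go in CFLAGS (for example in config.site). Whether gcc actually vectorizes these loops depends on its version and flags.

```c
#include <math.h>   /* NAN */
#include <stddef.h> /* size_t */

/* Example build line (an assumption; adjust to your gcc and CPU):
 *   gcc -O3 -mtune=core2 -msse2 -mfpmath=sse -c cov.c
 * -mfpmath=sse keeps scalar double arithmetic in SSE registers instead of
 * the x87 stack (already the default on x86-64). Each loop below is a sum
 * reduction; gcc usually vectorizes floating-point reductions only when
 * reassociation is allowed (e.g. -ffast-math), because reordering the
 * additions changes the rounding of the result. */

/* Arithmetic mean of x[0..n-1]; n must be > 0. */
static double mean(const double *restrict x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s / (double)n;
}

/* Sample covariance of x and y with denominator n - 1, computed in two
 * passes (means first, then centred cross-products), which is more
 * accurate than the one-pass sum(x*y) - n*mx*my formula. */
double covariance(const double *restrict x, const double *restrict y, size_t n)
{
    if (n < 2)
        return NAN; /* undefined for fewer than two observations */
    double mx = mean(x, n);
    double my = mean(y, n);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += (x[i] - mx) * (y[i] - my);
    return s / (double)(n - 1);
}
```

The restrict qualifiers tell the compiler the arrays do not overlap, which removes one common obstacle to vectorization; the rest is left to -O3 and the target flags.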


On Fri, 13 Jun 2008, Ivan Adzhubey wrote:

> Hi Ivo,
>
> On Friday 13 June 2008 12:23:06 am ivo welch wrote:
>> Dear Statisticians--- This is not even an R question, so please
>> forgive me.  I have so much ignorance in this matter that I do not
>> know where to begin.  I hope someone can point me to documentation
>> and/or a sample.
>
> You will surely find some answers to your questions in the R-admin.html
> file, under the "Building from source" section. Search for BLAS there and
> you will be presented with some options. Searching the R web site for the
> same keyword will give you even more food for thought.
>
>> I want to compute a covariance as quickly as non-humanly possible on
>> an Intel core processor (up to SSE4) under linux.  Alas, I have no
>> idea how to engage CPU vectorization.  Do I need to use special data
>> types, or is "double" correct?  Does SSE* understand NaN?  Should I
>> rely on gcc autodetection of the vectorized meaning of my code, or are
>> there specific libraries that I should call?
>
> I use the Goto BLAS library and it works great: it usually runs 3 to 30 times
> faster than R's stock BLAS library, depending on your code. Enabling SSE
> instructions in addition while building R (yes, you have to enable them
> explicitly, see man gcc) is possible but does not help much since all maths
> is mostly done in BLAS.
>
> That said, optimized BLAS libraries give the biggest speed increase on older
> processors. The newer crop of multi-core CPUs with large shared caches is much
> harder to hand-tune code for. You may want to subscribe to the Goto BLAS
> mailing list for an in-depth discussion. The ATLAS community is also very
> helpful (I use their code with our AMD CPUs).
>
>> What I want to learn about is as simple as it gets:
>>   typedef double Double;  // or whatever SSE* needs as close equivalent
>>   Double vector1[N], vector2[N];
>>   // then fill them with stuff.
>
> R has no declared types: everything that is not a character string or an
> integer is treated as a double, and all floating-point arithmetic is done in
> double precision.
>
>>   vector3= vector_mult(vector1,vector2, N);
>>   vector4= sum(vector1, N);
>>
>> I just need a pointer and/or primer.  PS: If someone knows of a
>> superfast vectorized implementation of Gentleman's WLS algorithm,
>> please point me to it, too.  I am still using my old non-vectorized C
>> routines.
>
> HTH,
> Ivan
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
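
Ivo's pseudocode asks for an elementwise product and a sum. Below is a minimal sketch using SSE2 intrinsics from <emmintrin.h>, assuming an x86 compiler that defines __SSE2__ (the default on x86-64); on other targets it falls back to plain loops. vec_mult and vec_sum are hypothetical names standing in for his vector_mult and sum, and the product is written to a caller-supplied array rather than returned. The data stay ordinary double arrays: a __m128d register simply holds two of them. Packed SSE2 arithmetic follows IEEE 754, so a NaN in the input propagates through the multiply and add just as it does in scalar code.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: __m128d holds two doubles */
#endif

/* out[i] = a[i] * b[i] for i < n (hypothetical name for vector_mult) */
void vec_mult(double *out, const double *a, const double *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i); /* unaligned load of a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_mul_pd(va, vb));
    }
#endif
    for (; i < n; i++) /* scalar tail, or the whole loop without SSE2 */
        out[i] = a[i] * b[i];
}

/* Sum of a[0..n-1] (hypothetical name for sum). The SSE2 path keeps two
 * partial sums, so the order of additions, and hence the rounding, can
 * differ slightly from a simple left-to-right loop. */
double vec_sum(const double *a, size_t n)
{
    size_t i = 0;
    double total = 0.0;
#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(a + i));
    double lanes[2];
    _mm_storeu_pd(lanes, acc); /* combine the two partial sums */
    total = lanes[0] + lanes[1];
#endif
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

For loops this simple, an optimized BLAS routine (such as ddot) or gcc's auto-vectorizer may well do as well; hand-written intrinsics are mostly worth the effort where the compiler does not vectorize on its own.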

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


