[R] Building R for better performance

Simon Zehnder szehnder at uni-bonn.de
Thu Mar 6 11:53:15 CET 2014


Jonathan,

these results are amazing! But as you mentioned before, MKL runs its computations on multiple cores - I think the single-core execution is the main bottleneck for the gcc build. The Intel performance is higher than I expected … and higher than what I have read of other developers’ experiences. Researchers and institutions relying heavily on simulation-based methods could profit from the Intel compiler and the parallel execution of the MKL - of course, in large projects already parallelised across clusters it remains questionable whether the MKL gives a further performance boost - but it is worth a try.
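
To put numbers on the gap, the per-test ratios can be read off the elapsed times in your table (quoted below). A minimal sketch with awk, using four rows copied from that table; the function name `speedups` and the shortened row labels are just for illustration:

```shell
# Elapsed seconds copied from the quoted benchmark table.
# Fields: label | gcc build (default) | icc/MKL build
speedups() {
  awk -F'|' '{ printf "%-28s %6.1fx\n", $1, $2 / $3 }' <<'EOF'
Cross-product 5600x5600|97.44|0.56
Cholesky 6000x6000|37.07|0.47
Sorting 14,000,000 values|1.61|1.64
Total (all 15 tests)|274.93|21.01
EOF
}

# Print one "label  ratio" line per row (gcc time divided by icc/MKL time).
speedups
```

The sorting row comes out at roughly 1x, while the cross-product row is in the hundreds - which fits the picture that the gain sits in the matrix operations that MKL runs on multiple cores.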

I will look through the code in a couple of days - thank you very much for the attachment Jonathan! 

Compiling R with the Intel compiler, and especially the linking procedure, needs some more support - either via FAQs (maybe the Core Dev Team would even be happy if Intel kept the information up to date - they already have a lot to do) or via a dedicated Intel website for R users/developers. By the way, it is great that students get a large discount - you get the Intel compiler suite for less than the price of a statistics book - but not many people know about it - I only found out by chance (at the Aixcelerate workshop at RWTH Aachen, which introduced the Xeon Phi). Highly specialised applications in need of performance profit especially from the excellent Intel debugging tools for parallel computing - these are real workhorses for detecting data races etc. (the HPC centre at RWTH Aachen has not found an alternative able to detect as many data races).
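
Until such documentation exists, the steps from your first message (quoted at the bottom) can at least be kept in a small script. A sketch, assuming the Linux layout and the /opt/intel/composerxe prefix from that message; `INTEL_ROOT` and `mkl_configure_cmd` are names I made up, and the function only prints the configure call for review, it does not run it. None of this is verified on OS X:

```shell
# Sketch only: assembles the configure call quoted in this thread.
# INTEL_ROOT and mkl_configure_cmd are illustrative names, not part of R
# or of the Intel tools.
INTEL_ROOT="${INTEL_ROOT:-/opt/intel/composerxe}"

mkl_configure_cmd() {
  # Link line for the threaded MKL, as given in the original message.
  blas="-L${INTEL_ROOT}/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm"
  printf './configure --with-blas="%s" --with-lapack ' "$blas"
  printf 'CC=icc CFLAGS=-O2 CXX=icpc CXXFLAGS=-O2 '
  printf 'F77=ifort FFLAGS=-O2 FC=ifort FCFLAGS=-O2\n'
}

# Print the command. To actually build, source
# "$INTEL_ROOT/bin/compilervars.sh" with the intel64 argument first,
# then run the printed line followed by make and make check.
mkl_configure_cmd
```

Keeping the prefix in one variable at least makes it easy to try a different install location without retyping the link line.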

Regarding my own problem: I haven’t managed to make it run yet - though I have to admit that the Mac is delicate ground for compilation experiments and I am not yet as experienced as other R developers here on the list. Of course it would be great to make it run - especially because I would like to test your script on my Mac with clang and gcc-4.8 (btw, which version of gcc was yours?) and then with Intel. Whatever you consider the best way to get support and make it run - I will follow your suggestion. Afterwards we should at least create a thread here and on the Intel Developer Zone so that other developers on the Mac can follow along.

Finally: it would be interesting to see how the Intel compiler performs with C++ extensions. Rcpp/RcppArmadillo are growing rapidly (Dirk mentioned a couple of days ago that around 200 packages already use the Rcpp API to extend R with C++). In this case vectorisation etc. also plays a role, and one could see what goes on under the hood.


Best

Simon



On 05 Mar 2014, at 23:41, Anspach, Jonathan P <jonathan.p.anspach at intel.com> wrote:

> Simon,
> 
> Thanks for the information and links.  First of all, did you ever resolve your problem?  If not, did you file an issue in Intel Premier Support?  That's the best way to bring it to our attention.  If you don't want to do that I can try to get a compiler or MKL support engineer to look at your Intel Developer Zone discussion.  I have no experience with OS X, so I wouldn't be much help.
> 
> I got the benchmark script, which I've attached, from Texas Advanced Computing Center.  Here are my results (elapsed times, in secs):
> 
>                                                        gcc build (default)  icc/MKL build
> Creation, transp., deformation of a 5000x5000 matrix                  3.25           2.95
> 5000x5000 normal distributed random matrix ^1000                      5.13           1.52
> Sorting of 14,000,000 random values                                   1.61           1.64
> 5600x5600 cross-product matrix (b = a' * a)                          97.44           0.56
> Linear regr. over a 4000x4000 matrix (c = a \ b')                    46.06           0.49
> FFT over 4,800,000 random values                                      0.65           0.61
> Eigenvalues of a 1200x1200 random matrix                              5.55           1.37
> Determinant of a 5000x5000 random matrix                             34.18           0.55
> Cholesky decomposition of a 6000x6000 matrix                         37.07           0.47
> Inverse of a 3200x3200 random matrix                                 29.49           0.57
> 3,500,000 Fibonacci numbers calculation (vector calc)                 1.31           0.38
> Creation of a 6000x6000 Hilbert matrix (matrix calc)                  0.77           0.99
> Grand common divisors of 400,000 pairs (recursion)                    0.63           0.56
> Creation of a 1000x1000 Toeplitz matrix (loops)                       2.24           2.34
> Escoufier's method on a 90x90 matrix (mixed)                          9.55           6.02
> Total                                                               274.93          21.01
> 
> Regards,
> Jonathan Anspach
> Sr. Software Engineer
> Intel Corp.
> jonathan.p.anspach at intel.com
> 713-751-9460
> 
> 
> -----Original Message-----
> From: Simon Zehnder [mailto:szehnder at uni-bonn.de] 
> Sent: Wednesday, March 05, 2014 3:55 AM
> To: Anspach, Jonathan P
> Cc: r-help at r-project.org
> Subject: Re: [R] Building R for better performance
> 
> Jonathan,
> 
> I tried something like this myself - comparing gcc, clang and Intel on a Mac. From my experience in HPC on the university cluster (where we also use the Xeon Phi, Landeshochleistungscluster University RWTH Aachen), the Intel compiler has better code optimization with regard to vectorisation, etc. (clang so far suffers from a not yet implemented OpenMP library).
> 
> Here is a Revolution Analytics article about this topic: http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html
> 
> As I usually use the Rcpp package for C++ extensions, this could give me further performance. However, I have already failed when trying to compile R with the Intel compiler and link against the MKL (see my topic in the Intel Developer Zone: http://software.intel.com/en-us/comment/1767418 and my thread on the R-SIG-Mac list: https://stat.ethz.ch/pipermail/r-sig-mac/2013-November/010472.html).
> 
> So, to your questions:
> 
> 1) I think that most admins do not even use the Intel compiler to compile R - this seems rare to me. I know some people who do, and I think they may be aware of it - but these are only a few. As R is growing in usage, and I know from regional user meetings that very large companies are starting to use it in their BI units, this should be of interest.
> 
> 2) I would really welcome this step because compilation with intel (especially on a Mac) and linking to the MKL seems to be delicate. 
> 
> I am interested in the data - so if possible, please send it via the list or directly to my account. Further, could you show some of the code that you used for the computations?
> 
> 
> Best
> 
> Simon
> 
> 
> On 04 Mar 2014, at 22:44, Anspach, Jonathan P <jonathan.p.anspach at intel.com> wrote:
> 
>> Greetings,
>> 
>> I'm a software engineer with Intel.  Recently I've been investigating R performance on Intel Xeon and Xeon Phi processors and RH Linux.  I've also compared the performance of R built with the Intel compilers and Intel Math Kernel Library to a "default" build (no config options) that uses the GNU compilers.  To my dismay, I've found that the GNU build always runs on a single CPU core, even during matrix operations.  The Intel build runs matrix operations on multiple cores, so it is much faster on those operations.  Running the benchmark-2.5 on a 24 core Xeon system, the Intel build is 13x faster than the GNU build (21 seconds vs 275 seconds). Unfortunately, this advantage is not documented anywhere that I can see.
>> 
>> Building with the Intel tools is very easy.  Assuming the tools are installed in /opt/intel/composerxe, the process is simply (in bash shell):
>> 
>> $ . /opt/intel/composerxe/bin/compilervars.sh intel64
>> $ ./configure --with-blas="-L/opt/intel/composerxe/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm" \
>>     --with-lapack CC=icc CFLAGS=-O2 CXX=icpc CXXFLAGS=-O2 \
>>     F77=ifort FFLAGS=-O2 FC=ifort FCFLAGS=-O2
>> $ make
>> $ make check
>> 
>> My questions are:
>> 1) Do most system admins and/or R installers know about this performance difference, and use the Intel tools to build R?
>> 2) Can we add information on the advantage of building with the Intel tools, and how to do it, to the installation instructions and FAQ?
>> 
>> I can post my data if anyone is interested.
>> 
>> Thanks,
>> Jonathan Anspach
>> Sr. Software Engineer
>> Intel Corp.
>> jonathan.p.anspach at intel.com
>> 713-751-9460
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> <R-benchmark-25_large.R>



