[Rd] MKL Acceleration encouraging; need adjust package builds?

David Smith davidsmi at microsoft.com
Mon Nov 23 18:39:25 CET 2015


Hi Paul,

We've been through this process ourselves for the Revolution R Open project. There are a number of pitfalls to avoid, but you can take a look at how we achieved it in the build scripts at:

https://github.com/RevolutionAnalytics/RRO

There are also some very useful notes in the R Installation and Administration manual:
https://cran.r-project.org/doc/manuals/r-release/R-admin.html#BLAS

Most packages do benefit from MKL (or any multi-threaded BLAS) to some degree, although the actual benefit depends on the R functions they call. Some packages (and some built-in R functions) don't call into BLAS endpoints, so you won't see benefits in all cases.

# David Smith

-- 
David M Smith <davidsmi at microsoft.com>
R Community Lead, Revolution Analytics (a Microsoft company)  
Tel: +1 (312) 9205766 (Chicago IL, USA)
Twitter: @revodavid | Blog:  http://blog.revolutionanalytics.com
We are hiring engineers for Revolution R and Azure Machine Learning.

-----Original Message-----
From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Paul Johnson
Sent: Monday, November 23, 2015 09:28
To: R Devel List <r-devel at r-project.org>
Subject: [Rd] MKL Acceleration encouraging; need adjust package builds?

Dear R-devel:

The cluster administrators at KU got enthusiastic about testing R-3.2.2 with Intel MKL when I asked for some BLAS integration.  Below I forward a performance report, which is encouraging; I thought you would like to know the numbers.  To my untrained eye, there appear to be some extraordinary speedups on Cholesky decomposition, determinants, and matrix inversion.

They had difficulty getting R to compile with R's shared BLAS (I don't know what went wrong there), so they went the other direction.

In his message to me, the technician says that I should consider adjusting the compilation flags on the packages that use BLAS.

1. Do you think that is needed? R is compiled with non-shared BLAS libraries; won't packages know where to look for BLAS headers?

2. If I need to do that, I wonder how to do it and which packages need attention.  I would guess the Eigen and Armadillo packages, and possibly the ones that depend on them: lme4, anything flowing through Rcpp.

Here's the build for some packages. Are they finding MKL BLAS?  How would I know?

* installing *source* package 'RcppArmadillo' ...
** package 'RcppArmadillo' successfully unpacked and MD5 sums checked
* checking LAPACK_LIBS: divide-and-conquer complex SVD available via system LAPACK
** libs
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic    -c RcppArmadillo.cpp -o RcppArmadillo.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic    -c RcppExports.cpp -o RcppExports.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic    -c fastLm.cpp -o fastLm.o
g++ -shared -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib
-L/usr/local/lib64 -o RcppArmadillo.so RcppArmadillo.o RcppExports.o fastLm.o
-L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64
-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -lR
installing to /panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/RcppArmadillo/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (RcppArmadillo)

* installing *source* package 'RcppEigen' ...
** package 'RcppEigen' successfully unpacked and MD5 sums checked
** libs
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic    -c RcppEigen.cpp -o RcppEigen.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic    -c RcppExports.cpp -o RcppExports.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic    -c fastLm.cpp -o fastLm.o
g++ -shared -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib
-L/usr/local/lib64 -o RcppEigen.so RcppEigen.o RcppExports.o fastLm.o
-L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64
-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -lR
installing to /panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/RcppEigen/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (RcppEigen)

* installing *source* package 'MatrixModels' ...
** package 'MatrixModels' successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
Creating a generic function for 'resid' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'fitted.values' from package 'stats'
in package 'MatrixModels'
Creating a generic function for 'coefficients' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'formula' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'coef' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'fitted' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'residuals' from package 'stats' in package 'MatrixModels'
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (MatrixModels)
* installing *source* package 'quantreg' ...
** package 'quantreg' successfully unpacked and MD5 sums checked
** libs
gfortran   -fpic  -g -O2  -c akj.f -o akj.o
gfortran   -fpic  -g -O2  -c boot.f -o boot.o
gfortran   -fpic  -g -O2  -c brute.f -o brute.o
gcc -std=gnu99 -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include    -fpic
-I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include
-L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64  -c chlfct.c -o chlfct.o
gfortran   -fpic  -g -O2  -c cholesky.f -o cholesky.o
gfortran   -fpic  -g -O2  -c combos.f -o combos.o
gfortran   -fpic  -g -O2  -c crq.f -o crq.o
gfortran   -fpic  -g -O2  -c crqfnb.f -o crqfnb.o
gfortran   -fpic  -g -O2  -c dsel05.f -o dsel05.o
gfortran   -fpic  -g -O2  -c etime.f -o etime.o
gfortran   -fpic  -g -O2  -c extract.f -o extract.o
gfortran   -fpic  -g -O2  -c idmin.f -o idmin.o
gfortran   -fpic  -g -O2  -c iswap.f -o iswap.o
gfortran   -fpic  -g -O2  -c kuantile.f -o kuantile.o
gcc -std=gnu99 -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include    -fpic
-I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include
-L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64  -c mcmb.c -o mcmb.o
gfortran   -fpic  -g -O2  -c penalty.f -o penalty.o
gfortran   -fpic  -g -O2  -c powell.f -o powell.o
gfortran   -fpic  -g -O2  -c rls.f -o rls.o
gfortran   -fpic  -g -O2  -c rq0.f -o rq0.o
gfortran   -fpic  -g -O2  -c rq1.f -o rq1.o
gfortran   -fpic  -g -O2  -c rqbr.f -o rqbr.o
gfortran   -fpic  -g -O2  -c rqfn.f -o rqfn.o
gfortran   -fpic  -g -O2  -c rqfnb.f -o rqfnb.o
gfortran   -fpic  -g -O2  -c rqfnc.f -o rqfnc.o
gfortran   -fpic  -g -O2  -c rqs.f -o rqs.o
gfortran   -fpic  -g -O2  -c sparskit2.f -o sparskit2.o
gcc -std=gnu99 -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include    -fpic
-I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include
-L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64  -c srqfn.c -o srqfn.o
gcc -std=gnu99 -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include    -fpic
-I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include
-L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64  -c srqfnc.c -o srqfnc.o
gfortran   -fpic  -g -O2  -c srtpai.f -o srtpai.o
gcc -std=gnu99 -shared -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib
-L/usr/local/lib64 -o quantreg.so akj.o boot.o brute.o chlfct.o cholesky.o combos.o crq.o crqfnb.o dsel05.o etime.o extract.o idmin.o iswap.o kuantile.o mcmb.o penalty.o powell.o rls.o rq0.o rq1.o rqbr.o rqfn.o rqfnb.o rqfnc.o rqs.o sparskit2.o srqfn.o srqfnc.o srtpai.o
-L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64
-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm -lgfortran -lm -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -lR
installing to /panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/quantreg/libs
** R
** data
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (quantreg)
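
On the "How would I know?" question, one rough check is to scan the final link command of each package for -lmkl_* flags. The snippet below is only a sketch of that idea: the helper name mkl_libs_in_link_line is made up for illustration, and it inspects the text of a link line, not the installed library. On Linux, and assuming MKL is linked dynamically, running ldd on an installed package's .so file and looking for libmkl entries should give similar information, and `R CMD config BLAS_LIBS` prints the BLAS flags that R hands to packages.

```shell
# List the MKL libraries named on a linker command line, one per line.
# Prints nothing if no -lmkl_* flag is present.
mkl_libs_in_link_line() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^-lmkl_/mkl_/p'
}

# Link flags copied from the RcppArmadillo build in this thread.
link_flags='-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm'

mkl_libs_in_link_line "$link_flags"
```

For the flags in the logs this prints mkl_gf_lp64, mkl_gnu_thread and mkl_core, which suggests that RcppArmadillo, RcppEigen and quantreg were all linked against MKL.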


pj



Hi PJ,

We're still running the benchmarks to quantify the performance increase.

The R benchmarks for the MKL version are promising. The performance increase varies from test to test, but we saw no degradation in performance from using the MKL version. You can expect a 2x to 10x performance increase depending on the matrix calculations you are performing. Here are the configure arguments we used for compiling R with MKL:

--disable-BLAS-shlib
--with-blas="-L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm" --with-lapack

You may want to include these when recompiling R packages that use BLAS.
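
For anyone repeating this build, the configure arguments might be assembled as in the sketch below. The function name mkl_with_blas and the variable MKL_LIB_DIR are invented for illustration. The flag set is simply the one quoted in this message, with `-Wl,` written without a space as it appears in the package link lines. It assumes GNU compilers, the LP64 interface and GNU OpenMP threading, so it may need to change for other compilers or MKL versions.

```shell
# Assemble the value for R's --with-blas configure option from an MKL
# library directory (first argument). The flags are copied from this
# thread; they are not a general recommendation.
mkl_with_blas() {
    printf '%s' "-L$1 -Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm"
}

# Example, run from the top of an unpacked R source tree
# (MKL_LIB_DIR is a hypothetical variable name):
#   MKL_LIB_DIR=/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64
#   ./configure --disable-BLAS-shlib \
#       --with-blas="$(mkl_with_blas "$MKL_LIB_DIR")" --with-lapack
```

Packages that list $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) in PKG_LIBS normally inherit these flags from R's own configuration, which is consistent with the MKL flags showing up in the link lines of the package builds in this thread.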


Here are the results of the benchmark for the standard R 3.2.2:

R Benchmark 2.5
===============
Number of times each test is run__________________________: 3

I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 2.69466666666667
2400x2400 normal distributed random matrix ^1000____ (sec): 1.42433333333333
Sorting of 7,000,000 random values__________________ (sec): 2.34466666666667
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 33.187
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 14.52
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 4.51008013606039

II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 1.203
Eigenvalues of a 640x640 random matrix______________ (sec): 1.60599999999999
Determinant of a 2500x2500 random matrix____________ (sec): 7.64266666666667
Cholesky decomposition of a 3000x3000 matrix________ (sec): 8.05900000000001
Inverse of a 1600x1600 random matrix________________ (sec): 8.64166666666667
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 4.62477425061321

III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 1.25633333333335
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.894999999999982
Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.714
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 1.4013333333333
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 2.041
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.44505946077978


Total time for all 15 tests_________________________ (sec): 88.6306666666667
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 3.11209972260597
--- End of test ---


Here are the results for the MKL version:

R Benchmark 2.5
===============
Number of times each test is run__________________________: 3

I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 2.88466666666667
2400x2400 normal distributed random matrix ^1000____ (sec): 1.45933333333333
Sorting of 7,000,000 random values__________________ (sec): 2.35166666666667
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 3.37233333333333
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 1.68666666666666
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 2.25337542617509

II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 1.232
Eigenvalues of a 640x640 random matrix______________ (sec): 0.823333333333333
Determinant of a 2500x2500 random matrix____________ (sec): 1.752
Cholesky decomposition of a 3000x3000 matrix________ (sec): 1.417
Inverse of a 1600x1600 random matrix________________ (sec): 1.33833333333334
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.32693082905282

III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 1.28600000000001
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 1.00833333333334
Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.82266666666666
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 1.40533333333334
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 1.91199999999998
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.48790723568791


Total time for all 15 tests_________________________ (sec): 25.7516666666667
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.64469699141649
--- End of test ---
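
To put the two reports side by side, the speedup for a test can be taken as the standard-R time divided by the MKL time. The sketch below does that arithmetic with awk. The function name speedup is just for illustration, and the timings are copied from the two reports in this message.

```shell
# Speedup of the MKL build over the standard build for one test:
# reference time divided by MKL time, both in seconds.
speedup() {
    awk -v ref="$1" -v mkl="$2" 'BEGIN { printf "%.2f\n", ref / mkl }'
}

# Timings copied from the two benchmark reports.
speedup 33.187 3.37233333333333            # cross-product
speedup 8.05900000000001 1.417             # Cholesky decomposition
speedup 88.6306666666667 25.7516666666667  # total for all 15 tests
```

This gives roughly 9.8x for the cross-product, 5.7x for the Cholesky decomposition and 3.4x for the total run time, in line with the 2x to 10x range mentioned above for the matrix tests. The non-BLAS tests in section III are essentially unchanged.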



--
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org              http://crmda.ku.edu

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


