[Rd] configure can't find dgemm in MKL10

Christopher Paciorek paciorek at hsph.harvard.edu
Tue Apr 22 02:23:21 CEST 2008


Just to follow up on my previous email with some results that may be helpful as a guide for others: this was on a Linux box with an Intel processor, running R 2.6.2.

Following Prof. Ripley's suggestion, we went the shared BLAS route and got it working with Goto BLAS.
The combination of the Intel compilers in place of GNU and Goto BLAS in place of R's internal BLAS gave an order-of-magnitude speedup in basic linear algebra routines.
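For others attempting this, here is roughly what the shared BLAS swap looks like, following the 'Shared BLAS' section of R-admin; the Goto BLAS path below is illustrative and will differ on your system:

```shell
# Build R with its BLAS as a shared library (see 'Shared BLAS' in R-admin).
./configure CC=icc F77=ifort --enable-BLAS-shlib
make && make install

# R now links against lib/libRblas.so at run time, so an optimized BLAS
# can be dropped in without rebuilding R.  Paths here are illustrative.
cd "$(R RHOME)/lib"
mv libRblas.so libRblas.so.reference
ln -s /usr1/util/GotoBLAS/libgoto.so libRblas.so
```

Restoring the reference BLAS is just a matter of moving libRblas.so.reference back, which is what makes comparative timings like those below easy to produce.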

Here's an example of some comparative timings:

Intel compilers, Goto BLAS:
> mat=matrix(rnorm(2000*2000),2000,2000)
> system.time(mat%*%mat)
   user  system elapsed 
  2.203   0.061   2.270 
> cov=t(mat)%*%mat
> system.time(chol(cov))
   user  system elapsed 
  0.442   0.046   0.496 

Gnu compilers, internal R BLAS:
> mat=matrix(rnorm(2000*2000),2000,2000)
> system.time(mat%*%mat)
   user  system elapsed 
 46.695   0.058  46.793 
>  cov=t(mat)%*%mat
> system.time(chol(cov))
   user  system elapsed 
  4.871   0.026   4.902 

I didn't go back and check, but I believe these timings are roughly equivalent to those from the R 2.6.2 Mac OS X binary downloaded directly from CRAN, run on my dual-core Intel MacBook.
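For anyone wanting to reproduce the comparison on their own machine, a self-contained version of the benchmark above (n = 2000 matches the timings shown; reduce n for a quick sanity check):

```r
## Self-contained version of the benchmark above.
n <- 2000
mat <- matrix(rnorm(n * n), n, n)
print(system.time(mat %*% mat))   # dense matrix multiply (dgemm)
cov <- crossprod(mat)             # t(mat) %*% mat in a single BLAS call
print(system.time(chol(cov)))     # Cholesky factorization (dpotrf)
```

Note that crossprod(mat) computes t(mat) %*% mat directly and is faster than forming the transpose explicitly.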

-chris 

 
 
>>> Prof Brian Ripley <ripley at stats.ox.ac.uk> 04/18/08 9:45 AM >>> 
Did you see

   See 'Shared BLAS' for an alternative (and in many ways preferable)
   way to use MKL.

?  That's an easier route to get working, and you can swap BLASes almost 
instantly.

But you need to look at the config.log to see what went wrong.

'Xeon' covers a multitude of processors, but my group's experience is that 
for recent Intel CPUs the Goto BLAS beats all others (including MKL and 
ATLAS).  As you are in academia it is available to you, and it too is easy 
to swap in.


On Fri, 18 Apr 2008, Christopher Paciorek wrote:

> Hi,
> I'm trying to follow the R-admin instructions for using MKL10 as the external BLAS while compiling R-2.6.2 under Linux on a RH EL head node of a cluster.  The configure process seems to have problems when it checks for dgemm in the BLAS.  I'm invoking configure as:
> ./configure CC=icc F77=ifort --with-lapack="$MKL" --with-blas="$MKL", where $MKL is defined as in R-admin section A.3.1.4.
>
> checking for cblas_cdotu_sub in vecLib framework... no
> checking for dgemm_ in -L/usr1/util/Intel/mkl/10.0.1.014/lib/em64t -Wl,--start-group /usr1/util/Intel/mkl/10.0.1.014/lib/em64t/libmkl_gf_lp64.a /usr1/util/Intel/mkl/10.0.1.014/lib/em64t/libmkl_gnu_thread.a /usr1/util/Intel/mkl/10.0.1.014/lib/em64t/libmkl_core.a -Wl,--end-group -liomp5 -lguide -lpthread -lgomp... no
> checking for dgemm_... no
> checking for ATL_xerbla in -latlas... yes
> checking for dgemm_ in -lf77blas... no
> checking for dgemm_ in -lblas... yes
> checking for dgemm_ in -ldgemm... no
> checking for dgemm_ in -lblas... (cached) yes
> checking for dgemm_ in -lessl... no
> checking for dgemm_ in -lblas... (cached) yes
>
> I've looked in the MKL .a files and do not actually see dgemm or dgemm_ explicitly.  This would seem to explain the configure result: BLAS_LIBS is not set to point to MKL but defaults to the usual Rblas (based on BLAS_LIBS in Makeconf, BLAS_LIBS = -L$(R_HOME)/lib$(R_ARCH) -lRblas, and the absence of any mention of BLAS in the 'External libraries' line at the end of the configure output).
>
> In looking for dgemm in the MKL .a files (ar t libName.a | grep dgemm):
> libmkl_gf_lp64.a lists cblas_dgemm_lp64.o and _dgemm_lp64.o,
> libmkl_core.a lists many objects with dgemm in the name but not dgemm itself, e.g., _dgemm_kernel_0_fb.o and _mc_dgemm_bufs_0.o,
> libmkl_gnu_thread.a lists dgemm_omp.o.
>
> Incidentally _dgemm.o is listed in libmkl_gf_ilp64.a.
>
> We're running Red Hat Enterprise Linux AS release 4 (Nahant Update 5) on 
> an Intel Xeon head node of a cluster.
>
> Incidentally, this has come about because in playing with my new $1300 
> Macbook, I found it was doing basic matrix work (dense matrix 
> multiplication, Cholesky) about 5x as fast as our Linux cluster.  I 
> haven't looked into it much, but given that CPU use is listed as nearing 
> 200% on the dual core Mac, part of this may be due to the Mac taking 
> advantage of both cores. My hope is that with a faster BLAS the 
> difference between the Mac and our cluster for basic linear algebra will 
> lessen or disappear.
>
> Any tips on what may be going wrong in the configure test process or how 
> to get around this would be helpful.
>
> Thanks,
> Chris
>
> ----------------------------------------------------------------------------------------------
> Chris Paciorek / Asst. Professor        Email: paciorek at hsph.harvard.edu
> Department of Biostatistics             Voice: 617-432-4912
> Harvard School of Public Health         Fax:   617-432-5619
> 655 Huntington Av., Bldg. 2-407         WWW: www.biostat.harvard.edu/~paciorek
> Boston, MA 02115 USA                    Permanent forward: paciorek at alumni.cmu.edu
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--  
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


