[Rd] Problem using F77_CALL(dgemm) in a package

Tue Feb 22 23:44:12 CET 2011

You're right about the code preceding .C.  I stripped down the .C and
.Call codes to be as similar as possible, and the timings were much
closer.

R code:
Call_matrix_multiply = function(A,B){
	C <- matrix(0,nrow(A),ncol(B))
	.Call("R_CALL_matrix_multiply", A, B, C)
	return(C)
}

C_matrix_multiply = function(A,B){
	C <- matrix(0,nrow(A),ncol(B))
	.C("R_matrix_multiply",A,B,nrow(A),ncol(A),ncol(B),C,DUP=FALSE,NAOK=TRUE)
	return(C)
}

C code:

void R_CALL_matrix_multiply(SEXP A, SEXP B, SEXP C) {
	double one = 1.0;
	double zero = 0.0;
	int *dimA = INTEGER(getAttrib(A, R_DimSymbol));
	int *dimB = INTEGER(getAttrib(B, R_DimSymbol));
	F77_CALL(dgemm)("N","N",dimA,dimB+1,dimA+1,&one,REAL(A),dimA,REAL(B),dimA+1,&zero,REAL(C),dimA);
}

void R_matrix_multiply(double * A, double * B, int * m, int *n, int *
					   p, double * C){

	double one = 1.0;
	double zero = 0.0;

	//Here is the actual multiplication
	F77_CALL(dgemm)("N","N",m,p,n,&one,A,m,B,n,&zero,C,m);
}

Timings:

> m = 2000
> n = 2000
> p = 2000
> A = matrix(rnorm(m*n),m,n)
> B = matrix(rnorm(n*p),n,p)
> library(CMATRIX)
> system.time(C_matrix_multiply(A,B))
   user  system elapsed
  2.782   0.035   1.611
> system.time(Call_matrix_multiply(A,B))
   user  system elapsed
  2.789   0.032   1.629
>
> m = 2000
> n = 2000
> p = 2000
> A = matrix(rnorm(m*n),m,n)
> B = matrix(rnorm(n*p),n,p)
> library(CMATRIX)
> system.time(C_matrix_multiply(A,B))
   user  system elapsed
  2.793   0.029   1.609
> system.time(Call_matrix_multiply(A,B))
   user  system elapsed
  2.787   0.029   1.586

Even so, it seems the .Call interface has a lot of advantages.  I
think I will use it for this project and see how I like it.

Jason

On Tue, Feb 22, 2011 at 7:27 AM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
> On Feb 22, 2011, at 1:55 AM, Jason Rudy wrote:
>
>> I just tried that myself and the .Call version is substantially
>> faster.  It seems like there is a lot more going on in the .Call C
>> code than the .C.  Why is .Call faster?  Does it have to do with the
>> way arguments are passed into the C function?  I tried with DUP=FALSE
>> and NAOK=TRUE, but .Call was still the winner.
>>
>
> .Call is the "native" interface so it passes all objects directly as references - essentially the same way that R uses internally (.C with DUP=FALSE and NAOK=TRUE comes close to that, though; the fastest is .External in this respect). Also you're allocating objects as you need them so there will be less copying afterwards. However, I suspect that major part of the difference is also in the code preceding the C code involved in this -- for example as.double() will implicitly create a copy since it needs to strip attributes whereas coerceVector is a no-op if the type matches.
>
> .C is there for historical compatibility - it is essentially equivalent to .Fortran which was the original (and only) interface way back when. .Call is in comparison a more recent addition, that's why you still find a lot of .C code. Personally I don't use .C at all because compared to .Call it is so cumbersome and error-prone (you can't even tell the length of the passed vectors in C!), but others have different preferences.
>
> Cheers,
> Simon
>
>
>> On Sun, Feb 20, 2011 at 4:27 PM, Simon Urbanek
>> <simon.urbanek at r-project.org> wrote:
>>> Jason,
>>>
>>> FWIW the direct interface (.Call) is more efficient and makes passing things from R simpler:
>>>
>>> C_matrix_multiply = function(A,B) .Call("R_matrix_multiply", A, B)
>>>
>>> The drawback is a bit more legwork on the C side, but it also gives you more flexibility:
>>>
>>> SEXP R_matrix_multiply(SEXP A, SEXP B) {
>>>        double one = 1.0;
>>>        double zero = 0.0;
>>>        int *dimA = INTEGER(getAttrib(A, R_DimSymbol));
>>>        int *dimB = INTEGER(getAttrib(B, R_DimSymbol));
>>>        SEXP sDimC = PROTECT(allocVector(INTSXP, 2));
>>>        int *dimC = INTEGER(sDimC);
>>>        SEXP C = PROTECT(allocVector(REALSXP, dimA[0] * dimB[1]));
>>>        if (dimA[1] != dimB[0]) error("incompatible matrices!");
>>>        dimC[0] = dimA[0];
>>>        dimC[1] = dimB[1];
>>>        setAttrib(C, R_DimSymbol, sDimC);
>>>        A = PROTECT(coerceVector(A, REALSXP));
>>>        B = PROTECT(coerceVector(B, REALSXP));
>>>        F77_CALL(dgemm)("N","N",dimA,dimB+1,dimA+1,&one,REAL(A),dimA,REAL(B),dimA+1,&zero,REAL(C),dimA);
>>>        UNPROTECT(4);
>>>        return C;
>>> }
>>>
>>> For comparison:
>>>> A=matrix(rnorm(1e5),500)
>>>> B=matrix(rnorm(1e5),,500)
>>>
>>> .Call:
>>>
>>>> system.time(for (i in 1:10) C_matrix_multiply(A,B))
>>>   user  system elapsed
>>>  0.656   0.008   0.686
>>>
>>> .C:
>>>
>>>> system.time(for (i in 1:10) CC_matrix_multiply(A,B))
>>>   user  system elapsed
>>>  0.886   0.044   0.943
>>>
>>>
>>> in fact .Call is even a tiny bit faster than %*%:
>>>
>>>> system.time(for (i in 1:10) A %*% B)
>>>   user  system elapsed
>>>  0.658   0.004   0.665
>>>
>>> (it's not just a measurement error - it's consistent for more replications etc. - but it's really negligible - possibly just due to dispatch of %*%)
>>>
>>> Cheers,
>>> Simon
>>>
>>>
>>> On Feb 20, 2011, at 5:23 PM, Jason Rudy wrote:
>>>
>>>> It was indeed a simple problem!  I took a look at that array.c as you
>>>> suggested and that cleared it right up.  So, the correct C code is:
>>>>
>>>> #include <R.h>
>>>> #include <R_ext/Utils.h>
>>>> #include <R_ext/Lapack.h>
>>>> #include <R_ext/BLAS.h>
>>>>
>>>> void R_matrix_multiply(double * A, double * B, int * m, int *n, int *
>>>> p, double * C){
>>>>
>>>>       double one = 1.0;
>>>>       double zero = 0.0;
>>>>
>>>>        //Just printing the input arguments
>>>>       Rprintf("m = %d, n = %d, p = %d\n",*m,*n,*p);
>>>>       int i;
>>>>       for(i=0;i<(*m**n);i++){
>>>>               Rprintf("%f ",A[i]);
>>>>       }
>>>>       Rprintf("\n");
>>>>       for(i=0;i<(*n**p);i++){
>>>>               Rprintf("%f ",B[i]);
>>>>       }
>>>>       Rprintf("\n");
>>>>       for(i=0;i<(*m**p);i++){
>>>>               Rprintf("%f ",C[i]);
>>>>       }
>>>>       Rprintf("\n");
>>>>
>>>>        //Here is the actual multiplication
>>>>       F77_CALL(dgemm)("N","N",m,p,n,&one,A,m,B,n,&zero,C,m);
>>>> }
>>>>
>>>> The only difference being that I had the 4th and 5th arguments (n and
>>>> p) mixed up.  There was also a problem in my R code after the
>>>> multiplication took place.  For the record, the correct R code is:
>>>>
>>>> C_matrix_multiply = function(A,B){
>>>>       C <- matrix(0,nrow(A),ncol(B))
>>>>       cout <- .C("R_matrix_multiply",as.double(A),as.double(B),nrow(A),ncol(A),ncol(B),as.double(C))
>>>>       return(matrix(cout[[6]],nrow(A),ncol(B)))
>>>> }
>>>>
>>>> Thanks for the help.  Now that I have a functioning example I am well
>>>> on my way to completing this project.
>>>>
>>>> -Jason
>>>>
>>>> On Sun, Feb 20, 2011 at 7:42 AM, Prof Brian Ripley
>>>> <ripley at stats.ox.ac.uk> wrote:
>>>>> Look a close look at matprod in src/main/array in the R sources.
>>>>> Hint: it is the other dimensions you have wrong.
>>>>>
>>>>> And as BLAS is Fortran, counts do start at 1.
>>>>>
>>>>> On Sat, 19 Feb 2011, Jason Rudy wrote:
>>>>>
>>>>>> Dear R-devel,
>>>>>>
>>>>>> I've written a numerical solver for SOCPs (second order cone programs)
>>>>>> in R, and now I want to move most of the solver code into C for speed.
>>>>>> I've written combined R/C packages before, but in this case I need to
>>>>>> do matrix operations in my C code.  As I have never done that before,
>>>>>> I'm trying to write some simple examples to make sure I understand the
>>>>>> basics.  I am stuck on the first one.  I'm trying to write a function
>>>>>> to multiply two matrices using the blas routine dgemm.  The name of my
>>>>>> example package is CMATRIX.  My code is as follows.
>>>>>>
>>>>>> I have a file matrix.c in my src directory:
>>>>>>
>>>>>> #include <R.h>
>>>>>> #include <R_ext/Utils.h>
>>>>>> #include <R_ext/Lapack.h>
>>>>>> #include <R_ext/BLAS.h>
>>>>>>
>>>>>> //Computes C = A*B
>>>>>> void R_matrix_multiply(double * A, double * B, int * m, int *n, int *
>>>>>> p, double * C){
>>>>>>        double one = 1.0;
>>>>>>        double zero = 0.0;
>>>>>>
>>>>>>       //Just printing the input arguments
>>>>>>        Rprintf("m = %d, n = %d, p = %d\n",*m,*n,*p);
>>>>>>        int i;
>>>>>>        for(i=0;i<(*m**n);i++){
>>>>>>                Rprintf("%f ",A[i]);
>>>>>>        }
>>>>>>        Rprintf("\n");
>>>>>>        for(i=0;i<(*n**p);i++){
>>>>>>                Rprintf("%f ",B[i]);
>>>>>>        }
>>>>>>        Rprintf("\n");
>>>>>>        for(i=0;i<(*m**p);i++){
>>>>>>                Rprintf("%f ",C[i]);
>>>>>>        }
>>>>>>        Rprintf("\n");
>>>>>>
>>>>>>
>>>>>>       //Here is the actual multiplication
>>>>>>        F77_CALL(dgemm)("N","N",m,n,p,&one,A,m,B,n,&zero,C,m);
>>>>>> }
>>>>>>
>>>>>> And the file C_matrix_multiply.R in my R directory:
>>>>>>
>>>>>> C_matrix_multiply = function(A,B){
>>>>>>        C <- matrix(0,nrow(A),ncol(B))
>>>>>>        cout <-
>>>>>> .C("R_matrix_multiply",as.double(A),as.double(B),nrow(A),ncol(A),ncol(B),as.double(C))
>>>>>>        return(matrix(cout$C,nrowA,ncol(B)))
>>>>>>
>>>>>> }
>>>>>>
>>>>>> My namespace file is:
>>>>>>
>>>>>> export("C_matrix_multiply")
>>>>>> useDynLib(CMATRIX.so,R_matrix_multiply)
>>>>>>
>>>>>> I'm not sure if it's necessary, but I've also included a Makevars.in
>>>>>> file in my src directory:
>>>>>>
>>>>>> PKG_CPPFLAGS=@PKG_CPPFLAGS@
>>>>>> PKG_CFLAGS=@PKG_CFLAGS@
>>>>>> PKG_LIBS=@PKG_LIBS@  ${LAPACK_LIBS} ${BLAS_LIBS} ${FLIBS}
>>>>>>
>>>>>> which I simply copied from the diversitree package, which seems to use
>>>>>> a lot of fortran.  I have the same problem (which I am getting to)
>>>>>> with or without this Makevars.in file.
>>>>>>
>>>>>> I install my package using:
>>>>>>
>>>>>> R CMD INSTALL CMATRIX
>>>>>>
>>>>>> Then I start up R and attempt to run the following code:
>>>>>>
>>>>>> #Make some random matrices
>>>>>> A = matrix(rnorm(8),4,2)
>>>>>> B = matrix(rnorm(6),2,3)
>>>>>>
>>>>>> #Load my package
>>>>>> library(CMATRIX)
>>>>>>
>>>>>> #Print the matrices
>>>>>> A
>>>>>> B
>>>>>>
>>>>>> #Try to multiply them
>>>>>> product = C_matrix_multiply(A,B)
>>>>>>
>>>>>> What I want, and what according to my understanding should happen, is
>>>>>> for product to contain the same matrix as would result from A %*% B.
>>>>>> Instead, I get the following:
>>>>>>
>>>>>>> A = matrix(rnorm(8),4,2)
>>>>>>> B = matrix(rnorm(6),2,3)
>>>>>>> library(CMATRIX)
>>>>>>> A
>>>>>>
>>>>>>          [,1]         [,2]
>>>>>> [1,] -0.4981664 -0.7243532
>>>>>> [2,]  0.1428766 -1.5501623
>>>>>> [3,] -2.0624701  1.5104507
>>>>>> [4,] -0.5871962  0.3049442
>>>>>>>
>>>>>>> B
>>>>>>
>>>>>>           [,1]            [,2]            [,3]
>>>>>> [1,]  0.02477964 0.5827084 1.8434375
>>>>>> [2,] -0.20200104 1.7294264 0.9071397
>>>>>>>
>>>>>>> C_matrix_multiply(A,B)
>>>>>>
>>>>>> m = 4, n = 2, p = 3
>>>>>> -0.498166 0.142877 -2.062470 -0.587196 -0.724353 -1.550162 1.510451
>>>>>> 0.304944
>>>>>> 0.024780 -0.202001 0.582708 1.729426 1.843437 0.907140
>>>>>> 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
>>>>>> 0.000000 0.000000 0.000000 0.000000 0.000000
>>>>>> Parameter 10 to routine DGEMM  was incorrect
>>>>>> Mac OS BLAS parameter error in DGEMM , parameter #0, (unavailable), is 0
>>>>>>
>>>>>> and R immediately dies.  I know the arguments are being passed into
>>>>>> the C code and everything up to my F77_CALL is functioning based on
>>>>>> the printed output.  The problem is definitely something to do with my
>>>>>> F77_CALL(dgemm) line.  My understanding is that parameter 10 should be
>>>>>> the leading dimension of the matrix B, which in this case should be
>>>>>> equal to 2, the number of rows in that matrix, which is what I am
>>>>>> doing.  I have also considered that parameter numbering starts at 0,
>>>>>> in which case the incorrect parameter is &zero, but again that seems
>>>>>> correct to me.  All of my reading and research suggests I am doing
>>>>>> everything correctly, so I am somewhat stumped.  Perhaps I am missing
>>>>>> something simple or obvious, as I have never done this before and am
>>>>>> proceeding with only google and the R docs as my guide.  I am
>>>>>> wondering if anybody can see what I'm doing wrong here, or perhaps
>>>>>> something I could do to try to fix it.  Any assistance would be
>>>>>> greatly appreciated.
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Jason Rudy
>>>>>> Graduate Student
>>>>>> Bioinformatics and Medical Informatics Program
>>>>>> San Diego State University
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>
>>>>>
>>>>> --
>>>>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>>>>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>>>> University of Oxford,             Tel:  +44 1865 272861 (self)
>>>>> 1 South Parks Road,                     +44 1865 272866 (PA)
>>>>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>>>>
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>>
>>>
>>>
>>
>>
>
>