[Rd] accelerating matrix multiply

Cohn, Robert S robert.s.cohn at intel.com
Sat Jan 7 17:41:42 CET 2017

I am using R to multiply some large (30k x 30k double) matrices on a 64 core machine (xeon phi).  I added some timers to src/main/array.c to see where the time is going. All of the time is being spent in the matprod function, most of that time is spent in dgemm. 15 seconds is in matprod in some code that is checking if there are NaNs.

> system.time (C <- B %*% A)
nancheck: wall time 15.240282s
  dgemm: wall time 43.111064s
matprod: wall time 58.351572s
    user   system  elapsed 
2710.154   20.999   58.398

The NaN checking code is not being vectorized because of the early exit when NaN is detected:

	/* Don't trust the BLAS to handle NA/NaNs correctly: PR#4582
	 * The test is only O(n) here.
	for (R_xlen_t i = 0; i < NRX*ncx; i++)
	    if (ISNAN(x[i])) {have_na = TRUE; break;}
	if (!have_na)
	    for (R_xlen_t i = 0; i < NRY*ncy; i++)
		if (ISNAN(y[i])) {have_na = TRUE; break;}

I tried deleting the 'break'. By inspecting the asm code, I verified that the loop was not being vectorized before, but now is vectorized. Total time goes down:

system.time (C <- B %*% A)
nancheck: wall time 1.898667s
  dgemm: wall time 43.913621s
matprod: wall time 45.812468s
   user   system  elapsed 
2727.877   20.723   45.859

The break accelerates the case when there is a NaN, at the expense of the much more common case when there isn't a NaN. If a NaN is detected, it doesn't call dgemm and calls its own matrix multiply, which makes the NaN check time insignificant so I doubt the early exit provides any benefit.

I was a little surprised that the O(n) NaN check is costly compared to the O(n**2) dgemm that follows. I think the reason is that nan check is single thread and not vectorized, and my machine can do 2048 floating point ops/cycle when you consider the cores/dual issue/8 way SIMD/muladd, and the constant factor will be significant for even large matrices.

Would you consider deleting the breaks? I can submit a patch if that will help. Thanks.


More information about the R-devel mailing list