[Rd] Rmpi on CentOS (64bit)

Patrick Connolly p_connolly at slingshot.co.nz
Fri Mar 5 09:28:08 CET 2010


On Wed, 03-Mar-2010 at 01:57PM -0600, Dirk Eddelbuettel wrote:

[...]

|> You could try to suppress the probe for IB which we did (in the older 1.2.*
|> series of OpenMPI) via
|> 
|>    # Disable the use of InfiniBand
|>    #   btl = ^openib
|>    btl = ^openib
|> 
|> in  /etc/openmpi/openmpi-mca-params.conf

CentOS has it in a completely different place.  I tried that
suggestion, but to no avail.


I looked into "... the availability of the interfaces in the dat.conf
file" which the error message mentioned. 


$ locate dat.conf
/etc/ofed/dat.conf
/etc/ofed/compat-dapl/dat.conf
/usr/share/man/man5/dat.conf.5.gz

(No such files appear in Fedora 11 and there seems to be no
ill-effects).

Are we to assume that it's the second of those that is related to the message?


$ cat /etc/ofed/compat-dapl/dat.conf
OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""
OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""
OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""
OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""
OpenIB-ipath0-1 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ipath0 1" ""
OpenIB-ipath0-2 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ipath0 2" ""
OpenIB-ehca0-2 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ehca0 1" ""
OpenIB-iwarp u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""


$ cat /etc/ofed/dat.conf
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1"  ""
ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
$


Does that give anyone any clues as to what could be going on the
message (which went like this)?

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
libibverbs: Fatal: couldn't read uverbs ABI version.
CMA: unable to open /dev/infiniband/rdma_cm
--------------------------------------------------------------------------
WARNING: Failed to open "OpenIB-cma" [DAT_INTERNAL_ERROR:].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------



Could there be a modification to Dirk's suggestion that might deal
with it?  (I'm making a last-ditch attempt to avoid using Fedora.)

It's hard to find anything much about CentOS and MPI -- or at least
what people did to get it working.  I found tales of people having
difficulties with Fedora that I didn't have.  I'm not much wiser than
I began.


best

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_  	         Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)  	                      ..... Eleanor Roosevelt
	  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.



More information about the R-devel mailing list