HP-MPI Version 2.3.1 for Linux Release Note

4 Known Issues and Workarounds
The following items are known issues and workarounds.
4.1 Running on iWARP Hardware
When running on iWARP hardware, you might see messages similar to the following
when applications exit:
disconnect: ID 0x2b65962b2b10 ret 22
This is a debugging message that prints erroneously from the uDAPL library and
can be ignored. The message can be completely suppressed by passing the -e
DAPL_DBG_TYPE=0 option to mpirun. Alternatively, you can set
DAPL_DBG_TYPE=0 in the $MPI_ROOT/etc/hpmpi.conf file to avoid having
to pass the option on the mpirun command line.
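As a rough illustration of the command-line form, the sketch below prints an mpirun invocation that carries the -e DAPL_DBG_TYPE=0 option. The helper name quiet_mpirun_cmd, the rank count, the program name ./a.out, and the /opt/hpmpi fallback path are assumptions made for the example and do not come from this release note.

```shell
# Sketch only: prints (does not run) an mpirun command line that disables
# the uDAPL debug output. The rank count and program name are placeholders,
# and /opt/hpmpi is an assumed fallback when MPI_ROOT is unset.
quiet_mpirun_cmd() {
    ranks="$1"
    shift
    echo "${MPI_ROOT:-/opt/hpmpi}/bin/mpirun -e DAPL_DBG_TYPE=0 -np $ranks $*"
}

quiet_mpirun_cmd 4 ./a.out
```

Printing the command instead of running it makes it easy to check the option placement before launching a real job.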
Users might see the following error during launch of HP-MPI applications on
Chelsio iWARP hardware:
Rank 0:0: MPI_Init: dat_evd_wait()1 unexpected event number 16392
Rank 0:0: MPI_Init: MPI BUG: Processes cannot connect to rdma device
MPI Application rank 0 exited before MPI_Finalize() with status 1
To prevent these errors, Chelsio recommends passing the peer2peer=1 parameter
to the iw_cxgb3 kernel module. This is accomplished by running the following
commands as root on all nodes:
# echo "1" > /sys/module/iw_cxgb3/parameters/peer2peer
# echo "options iw_cxgb3 peer2peer=1" >> /etc/modprobe.conf
The second command is optional and makes the setting persist across a system
reboot.
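Because the second command appends to /etc/modprobe.conf, running it more than once leaves duplicate lines. The following is a minimal sketch of one way to avoid that, assuming a POSIX shell and a grep that supports -x and -F. The function name persist_peer2peer is hypothetical and is not part of HP-MPI or the Chelsio software.

```shell
# Sketch only: append the iw_cxgb3 option line to a modprobe configuration
# file unless an identical line is already there. Run as root on each node.
persist_peer2peer() {
    conf_file="${1:-/etc/modprobe.conf}"
    option_line="options iw_cxgb3 peer2peer=1"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if grep -qxF "$option_line" "$conf_file" 2>/dev/null; then
        echo "already set in $conf_file"
    else
        echo "$option_line" >> "$conf_file"
        echo "added to $conf_file"
    fi
}

# Example (as root): persist_peer2peer /etc/modprobe.conf
```

The helper covers only the persistent setting. The running module still needs the echo to /sys/module/iw_cxgb3/parameters/peer2peer shown above.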
Users of iWARP hardware might see errors similar to the following:
dapl async_event QP (0x2b27fdc10d30) ERR 1 dapl_evd_qp_async_error_callback() IB async QP err
- ctx=0x2b27fdc10d30
Previous versions of HP-MPI required passing -e MPI_UDAPL_MSG1=1 on some
iWARP hardware. As of HP-MPI V2.3, no iWARP implementations are known to
require this setting, and you must remove it from all scripts unless otherwise
instructed.
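One possible way to locate scripts that still pass the old setting is a recursive search, sketched below. The function name find_udapl_msg1 and the example directory are hypothetical. The sketch only lists matching files and leaves the actual edit to the user.

```shell
# Sketch only: list files under a directory that still mention
# MPI_UDAPL_MSG1 so the setting can be reviewed and removed by hand.
find_udapl_msg1() {
    search_dir="${1:-.}"
    # -r recurses into subdirectories, -l prints only matching file names
    grep -rl -- "MPI_UDAPL_MSG1" "$search_dir"
}

# Example: find_udapl_msg1 "$HOME/jobs"
```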
4.2 Running with Chelsio uDAPL
At the time of this release, Chelsio uDAPL implements one-sided operations only
for off-host transfers, not for data transfers between ranks on the same host. On a
Chelsio system, the symptom of this problem resembles the following:
dapl_cma_connect: rdma_connect ERR -1 Function not implemented