HP-MPI V2.3 for Linux Release Note

The initial release of OFED 1.2 contains a bug that causes the memory pinning
function to fail after certain patterns of malloc and free. The symptom, as seen
from HP-MPI, can be any of several error messages, such as:
> prog.x: Rank 0:1: MPI_Get: Unable to pin memory for put/get
This bug is fixed in OFED 1.3, but if you are running the initial release of
OFED 1.2, the only workaround is to set MPI_IBV_NO_FORK_SAFE=1.
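For example, in a Bourne-style shell (the program name and rank count here are
placeholders):
export MPI_IBV_NO_FORK_SAFE=1
mpirun -np 4 ./prog.x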
When upgrading to OFED 1.2 from an older version, the installation script might
not stop the previous OFED stack before uninstalling it. Therefore, stop the old
OFED stack manually before upgrading to OFED 1.2. For example:
/etc/init.d/openibd stop
The nofile limit on large Linux clusters must be increased in
/etc/security/limits.conf, where the entry has the form:
* soft nofile 1024
For larger clusters, HP recommends a setting of at least:
2048 for clusters of 1900 cores or fewer
4096 for clusters of 3800 cores or fewer
8192 for clusters of 7600 cores or fewer
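For example, for a cluster of up to 3800 cores, the entry could be raised as
follows (raising the hard limit alongside the soft limit is an assumption; some
sites manage it separately):
* soft nofile 4096
* hard nofile 4096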
To use appfiles on HP XC Quadrics clusters, set MPI_USESRUN=1. Entries in the
appfile can differ only in host name and rank count.
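For example, an appfile of the accepted form might look like this (host names,
rank counts, and the program name are placeholders):
-h node01 -np 8 ./prog.x
-h node02 -np 8 ./prog.x
Such an appfile would then be launched with mpirun -f appfile.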
On Quadrics-interconnected clusters, repeated use of MPI_Bcast within a tight
loop can cause an application to fail with the following Elan trap message:
ELAN TRAP - 0 0 CPROC - Bad Trap
Status=lbb40005 CommandProcSendTransExpected Command=200000201
Setting the environment variable LIBELAN_GROUP_SANDF=0 disables the Quadrics
“Store and Forward” broadcast optimization while preserving all the other
optimized collectives.
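For example, using mpirun's -e option to set the variable for all ranks (the
rank count and program name are placeholders):
mpirun -e LIBELAN_GROUP_SANDF=0 -np 4 ./prog.x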
Some older versions of Myrinet MX have a known resource limitation involving
outstanding MPI_Issend() calls. If more than 128 MPI_Issend() calls are
issued and not yet matched, further MX communication can hang. The only known
workaround is to have your application keep fewer than 128 unmatched
MPI_Issend() calls outstanding at a time, as in the sketch below. This
limitation is fixed in MX versions 1.1.8 and later.
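The following is a minimal C sketch of one way to stay under the limit, assuming
a simple two-rank sender/receiver pattern; the cap of 64, the message count, and
the buffer contents are illustrative:

#include <mpi.h>

#define MAX_OUTSTANDING 64   /* stay well below the 128-request limit */
#define NMSGS 1000

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Request reqs[MAX_OUTSTANDING];
        int buf[MAX_OUTSTANDING];
        int active = 0;
        for (int i = 0; i < NMSGS; i++) {
            if (active == MAX_OUTSTANDING) {
                /* Drain every outstanding synchronous send before
                 * issuing more, keeping the unmatched count bounded. */
                MPI_Waitall(active, reqs, MPI_STATUSES_IGNORE);
                active = 0;
            }
            buf[active] = i;
            MPI_Issend(&buf[active], 1, MPI_INT, 1, 0,
                       MPI_COMM_WORLD, &reqs[active]);
            active++;
        }
        MPI_Waitall(active, reqs, MPI_STATUSES_IGNORE);
    } else if (rank == 1) {
        int v;
        for (int i = 0; i < NMSGS; i++)
            MPI_Recv(&v, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}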
When a foreground HP-MPI job is run from a shell window and the shell is
terminated, the shell sends SIGHUP to the mpirun process and its underlying
ssh processes, killing the entire job.
When a background HP-MPI job is run and the shell is terminated, whether the job
continues depends on the shell used. With /bin/bash, the job is killed; with
/bin/sh and /bin/ksh, the job continues. If nohup is used when launching the
job, only background ksh jobs can continue. This behavior might vary depending
on your system.
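For example, a background job intended to survive the shell exiting could be
launched as follows (the program name and rank count are placeholders):
nohup mpirun -np 4 ./prog.x &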