HP-MPI V2.3 for Linux Release Note

The initial release of OFED 1.2 contains a bug that causes the memory pinning
function to fail after certain patterns of malloc and free. The symptom, as seen
from HP-MPI, can be any of several error messages, such as:
> prog.x: Rank 0:1: MPI_Get: Unable to pin memory for put/get
This bug is fixed in OFED 1.3, but if you are running the initial release of
OFED 1.2, the only workaround is to set MPI_IBV_NO_FORK_SAFE=1.
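For example, in a Bourne-style shell (the program name and rank count here are
placeholders):
export MPI_IBV_NO_FORK_SAFE=1
mpirun -np 4 ./prog.x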
When upgrading to OFED 1.2 from an older version, the installation script might
not stop the previous OFED stack before uninstalling it. Therefore, stop the old
OFED stack manually before upgrading to OFED 1.2. For example:
/etc/init.d/openibd stop
The nofile limit on large Linux clusters must be increased in
/etc/security/limits.conf, where the entry has the form:
* soft nofile 1024
For larger clusters, HP recommends a setting of at least:
2048 for clusters of 1900 cores or fewer
4096 for clusters of 3800 cores or fewer
8192 for clusters of 7600 cores or fewer
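For example, for a cluster of up to 3800 cores, the entry could be raised as
follows (raising the hard limit alongside the soft limit is an assumption; some
sites manage it separately):
* soft nofile 4096
* hard nofile 4096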
To use appfiles on HP XC Quadrics clusters, set MPI_USESRUN=1. Entries in the
appfile can differ only in host name and rank count.
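For example, an appfile of the accepted form might look like this (host names,
rank counts, and the program name are placeholders):
-h node01 -np 8 ./prog.x
-h node02 -np 8 ./prog.x
Such an appfile would then be launched with mpirun -f appfile.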
On Quadrics-interconnected clusters, repeated use of MPI_Bcast within a tight
loop can cause an application to fail with the following Elan trap message:
ELAN TRAP - 0 0 CPROC - Bad Trap
Status=lbb40005 CommandProcSendTransExpected Command=200000201
Setting the environment variable LIBELAN_GROUP_SANDF=0 disables the Quadrics
“Store and Forward” broadcast optimization while preserving all the other
optimized collectives.
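For example, using mpirun's -e option to set the variable for all ranks (the
rank count and program name are placeholders):
mpirun -e LIBELAN_GROUP_SANDF=0 -np 4 ./prog.x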
Some older versions of Myrinet MX have a known resource limitation involving
outstanding MPI_Issend() calls. If more than 128 MPI_Issend() calls are
issued and not yet matched, further MX communication can hang. The only known
workaround is to have your application keep fewer than 128 unmatched
MPI_Issend() calls outstanding at a time, as in the sketch below. This
limitation is fixed in MX versions 1.1.8 and later.
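The following is a minimal C sketch of one way to stay under the limit, assuming
a simple two-rank sender/receiver pattern; the cap of 64, the message count, and
the buffer contents are illustrative:

#include <mpi.h>

#define MAX_OUTSTANDING 64   /* stay well below the 128-request limit */
#define NMSGS 1000

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Request reqs[MAX_OUTSTANDING];
        int buf[MAX_OUTSTANDING];
        int active = 0;
        for (int i = 0; i < NMSGS; i++) {
            if (active == MAX_OUTSTANDING) {
                /* Drain every outstanding synchronous send before
                 * issuing more, keeping the unmatched count bounded. */
                MPI_Waitall(active, reqs, MPI_STATUSES_IGNORE);
                active = 0;
            }
            buf[active] = i;
            MPI_Issend(&buf[active], 1, MPI_INT, 1, 0,
                       MPI_COMM_WORLD, &reqs[active]);
            active++;
        }
        MPI_Waitall(active, reqs, MPI_STATUSES_IGNORE);
    } else if (rank == 1) {
        int v;
        for (int i = 0; i < NMSGS; i++)
            MPI_Recv(&v, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}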
When a foreground HP-MPI job is run from a shell window and the shell is
terminated, the shell sends SIGHUP to the mpirun process and its underlying
ssh processes, killing the entire job.
When a background HP-MPI job is run and the shell is terminated, whether the job
continues depends on the shell used. With /bin/bash, the job is killed; with
/bin/sh and /bin/ksh, the job continues. If nohup is used when launching the
job, only background ksh jobs can continue. This behavior might vary depending
on your system.
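For example, a background job intended to survive the shell exiting could be
launched as follows (the program name and rank count are placeholders):
nohup mpirun -np 4 ./prog.x &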