HP-MPI Version 2.2.5.1 for Linux Release Note

Known Problems and Workarounds

• The nofile (open files) limit on large Linux clusters needs to be increased in

/etc/security/limits.conf:

* soft nofile 1024

For larger clusters, we recommend:

— 2048 for clusters of 1900 cores or less

— 4096 for clusters of 3800 cores or less

— 8192 for clusters of 7600 cores or less

— and so on, doubling the limit each time the core count doubles
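
The tiers above can be expressed as a small helper. This is only a sketch: the first three tiers come from the list, while continued doubling beyond 7600 cores is an assumed reading of the final list item. The function names are hypothetical and are not part of HP-MPI.

```python
def recommended_nofile(cores: int) -> int:
    """Return a nofile limit for a cluster with the given core count.

    2048/4096/8192 for up to 1900/3800/7600 cores are taken from the
    release note; further doubling beyond 7600 cores is an assumption.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    limit, max_cores = 2048, 1900
    while cores > max_cores:
        limit *= 2
        max_cores *= 2
    return limit


def limits_conf_line(cores: int, domain: str = "*") -> str:
    """Format a soft nofile entry; '*' is the pam_limits wildcard domain."""
    return f"{domain} soft nofile {recommended_nofile(cores)}"


if __name__ == "__main__":
    print(limits_conf_line(3000))  # prints "* soft nofile 4096"
```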

• To use appfiles on HP XC Elan clusters, MPI_USESRUN must be set to 1, and the

lines of the appfile may differ only in host name and rank count.
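
One way to check the appfile restriction before launching is sketched below. It assumes that the host name is given with -h and the rank count with -np on each appfile line. The helper names are hypothetical, and the check is deliberately naive: a program argument that happens to be -h or -np would also be skipped.

```python
import shlex


def normalize(line: str) -> tuple:
    """Drop '-h <host>' and '-np <count>' so only the rest is compared."""
    tokens = shlex.split(line, comments=True)
    rest = []
    i = 0
    while i < len(tokens):
        if tokens[i] in ("-h", "-np") and i + 1 < len(tokens):
            i += 2  # skip the option and its value
            continue
        rest.append(tokens[i])
        i += 1
    return tuple(rest)


def appfile_is_uniform(text: str) -> bool:
    """True if all non-empty lines differ only in host name and rank count."""
    shapes = {shape for shape in map(normalize, text.splitlines()) if shape}
    return len(shapes) <= 1
```

For example, two lines that run the same program with the same arguments on node1 and node2 pass the check, while lines that name different programs do not.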

• On Quadrics interconnected clusters, the repeated use of MPI_Bcast within a tight loop

may cause an application to fail with the following Elan trap message:

ELAN TRAP - 0 - CPROC - Bad Trap

Status=lbb40005 CommandProcSendTransExpected Command=200000201

Setting the environment variable LIBELAN_GROUP_SANDF=0 will disable the latest “Store

and Forward” broadcast optimization from Quadrics while preserving all the other

optimized collectives.
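
As an illustration, a launcher script could set the variable for the job without changing the rest of the environment. This is a minimal sketch: the helper name is hypothetical, and whether the variable reaches every rank depends on the launcher in use, which is not covered here.

```python
import os


def elan_workaround_env(base=None):
    """Return a copy of the environment with LIBELAN_GROUP_SANDF set to 0."""
    env = dict(os.environ if base is None else base)
    env["LIBELAN_GROUP_SANDF"] = "0"
    return env


# Hypothetical usage; the command line is a placeholder:
#   import subprocess
#   subprocess.run(["mpirun", "-np", "4", "./a.out"], env=elan_workaround_env())
```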

• The SilverStorm uDAPL driver has an accumulating issue. If the system has been

running for more than 24 hours, and a large enough number of applications have been

run, new applications could have problems establishing new uDAPL connections. The

error occurs depending on the usage of the system. If this error occurs, reboot the system.

• There is a memory registration problem with the VAPI 3.2 driver. This error occurs if the

application always uses malloc() to get a new buffer for each transfer. Even though very

little memory is actually pinned by the application, VAPI reports that it cannot pin memory.

• Some older versions of Myrinet MX have a known resource limitation involving

outstanding MPI_Issends. If more than 128 MPI_Issends are issued and not yet matched,

further MX communication can hang. The only known workaround is to have your

application issue fewer than 128 unmatched MPI_Issends at a time. This limitation is

known to be fixed in MX versions 1.1.8 and later.
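
The workaround can be sketched as a small window that caps the number of outstanding synchronous sends. This is an illustration rather than part of HP-MPI: the class is hypothetical and only assumes that each request object has a Wait() method, as mpi4py requests do. Waiting on the oldest request blocks until that send has been matched, so the receiver must be posting receives for the sender to make progress.

```python
from collections import deque

MAX_OUTSTANDING = 127  # stay below the 128 limit described above


class IssendWindow:
    """Keep at most `limit` synchronous sends outstanding at a time."""

    def __init__(self, limit=MAX_OUTSTANDING):
        self.limit = limit
        self.pending = deque()

    def post(self, start_send):
        """Start a send once there is room in the window.

        start_send is a zero-argument callable that starts the send
        and returns a request object with a Wait() method.
        """
        while len(self.pending) >= self.limit:
            self.pending.popleft().Wait()  # oldest send must complete first
        request = start_send()
        self.pending.append(request)
        return request

    def drain(self):
        """Wait for every send still outstanding."""
        while self.pending:
            self.pending.popleft().Wait()
```

With mpi4py, for instance, a caller might pass something like `lambda: comm.Issend(buf, dest=1, tag=i)` to post() and call drain() before the buffers are reused.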

• When a foreground HP-MPI job is run from a shell window and the shell is terminated, the

shell sends the SIGHUP signal to the mpirun process and its underlying ssh processes,

thus killing the entire job.