HP-MPI Version 2.2.5.1 for Linux Release Note
Known Problems and Workarounds
• The nofile limit on large Linux clusters needs to be increased in
/etc/security/limits.conf
* soft nofile 1024
For larger clusters, we recommend
— 2048 for clusters of 1900 cores or less
— 4096 for clusters of 3800 cores or less
— 8192 for clusters of 7600 cores or less
—etc.
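The recommendations above double the limit each time the core count doubles. A minimal Python sketch of that pattern follows. The function names are hypothetical, and two points are assumptions rather than documented rules: continuing the doubling past the 7600-core row (read from the "etc." entry), and treating clusters smaller than 1900 cores the same as the first row.

```python
import resource


def recommended_nofile(cores):
    """Return the smallest listed nofile value whose core bound covers `cores`.

    The first three steps (1900 -> 2048, 3800 -> 4096, 7600 -> 8192) come from
    the list above; continuing to double past 7600 cores is an assumption.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    max_cores, nofile = 1900, 2048
    while cores > max_cores:
        max_cores *= 2
        nofile *= 2
    return nofile


def soft_limit_is_sufficient(cores):
    """Compare this process's soft nofile limit with the recommendation."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unlimited soft limit always satisfies the recommendation.
    return soft == resource.RLIM_INFINITY or soft >= recommended_nofile(cores)
```

For a 3000-core cluster, `recommended_nofile(3000)` falls into the 3800-core row and returns 4096. The check only inspects the limit of the process that runs it; the value in /etc/security/limits.conf still has to be changed by an administrator on each node.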
• To use appfiles on HP XC Elan clusters, MPI_USESRUN must be set to 1 and the
lines of the appfile may differ only in host name and rank count.
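The appfile restriction can be checked before a job is launched. The following Python sketch is one possible check, not part of HP-MPI. It assumes that each appfile line names its host with -h and its rank count with -np, and that lines beginning with # are comments; the helper names are hypothetical.

```python
import shlex

# Options whose values are allowed to differ between appfile lines.
# Assumption: hosts are given with -h and rank counts with -np.
VARIABLE_OPTIONS = {"-h", "-np"}


def invariant_part(line):
    """Return the tokens of one appfile line with -h/-np and their values removed."""
    tokens = shlex.split(line)
    kept = []
    i = 0
    while i < len(tokens):
        if tokens[i] in VARIABLE_OPTIONS:
            i += 2  # skip the option and its value
        else:
            kept.append(tokens[i])
            i += 1
    return kept


def mismatched_lines(appfile_text):
    """Return 1-based numbers of lines that differ from the first entry
    in anything other than host name and rank count."""
    reference = None
    bad = []
    for number, line in enumerate(appfile_text.splitlines(), start=1):
        stripped = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no entry.
        if not stripped or stripped.startswith("#"):
            continue
        part = invariant_part(stripped)
        if reference is None:
            reference = part
        elif part != reference:
            bad.append(number)
    return bad
```

An empty result means every entry matches the first one apart from host and rank count. The sketch strips -h and -np wherever they appear, so a program that takes its own -h or -np argument would need a stricter parser.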
• On Quadrics interconnected clusters, the repeated use of MPI_Bcast within a tight loop
may cause an application to fail with the following Elan trap message:
ELAN TRAP - 0 - CPROC - Bad Trap
Status=1bb40005 CommandProcSendTransExpected Command=200000201
Setting the environment variable LIBELAN_GROUP_SANDF=0 will disable the latest “Store
and Forward” broadcast optimization from Quadrics while preserving all the other
optimized collectives.
• The SilverStorm uDAPL driver has a problem that accumulates over time. If the system
has been running for more than 24 hours and a large enough number of applications
have been run, new applications may fail to establish uDAPL connections. Whether the
error occurs depends on how the system has been used. If it occurs, reboot the system.
• There is a memory registration problem with the VAPI 3.2 driver. This error occurs if the
application always uses malloc() to allocate a new buffer for each transfer. Even though
very little memory is actually pinned by the application, VAPI reports that it cannot pin
memory.
• Some older versions of Myrinet MX have a known resource limitation involving
outstanding MPI_Issends. If more than 128 MPI_Issends are issued and not yet matched,
further MX communication can hang. The only known workaround is to have your
application issue fewer than 128 unmatched MPI_Issends at a time. This limitation is
known to be fixed in versions 1.1.8 and later.
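One way to apply the MPI_Issend workaround is to cap the number of synchronous sends in flight. The Python sketch below is a possible approach under stated assumptions, not an HP-MPI or MX facility. The class `IssendWindow` is hypothetical and relies only on request objects that have a `Wait()` method, as mpi4py's `Request` objects do.

```python
from collections import deque

# Stay strictly below the 128 outstanding MPI_Issends described above.
MAX_UNMATCHED = 127


class IssendWindow:
    """Keep at most `limit` synchronous sends outstanding at any time."""

    def __init__(self, limit=MAX_UNMATCHED):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.pending = deque()

    def post(self, start_send):
        """Call `start_send()` to begin one send, first waiting for the
        oldest pending send whenever the window is full."""
        while len(self.pending) >= self.limit:
            # A synchronous send completes only once it has been matched,
            # so each completed Wait() frees one slot.
            self.pending.popleft().Wait()
        request = start_send()
        self.pending.append(request)
        return request

    def drain(self):
        """Wait for every send that is still pending."""
        while self.pending:
            self.pending.popleft().Wait()
```

With mpi4py, a call might look like `window.post(lambda: comm.Issend(buf, dest=1, tag=7))`, where `comm`, `buf`, the destination, and the tag are placeholders. Because `Wait()` blocks until the oldest send is matched, the receiving side must eventually post the matching receives, and `drain()` should be called before the send buffers are reused or freed.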
• If a foreground HP-MPI job is run from a shell window and the shell is terminated, the
shell sends the SIGHUP signal to the mpirun process and its underlying ssh processes,
killing the entire job.