HP-MPI Version 2.2.7 for Linux Release Note
• uDAPL may experience issues on multi-card systems. uDAPL functions on the second
card if that card is on a separate subnet, but not if it is on the same subnet as the
first card. For multi-card systems, set:
/sbin/sysctl -w net.ipv4.conf.all.arp_ignore=2
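To make this setting persist across reboots, an equivalent entry can also be added to the
standard /etc/sysctl.conf file, for example:
net.ipv4.conf.all.arp_ignore = 2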
• The nofile limit on large Linux clusters needs to be increased in
/etc/security/limits.conf:
* soft nofile 1024
For larger clusters, HP recommends a setting of at least the following (see the example
entry after this list):
— 2048 for clusters of 1900 cores or fewer
— 4096 for clusters of 3800 cores or fewer
— 8192 for clusters of 7600 cores or fewer
— And so on, doubling the limit as the core count doubles
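For example, to follow the recommendation for a cluster of up to 3800 cores, the
limits.conf entry would be raised to:
* soft nofile 4096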
• To use appfiles on HP XC Quadrics clusters, set MPI_USESRUN=1. Entries in the appfile
can differ only in host name and rank count, as in the example below.
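A hypothetical appfile meeting this restriction might look like the following; the host
names, rank counts, and executable name are illustrative:
-h node1 -np 8 ./a.out
-h node2 -np 8 ./a.out
-h node3 -np 4 ./a.out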
• On Quadrics interconnected clusters, the repeated use of MPI_Bcast within a tight loop
can cause an application to fail with the following Elan trap message:
ELAN TRAP - 0 0 CPROC - Bad Trap
Status=lbb40005 CommandProcSendTransExpected Command=200000201
Setting the environment variable LIBELAN_GROUP_SANDF=0 disables the latest “Store and
Forward” broadcast optimization from Quadrics while preserving all the other optimized
collectives.
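For example, the variable can be set in the shell before launching the job:
% export LIBELAN_GROUP_SANDF=0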
• The SilverStorm™ uDAPL driver has an issue that accumulates over time. If the system has
been running for more than 24 hours and a sufficiently large number of applications have
been run, new applications might have problems establishing uDAPL connections. Whether
the error occurs depends on how heavily the system has been used. If this error occurs,
reboot the system.
• Some older versions of Myrinet MX have a known resource limitation involving outstanding
MPI_Issend() calls. If more than 128 MPI_Issend() calls have been issued and not yet
matched, further MX communication can hang. The only known workaround is to have the
application keep fewer than 128 unmatched MPI_Issend() calls outstanding at a time. This
limitation is fixed in MX versions 1.1.8 and later.
• When a foreground HP-MPI job is run from a shell window and the shell is terminated,
the shell sends SIGHUP to the mpirun process and its underlying ssh processes, killing
the entire job.
When a background HP-MPI job is run and the shell is terminated, whether the job
continues depends on the shell used: under /bin/bash the job is killed, while under
/bin/sh and /bin/ksh the job continues. If nohup is used when launching the job, only
background ksh jobs continue. This behavior might vary depending on your system.
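For example, a job can be launched in the background under nohup from ksh as follows;
the rank count and executable name are illustrative:
% nohup mpirun -np 16 ./a.out &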
• Interval timer functionality used by HP-MPI on HP XC can conflict with the requirements
of the gprof data collection phase. Set the following two environment variables to work
around this issue:
% export MPI_FLAGS=s0
% export GMON_OUT_PREFIX=/tmp/app_name
In the above example, setting MPI_FLAGS disables HP-MPI's conflicting use of interval
timers. Refer to the mpienv(1) manpage for descriptions of MPI_FLAGS settings. Because this
setting also disables message progression monitoring, use it with well-behaved programs
only.
The second setting causes gprof data collection files to be named /tmp/app_name.PID
(where PID is the process ID number). The prefix can be chosen arbitrarily; it keeps the
files distinct in cases where the same PID occurs on different nodes.
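After the run completes, each per-process profile can be examined with gprof; the
executable name and PID in this example are illustrative:
% gprof ./app_name /tmp/app_name.12345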