HP-MPI Version 2.2 for Linux Release Note

Known Problems and Workarounds
To use appfiles on XC ELAN clusters, MPI_USESRUN must be set to 1 and the
appfile must be homogeneous.
The SilverStorm uDAPL driver has a problem that accumulates over time. If the system
has been running for more than 24 hours, and a large enough number of applications have
been run, new applications could have problems establishing new uDAPL connections.
Whether the error occurs depends on how the system is used. If this error occurs, reboot
the system.
There is a memory registration problem with the VAPI 3.2 driver. This error occurs if the
application always uses malloc() to get a new buffer for each transfer. Even though very
little memory is actually pinned by the application, VAPI reports that it cannot pin memory.
Myrinet MX Version 1.0 has a known resource limitation involving outstanding issends. If
more than 128 issends are issued and not yet matched, further MX communication can
hang. The only known workaround is to have your application issue fewer than 128
unmatched issends at a time.
MPI_IC_ORDER must have the same definition on every node in a cluster to be effective.
In an XC cluster using srun, the environment variables are automatically propagated by
srun. In appfile mode, however, the user must explicitly propagate those environment
variables with -e.
% mpirun -e MPI_IC_ORDER="vapi:TCP" -f appfile -prot
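The explicit propagation shown above can be scripted when the value already lives in the
launching shell's environment. The following is a minimal sketch, not part of HP-MPI:
mpirun_env_args is a hypothetical helper name, and the sketch assumes that variable
names are valid shell identifiers and that values contain no whitespace. It only prints
the command line so that it can be inspected before use.

```shell
#!/bin/sh
# Sketch: emit one "-e NAME=VALUE" pair per named variable that is set and
# non-empty in the current environment, for an mpirun command in appfile mode.
# mpirun_env_args is a hypothetical helper, not an HP-MPI command.
mpirun_env_args() {
    args=""
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh compatible).
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            args="$args -e $name=$value"
        fi
    done
    # Strip the leading space before printing.
    printf '%s\n' "${args# }"
}

# Example: show the command that would be run, without running it.
MPI_IC_ORDER="vapi:TCP"
export MPI_IC_ORDER
echo "mpirun $(mpirun_env_args MPI_IC_ORDER) -f appfile -prot"
```

Variables that are unset or empty in the launching shell are skipped, so they are not
forced to an empty value on the remote nodes.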
Interval timer functionality used by HP-MPI on XC can conflict with gprof data collection
phase requirements. Set the following two environment variables to work around this
issue.
% export MPI_FLAGS=s0
% export GMON_OUT_PREFIX=/tmp/app_name
In the above example, setting MPI_FLAGS to s0 disables HP-MPI’s conflicting use of
interval timers. Refer to the mpienv(1) man page for descriptions of MPI_FLAGS settings.
Note that this setting also disables message progression monitoring, so use it only with
well-behaved programs.
In the above example, the second setting causes gprof data collection files to be named
/tmp/app_name.PID (where PID is the process ID number). The prefix is set arbitrarily
and makes the file unique in cases where the same PID is given on different nodes.
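Because each process writes its own /tmp/app_name.PID file, the files usually need to be
gathered before analysis. The following is a minimal sketch of one way to do that;
list_gmon_files is a hypothetical helper name, and the sketch assumes the files are
visible from the node where it runs.

```shell
#!/bin/sh
# Sketch: list the per-process gprof data files written under a given
# GMON_OUT_PREFIX. list_gmon_files is a hypothetical helper.
list_gmon_files() {
    prefix=$1
    for f in "$prefix".*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        # Keep only names whose suffix is purely numeric (a PID).
        case ${f#"$prefix".} in
            ''|*[!0-9]*) ;;
            *) printf '%s\n' "$f" ;;
        esac
    done
}

# Possible usage (./app_name is an assumed executable name):
#   gprof ./app_name $(list_gmon_files /tmp/app_name)
```

gprof accepts several profile data files after the executable name and sums them, so
the list can be passed to it directly.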
At the time of this release, the Mellanox InfiniBand driver has issues with buffers sharing
pages when fork() is used. Pinned (locked in memory) pages are normally marked
copy-on-write during a fork. If a page is pinned before a fork and subsequently written to