User guide

3 InfiniBand® Cluster Setup and Administration
3-40 IB0054606-02 A
Other services may be required by your batch queuing system or user community.
If your system is running the irqbalance daemon, QLogic recommends turning it
off. Disabling irqbalance gives more consistent performance with programs
that use interrupts. Use this command:
# /sbin/chkconfig irqbalance off
See “Erratic Performance” on page D-10 for more information.
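Note that chkconfig changes only whether a service starts at boot; a daemon
that is already running keeps running until it is stopped or the node is
rebooted. The following is a minimal sketch, assuming a SysV-init distribution
that provides pgrep, service, and chkconfig. The helper name irqbalance_running
is hypothetical and is not part of the QLogic software.

```shell
# Sketch only: assumes a SysV-init distribution with pgrep, service, chkconfig.

# Return success if a process named exactly "irqbalance" is running.
irqbalance_running() {
    pgrep -x irqbalance >/dev/null 2>&1
}

if irqbalance_running; then
    echo "irqbalance is running"
    # To stop it now and keep it off after a reboot, run as root:
    #   /sbin/service irqbalance stop
    #   /sbin/chkconfig irqbalance off
else
    echo "irqbalance is not running"
fi
```

The check has to be repeated on every node, because the daemon runs
independently on each one.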
Host Environment Setup for MPI
After the QLogic OFED+ Host software and the GNU (GCC) compilers have been
installed on all the nodes, the host environment can be set up for running MPI
programs.
Configuring for ssh
Running MPI programs with the mpirun command on an IB cluster depends, by
default, on secure shell (ssh) to launch node programs on the nodes.
To use ssh, you must have generated Rivest, Shamir, Adleman (RSA) or Digital
Signature Algorithm (DSA) keys, public and private. The public keys must be
distributed and stored on all the compute nodes so that connections to the remote
machines can be established without supplying a password.
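As an illustration of the key requirement above, the following sketch generates
an RSA key pair with the OpenSSH ssh-keygen tool and appends the public key to
an authorized_keys file. It is a sketch under stated assumptions, not the
procedure this guide prescribes: the function name setup_ssh_key is
hypothetical, the key is created without a passphrase, and the demonstration
writes to a scratch directory rather than to ~/.ssh.

```shell
# Sketch only: create an RSA key pair and authorize it for password-less login.
# The target directory is a parameter so the steps can be tried safely; on a
# cluster whose nodes share home directories (an assumption) it would be ~/.ssh.
setup_ssh_key() {
    key_dir=$1
    mkdir -p "$key_dir" || return 1
    chmod 700 "$key_dir" || return 1
    # -N "" means no passphrase (an assumption made for this sketch).
    if [ ! -f "$key_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$key_dir/id_rsa" || return 1
    fi
    # Add the public key to the list of keys accepted for login.
    cat "$key_dir/id_rsa.pub" >> "$key_dir/authorized_keys" || return 1
    chmod 600 "$key_dir/authorized_keys"
}

# Demonstration in a scratch directory, so no existing keys are touched.
scratch=$(mktemp -d)
if command -v ssh-keygen >/dev/null 2>&1; then
    setup_ssh_key "$scratch" && echo "key pair created in $scratch"
else
    echo "ssh-keygen not found; install OpenSSH first"
fi
```

If the nodes do not share home directories, the public key also has to be
copied into the authorized_keys file on every compute node.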
You or your administrator must set up the ssh keys and associated files on the
cluster. There are two methods for setting up ssh on your cluster. The first
method, the shosts.equiv mechanism, is typically set up by the cluster
administrator. The second method, using ssh-agent, is more easily
accomplished by an individual user.
Configuring ssh and sshd Using shosts.equiv
This section describes how the cluster administrator can set up ssh and sshd
through the shosts.equiv mechanism. This method is recommended, provided
that your cluster is behind a firewall and accessible only to trusted users.
NOTE
rsh can be used instead of ssh. To use rsh, set the environment
variable MPI_SHELL=rsh. See “Environment Variables” on page 4-18 for
information on setting environment variables. Also see “Shell Options”
on page A-6 for information on setting shell options in mpirun.
rsh has a limit on the number of concurrent connections it can have,
typically 255, which may limit its use on larger clusters.
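For example, the variable could be set in the shell that launches the job. This
is a minimal sketch for a Bourne-style shell; the mpirun line is a hypothetical
invocation, shown only as a comment, and its program name and process count are
placeholders that do not come from this guide.

```shell
# Select rsh instead of ssh for launching node programs.
export MPI_SHELL=rsh
echo "MPI_SHELL is set to: $MPI_SHELL"

# Hypothetical launch (placeholder program name and process count):
#   mpirun -np 4 ./my_mpi_program
```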