User guide

3 InfiniBand® Cluster Setup and Administration
Performance Settings and Management Tips
3-28 IB0054606-02 A
In the rare case that a node has more than 64 cores and MPI must run on more
than 64 of them, two HCAs are required. Apply the rules in Table 3-2 as though
half the cores were assigned to each HCA.
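As a rough illustration of the "half the cores per HCA" rule, the following
shell sketch computes the per-HCA core count that would then be looked up in
Table 3-2. The function name cores_per_hca is hypothetical and not part of any
QLogic tool, and an even total core count is assumed.

```shell
#!/bin/sh
# Hypothetical helper: split a node's cores evenly across two HCAs.
# Usage: cores_per_hca TOTAL_CORES
cores_per_hca() {
    total=$1
    # Integer division; assumes an even core count.
    echo $(( total / 2 ))
}

# Example: a 96-core node is treated as two 48-core halves,
# one per HCA, when reading Table 3-2.
cores_per_hca 96
```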
AMD CPU Systems
To improve IPoIB and other verbs-based throughput on AMD CPU systems, QLogic
recommends setting pcie_caps=0x51 and numa_aware=1 as module parameters in the
modprobe configuration file. For example, for AMD Opteron CPUs, the ib_qib
options line in that file should include the following:
options ib_qib pcie_caps=0x51 numa_aware=1
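One way to confirm that the options line above is in place is sketched below.
The function name has_amd_qib_options and the file path
/etc/modprobe.d/ib_qib.conf are assumptions for illustration; substitute the
modprobe configuration file your distribution actually uses.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if CONF_FILE has an "options ib_qib"
# line that sets both pcie_caps=0x51 and numa_aware=1.
# Usage: has_amd_qib_options CONF_FILE
has_amd_qib_options() {
    conf=$1
    grep -E '^[[:space:]]*options[[:space:]]+ib_qib([[:space:]]|$)' "$conf" 2>/dev/null \
        | grep -wF 'pcie_caps=0x51' \
        | grep -qwF 'numa_aware=1'
}

# The path below is an assumption, not a documented location.
if has_amd_qib_options /etc/modprobe.d/ib_qib.conf; then
    echo "ib_qib AMD options present"
else
    echo "ib_qib AMD options missing"
fi
```

Module parameters are read when the driver loads, so a change to the file
takes effect only after the ib_qib module is reloaded or the node is rebooted.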
On AMD systems, the pcie_caps=0x51 setting results in the following line in
the "DevCtl" section of the lspci -vv output for the QLogic HCA:
MaxPayload 128 bytes, MaxReadReq 4096 bytes.
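The check can be scripted. The sketch below reads lspci -vv text on standard
input and tests for the DevCtl values quoted above. The function name
devctl_matches_pcie_caps is hypothetical, and the sample string stands in for
real output from your HCA.

```shell
#!/bin/sh
# Hypothetical helper: read "lspci -vv" text on stdin and succeed only
# if it contains the DevCtl payload line expected with pcie_caps=0x51.
devctl_matches_pcie_caps() {
    grep -Eq 'MaxPayload[[:space:]]+128[[:space:]]+bytes,[[:space:]]+MaxReadReq[[:space:]]+4096[[:space:]]+bytes'
}

# Example with a captured line; on a live system, pipe in the
# lspci -vv output for the HCA instead (typically run as root).
sample='        MaxPayload 128 bytes, MaxReadReq 4096 bytes'
if printf '%s\n' "$sample" | devctl_matches_pcie_caps; then
    echo "DevCtl matches pcie_caps=0x51"
else
    echo "DevCtl does not match"
fi
```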
AMD Interlagos CPU Systems
On AMD Interlagos (Opteron 6200 Series) CPU systems with a single HCA,
performance is better when the HCA is installed in a PCIe slot closest to
Socket number 1. You can typically identify these slots from the schematics in
your motherboard manual. (A current BIOS or kernel problem means that no NUMA
topology information is available from the kernel.)
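On Linux, the kernel exposes a numa_node file for each PCI device under sysfs,
and it contains -1 when no NUMA information is known. The sketch below
interprets that value and falls back to the manual lookup described above.
The function name describe_numa_node is hypothetical, and the PCI address is a
placeholder rather than the address of an actual HCA.

```shell
#!/bin/sh
# Hypothetical helper: interpret the contents of a PCI device's
# sysfs numa_node file. The kernel reports -1 when it has no NUMA
# information for the device.
describe_numa_node() {
    node=$1
    if [ "$node" -ge 0 ] 2>/dev/null; then
        echo "HCA attached to NUMA node $node"
    else
        echo "NUMA node unknown; consult the motherboard manual"
    fi
}

# The PCI address below is a placeholder, not a real HCA address.
numa_file=/sys/bus/pci/devices/0000:00:00.0/numa_node
if [ -r "$numa_file" ]; then
    describe_numa_node "$(cat "$numa_file")"
fi
```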
To obtain top “Turbo boosts” of up to 1 GHz in clock rate when running on half
the cores of a node, AMD recommends enabling the C6 C-state in the BIOS. Some
applications (but certainly not all) run better when running on half the cores
of an Interlagos node (on every other core, one per Bulldozer module). QLogic
recommends enabling this C-state in the BIOS.
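To experiment with running on every other core, a CPU list can be generated as
sketched below. This assumes that consecutive pairs of logical CPU IDs (0 and
1, 2 and 3, and so on) share a Bulldozer module, which should be verified on
your system. The function name every_other_core is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a comma-separated list of every other
# logical CPU ID (0,2,4,...) for a node with TOTAL cores.
# Usage: every_other_core TOTAL
every_other_core() {
    total=$1
    list=""
    i=0
    while [ "$i" -lt "$total" ]; do
        # Append a comma only when the list is already non-empty.
        list="${list:+$list,}$i"
        i=$(( i + 2 ))
    done
    echo "$list"
}

# Example: a 16-core node yields 8 CPU IDs, one per assumed module.
every_other_core 16
```

The resulting list can be passed to a pinning tool such as taskset -c, or
translated into the processor-affinity options of your MPI launcher.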
Intel CPU Systems
Typical tuning for recent Intel CPUs
For recent Intel CPUs (code-named Sandy Bridge, Westmere or Nehalem), set
the following BIOS parameters:
Disable all C-States.
Disable Intel Hyper-Threading technology.
a. 1 is the default setting, so if the table recommends '1', krcvqs does not need to be set.