User guide
3–InfiniBand® Cluster Setup and Administration
Performance Settings and Management Tips
3-30 IB0054606-02 A
High Risk Tuning for Intel Harpertown CPUs
The following tuning for the Harpertown generation of Intel Xeon CPUs carries a
higher risk but offers a bandwidth benefit:
For nodes with Intel Harpertown (Xeon 54xx) CPUs, you can add
pcie_caps=0x51 and pcie_coalesce=1 to the modprobe.conf file. For
example:
options ib_qib pcie_caps=0x51 pcie_coalesce=1
If syslog reports the following problem, you can perform the typical diagnostic
described in the following paragraphs:
[PCIe Poisoned TLP][Send DMA memory read]
Another potential issue is that after starting openibd, messages such as the
following appear on the console:
Message from syslogd@st2019 at Nov 14 16:55:02 ...
kernel:Uhhuh. NMI received for unknown reason 3d on CPU 0
After this happens, you may also see the following message in the syslog:
Mth dd hh:mm:ss st2019 kernel: ib_qib 0000:0a:00.0: infinipath0: Fatal Hardware Error, no longer usable, SN AIB1013A43727
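As a rough aid for spotting these symptoms, the following sketch scans log text for the three messages quoted above. It is a minimal illustration, not a supported tool: the function name find_symptoms and the symptom labels are hypothetical, and the patterns assume the log wording matches the examples in this section.

```python
import re

# Signatures copied from the messages quoted in this section.
# The dictionary keys are hypothetical labels chosen for this sketch.
SYMPTOMS = {
    "poisoned_tlp": re.compile(r"\[PCIe Poisoned TLP\]\[Send DMA memory read\]"),
    "unknown_nmi": re.compile(r"NMI received for unknown reason"),
    "fatal_hw_error": re.compile(r"Fatal Hardware Error, no longer usable"),
}


def find_symptoms(log_text):
    """Return the sorted labels of every symptom that appears in log_text."""
    return sorted(
        label for label, pattern in SYMPTOMS.items() if pattern.search(log_text)
    )
```

To use it, read the syslog file into a string (its location varies by distribution) and pass the string to find_symptoms; an empty list means none of the three messages were found.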
These problems typically occur on the first run of an MPI program over the PSM
transport, or immediately after the link becomes active. Once this happens, the
adapter is unusable until the system is rebooted. To resolve this issue, try the
following solutions in order:
1. Remove pcie_coalesce=1.
2. Restart openibd and try the MPI program again.
3. Remove both the pcie_caps=0x51 and pcie_coalesce=1 options from the
ib_qib line in the modprobe.conf file and reboot the system.
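The option removal in these steps can be sketched as a small text transformation. This is an illustration under stated assumptions, not a vendor utility: the helper remove_module_options is hypothetical, it works on the file contents as a string, and it leaves restarting openibd or rebooting to the administrator.

```python
def remove_module_options(conf_text, module, names):
    """Drop the named options from every 'options <module>' line.

    conf_text is the content of modprobe.conf, module is the kernel
    module name (for example "ib_qib"), and names is a set of option
    names without their values (for example {"pcie_coalesce"}).
    """
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == module:
            # Keep only the options whose name is not being removed.
            kept = [f for f in fields[2:] if f.split("=", 1)[0] not in names]
            if not kept:
                # Design choice for this sketch: drop a line left with no options.
                continue
            line = " ".join(["options", module] + kept)
        out.append(line)
    return "".join(entry + "\n" for entry in out)


conf = "options ib_qib pcie_caps=0x51 pcie_coalesce=1\n"
# First attempt: remove only pcie_coalesce.
first_attempt = remove_module_options(conf, "ib_qib", {"pcie_coalesce"})
# Last resort: remove both options.
last_resort = remove_module_options(conf, "ib_qib", {"pcie_caps", "pcie_coalesce"})
```

Lines that are not options lines for the named module pass through unchanged, so other settings in the file are not rewritten.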
NOTE
Removing both options technically avoids the problem but can cause an
unnecessary performance decrease. If the system has already failed with
the above diagnostic, it must be rebooted. In the modprobe.conf file, all
options for a particular kernel module must be on the same line, not
spread across repeated options ib_qib lines.
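To illustrate the single-line requirement, the sketch below collapses repeated options lines for one module into the first such line. The helper merge_module_options is hypothetical and makes a simplifying assumption: if the same option appears with different values, both tokens are kept so that a person can resolve the conflict.

```python
def merge_module_options(conf_text, module):
    """Collapse repeated 'options <module>' lines into the first one."""
    out = []
    merged = []        # option tokens, in order of first appearance
    first_idx = None   # index in out of the first 'options <module>' line
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == module:
            for token in fields[2:]:
                if token not in merged:
                    merged.append(token)
            if first_idx is None:
                first_idx = len(out)
                out.append(line)  # placeholder, rewritten after the loop
            continue  # later 'options <module>' lines are dropped
        out.append(line)
    if first_idx is not None:
        out[first_idx] = " ".join(["options", module] + merged)
    return "".join(entry + "\n" for entry in out)


split_conf = "options ib_qib pcie_caps=0x51\noptions ib_qib pcie_coalesce=1\n"
single_line = merge_module_options(split_conf, "ib_qib")
```

Here single_line holds one options ib_qib line carrying both settings, which matches the example shown earlier in this section.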