HP XC System Software Administration Guide Version 4.0

http://www.openfabrics.org/
To determine if your HP XC system is configured properly, perform the following steps on any
node on which you suspect a problem:
1. Use the lspci command to ensure that your system has InfiniBand boards installed, and
that the operating system detects them:
[root@n1 ~]# lspci -v
44:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
        Subsystem: Hewlett-Packard Company: Unknown device 170a
        Flags: bus master, fast devsel, latency 0, IRQ 233
        Memory at fdf00000 (64-bit, non-prefetchable) [size=1M]
        Memory at fd000000 (64-bit, prefetchable) [size=8M]
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Vital Product Data
        Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
        Capabilities: [84] MSI-X: Enable- Mask- TabSize=32
        Capabilities: [60] Express Endpoint IRQ 0
The information reported may differ, depending on the type of InfiniBand board used, but
the output should include an entry for a board that contains the string "InfiniBand".
If you do not see output for an InfiniBand board, verify the following:
• An InfiniBand board is installed.
• The board is seated properly.
• The board functions properly. You might have to swap it with an identical working
  board to verify this.
2. Run the ibv_devinfo command to verify that the InfiniBand system interconnect is
connected and that it is operating correctly:
[root@n1 ~]# ibv_devinfo
hca_id: mthca0
        fw_ver:                 1.2.0
        node_guid:              0017:08ff:ffd1:33b4
        sys_image_guid:         0017:08ff:ffd1:33b7
        vendor_id:              0x02c9
        vendor_part_id:         25204
        hw_ver:                 0xA0
        board_id:               HP_0010000001
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         1
                        port_lid:       3
                        port_lmc:       0x00
IMPORTANT: The fw_ver parameter indicates the firmware version. The InfiniBand board
firmware should be the latest version available with your software release, and must be at
least as recent as the minimum firmware versions listed in the HP XC master firmware list:
http://docs.hp.com/en/linuxhpc.html
When examining the ibv_devinfo command output, you should see a PORT_ACTIVE state for
at least one port of the InfiniBand board. A PORT_DOWN or PORT_INITIALIZE state means
that the InfiniBand board is not communicating properly with the InfiniBand switch.
Possible causes are a missing cable, a poor cable connection to the switch, an InfiniBand
switch that is not functioning correctly, or an InfiniBand Subnet Manager that is not
running properly. Troubleshoot the cables or switch as necessary, using the information
in the InfiniBand vendor's documentation and in the HP InfiniBand hardware
documentation.
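
The two checks in this procedure can be sketched as small shell helpers for use on
several nodes. This is a minimal sketch, not part of the HP XC software: the function
names has_ib_board, ib_port_states, and has_active_port are hypothetical, and the
parsing assumes lspci and ibv_devinfo output shaped like the examples shown above. Each
helper reads command output on standard input, so it works on saved output as well as
on a live node.

```shell
# Hypothetical helpers (not part of HP XC); each one reads command output on stdin.

# Succeeds if the lspci output on stdin lists at least one InfiniBand device.
has_ib_board() {
    grep -qi 'infiniband'
}

# Prints "<hca_id> <port> <state>" for every port found in the ibv_devinfo
# output on stdin. Assumes "key: value" lines; spacing and indentation may vary.
ib_port_states() {
    awk '
        { gsub(/^[ \t]+/, "") }                              # drop leading indentation
        /^hca_id:/ { sub(/^hca_id:[ \t]*/, ""); hca = $1 }   # remember current adapter
        /^port:/   { sub(/^port:[ \t]*/, "");   port = $1 }  # remember current port
        /^state:/  { sub(/^state:[ \t]*/, "");  print hca, port, $1 }
    '
}

# Succeeds if at least one port reported on stdin is in the PORT_ACTIVE state.
has_active_port() {
    ib_port_states | grep -q ' PORT_ACTIVE$'
}
```

On a node, the helpers might be used as follows, after the functions have been defined
in the current shell:

[root@n1 ~]# lspci | has_ib_board || echo "no InfiniBand board detected"
[root@n1 ~]# ibv_devinfo | has_active_port || echo "no InfiniBand port is active"

A failure of the first check points to the board itself. A failure of the second points
to the cable, the switch, or the Subnet Manager, as described above.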