Administrator's Guide

As soon as you start testnodes.pl -gpu, a test is launched to check all nodes for the
presence of accelerator cards (GPUs). If any GPUs are detected and respond to
communication, the node is marked by appending /g<number of GPUs> to the node
name in the nodes window. In the example below, each node has three detected and
responsive GPUs.
4. Compare the number of GPUs indicated in the nodes monitoring window to the actual number
of GPUs for each node. Any discrepancies indicate a problem with GPUs on that node.
5. Deselect any nodes that do not have GPUs.
6. Select Verify and use the generated report for the following checklist.
   - Make sure all GPUs are listed for each node.
   - Verify the Model numbers.
   - Verify the Video BIOS.
   - Check the Link Speed, which is reported as 2.5, 5, or UNKNOWN. A report of 5 or
     UNKNOWN indicates the GPU is running at Gen2 speed and is acceptable. A value of 2.5
     might indicate the GPU is not properly configured. However, this test is timing
     sensitive, so retest any nodes reporting 2.5. If the test consistently reports 2.5,
     re-seat the GPU and repeat the test. If all the GPUs report 2.5, there might be a
     BIOS setting error.
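
The count comparison in step 4 and the Link Speed checks in step 6 can be scripted once the values have been copied out of the nodes window and the Verify report. The following Python sketch is illustrative only and is not part of Cluster Test: the function names, the input dictionaries, and the action labels ok, retest, and reseat are assumptions, and it does not parse the actual report format. It assumes node labels of the form <name>/g<N> and treats a GPU that reports a mix of 2.5 and acceptable values as needing another retest.

```python
import re

# Assumption: node labels look like "<name>/g<N>", following the /g suffix described above.
GPU_SUFFIX = re.compile(r"/g(\d+)$")
VALID_SPEEDS = {"2.5", "5", "UNKNOWN"}


def count_mismatches(labels, expected):
    """Return {node: (detected, expected)} for nodes whose GPU counts differ.

    `expected` is a hypothetical dict of node name -> GPUs physically installed.
    """
    mismatches = {}
    for label in labels:
        match = GPU_SUFFIX.search(label)
        name = GPU_SUFFIX.sub("", label)
        detected = int(match.group(1)) if match else 0  # no suffix: no GPUs detected
        wanted = expected.get(name, 0)
        if detected != wanted:
            mismatches[name] = (detected, wanted)
    return mismatches


def triage_gpu(readings):
    """Suggest an action for one GPU from its Link Speed values over one or more runs."""
    if not readings or set(readings) - VALID_SPEEDS:
        raise ValueError(f"unexpected Link Speed readings: {readings!r}")
    if "2.5" not in readings:
        return "ok"        # 5 or UNKNOWN: Gen2 speed, acceptable
    if len(readings) > 1 and set(readings) == {"2.5"}:
        return "reseat"    # consistently 2.5 across retests
    return "retest"        # single or intermittent 2.5: the test is timing sensitive


def bios_suspect(per_gpu):
    """True when every reading of every GPU is 2.5 (possible BIOS setting error)."""
    return bool(per_gpu) and all(set(r) == {"2.5"} for r in per_gpu.values())
```

For instance, count_mismatches(["n1/g3", "n2/g2"], {"n1": 3, "n2": 3}) flags only n2, and triage_gpu(["2.5", "2.5"]) returns "reseat".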