User Manual

Performance
Rev 2.1-1.0.6
Mellanox Technologies
134
Example for supported system:
# cat /sys/class/net/eth3/device/numa_node
0
Example for unsupported system:
# cat /sys/class/net/ib0/device/numa_node
-1
7.2.6.1.1 Improving Application Performance on Remote NUMA Node
Verbs API applications that mostly use polling see a performance impact when they run on a
remote NUMA node.
libmlx4 has a built-in enhancement that recognizes an application pinned to a remote
NUMA node and activates a flow that improves the out-of-the-box latency and throughput.
However, NUMA node recognition must be enabled as described in section “Tuning for
Intel® Sandy Bridge Platform” on page 133.
On systems that do not support SLIT, set the following environment variable:
MLX4_LOCAL_CPUS=0x[bit mask of local NUMA node]
Example for a local NUMA node whose cores are 0-7:
MLX4_LOCAL_CPUS=0xff
This feature can be tuned further by changing the following environment
variable:
MLX4_STALL_NUM_LOOP=[integer] (default: 400)
The default value is optimized for most applications. However, some applications
might benefit from increasing or decreasing this value.
7.2.6.2 Tuning for AMD® Architecture
On AMD architecture, there is a difference between a 2-socket system and a 4-socket system.
With a 2-socket system, the PCIe adapter is connected to socket 0 (nodes 0,1).
With a 4-socket system, the PCIe adapter is connected either to socket 0 (nodes 0,1)
or to socket 3 (nodes 6,7).
7.2.6.3 Recognizing NUMA Node Cores
To recognize NUMA node cores, run either of the following commands:
# cat /sys/devices/system/node/node[X]/cpulist
# cat /sys/devices/system/node/node[X]/cpumap
Example:
# cat /sys/devices/system/node/node1/cpulist
1,3,5,7,9,11,13,15
# cat /sys/devices/system/node/node1/cpumap
0000aaaa
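The cpulist output of a NUMA node and the bit mask expected by MLX4_LOCAL_CPUS describe the same set of cores, so one can be derived from the other. The following is a minimal sketch of that conversion, not a tool shipped with libmlx4: the function name cpulist_to_mask is hypothetical, and the sketch assumes bash and core numbers below 63 so that the mask fits in shell integer arithmetic.

```shell
#!/usr/bin/env bash
# cpulist_to_mask: hypothetical helper, not part of libmlx4.
# Converts a kernel-style CPU list (for example "0-7" or "1,3,5") into a
# hexadecimal bit mask in which bit N is set when core N is in the list.
# Assumption: all core numbers are below 63 (bash 64-bit signed arithmetic).
cpulist_to_mask() {
    local list=$1 mask=0 range first last cpu
    local IFS=','                     # split the list on commas
    for range in $list; do
        first=${range%-*}             # "0-7" -> "0", "5" -> "5"
        last=${range#*-}              # "0-7" -> "7", "5" -> "5"
        for ((cpu = first; cpu <= last; cpu++)); do
            mask=$((mask | (1 << cpu)))
        done
    done
    printf '0x%x\n' "$mask"
}

# Possible usage (node0 is only an example; use the adapter's local node):
#   export MLX4_LOCAL_CPUS=$(cpulist_to_mask "$(cat /sys/devices/system/node/node0/cpulist)")
```

With this sketch, cpulist_to_mask 0-7 prints 0xff, and cpulist_to_mask 1,3,5,7,9,11,13,15 prints 0xaaaa, which agrees with the cpumap value shown in the example for node1.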