User Manual
Rev 2.1-1.0.6
Mellanox Technologies
153
8.5 Routing Algorithms
OpenSM offers six routing engines:
1. “Min Hop Algorithm”
Based on the minimum hops to each node where the path length is optimized.
2. “UPDN Algorithm”
Based on the minimum hops to each node, but it is constrained to ranking rules. This algorithm should
be chosen if the subnet is not a pure Fat Tree, and a deadlock may occur due to a loop in the subnet.
3. “Fat-tree Routing Algorithm”
This algorithm optimizes routing for a congestion-free “shift” communication pattern. It should be
chosen if a subnet is a symmetrical Fat Tree of various types, not just a K-ary-N-Tree: non-constant
K, not fully staffed, and for any CBB ratio. Similar to UPDN, Fat Tree routing is constrained to rank-
ing rules.
4. “LASH Routing Algorithm”
Uses InfiniBand virtual layers (SL) to provide deadlock-free shortest-path routing while also distrib-
uting the paths between layers. LASH is an alternative deadlock-free, topology-agnostic routing algo-
rithm to the non-minimal UPDN algorithm. It avoids the use of a potentially congested root node.
5. “DOR Routing Algorithm”
Based on the Min Hop algorithm, but avoids port equalization except for redundant links between the
same two switches. This provides deadlock free routes for hypercubes when the fabric is cabled as a
hypercube and for meshes when cabled as a mesh.
6. “Torus-2QoS Routing Algorithm”
Based on the DOR Unicast routing algorithm specialized for 2D/3D torus topologies. Torus-2QoS
provides deadlock-free routing while supporting two quality of service (QoS) levels. Additionally,
it can route around multiple failed fabric links or a single failed fabric switch without introducing
deadlocks, and without changing path SLvalues granted before the failure.
OpenSM provides an optional unicast routing cache (enabled by -A or --ucast_cache options). When
enabled, unicast routing cache prevents routing recalculation (which is a heavy task in a large cluster)
when there was no topology change detected during the heavy sweep, or when the topology change
does not require new routing calculation, e.g. when one or more CAs/RTRs/leaf switches going down,
or one or more of these nodes coming back after being down. A very common case that is handled by
the unicast routing cache is host reboot, which otherwise would cause two full routing recalculations:
one when the host goes down, and the other when the host comes back online.
OpenSM also supports a file method which can load routes from a table – see Modular Routing
Engine below.
The basic routing algorithm is comprised of two stages:
1. MinHop matrix calculation. How many hops are required to get from each port to each LID?
The algorithm to fill these tables is different if you run standard (min hop) or Up/Down. For
standard routing, a "relaxation" algorithm is used to propagate min hop from every destina
-
tion LID through neighbor switches. For Up/Down routing, a BFS from every target is used.
The BFS tracks link direction (up or down) and avoid steps that will perform up after a down
step was used.
2. Once MinHop matrices exist, each switch is visited and for each target LID a decision is
made as to what port should be used to get to that LID. This step is common to standard and