User Manual
OpenSM – Subnet ManagerRev 2.2-1.0.1
Mellanox Technologies
176
8.5 Routing Algorithms
OpenSM offers six routing engines:
1. “Min Hop Algorithm”
Based on the minimum hops to each node where the path length is optimized.
2. “UPDN Algorithm”
Based on the minimum hops to each node, but it is constrained to ranking rules. This algorithm
should be chosen if the subnet is not a pure Fat Tree, and a deadlock may occur due to a loop in
the subnet.
3. “Fat-tree Routing Algorithm”
This algorithm optimizes routing for a congestion-free “shift” communication pattern. It should
be chosen if a subnet is a symmetrical Fat Tree of various types, not just a K-ary-N-Tree: non-
constant K, not fully staffed, and for any CBB ratio. Similar to UPDN, Fat Tree routing is con
-
strained to ranking rules.
4. “LASH Routing Algorithm”
Uses InfiniBand virtual layers (SL) to provide deadlock-free shortest-path routing while also
distributing the paths between layers. LASH is an alternative deadlock-free, topology-agnostic
routing algorithm to the non-minimal UPDN algorithm. It avoids the use of a potentially con
-
gested root node.
5. “DOR Routing Algorithm”
Based on the Min Hop algorithm, but avoids port equalization except for redundant links
between the same two switches. This provides deadlock free routes for hypercubes when the
fabric is cabled as a hypercube and for meshes when cabled as a mesh.
6. “Torus-2QoS Routing Algorithm”
Based on the DOR Unicast routing algorithm specialized for 2D/3D torus topologies. Torus-
2QoS provides deadlock-free routing while supporting two quality of service (QoS) levels.
Additionally,
it can route around multiple failed fabric links or a single failed fabric switch without introduc-
ing deadlocks, and without changing path SLvalues granted before the failure.
OpenSM provides an optional unicast routing cache (enabled by -A or --ucast_cache options).
When enabled, unicast routing cache prevents routing recalculation (which is a heavy task in a
large cluster) when there was no topology change detected during the heavy sweep, or when the
topology change does not require new routing calculation, e.g. when one or more CAs/RTRs/
leaf switches going down, or one or more of these nodes coming back after being down. A very
common case that is handled by the unicast routing cache is host reboot, which otherwise
would cause two full routing recalculations: one when the host goes down, and the other when
the host comes back online.
OpenSM also supports a file method which can load routes from a table – see Modular Routing
Engine below.
The basic routing algorithm is comprised of two stages:
1. MinHop matrix calculation. How many hops are required to get from each port to each LID?
The algorithm to fill these tables is different if you run standard (min hop) or Up/Down. For
standard routing, a "relaxation" algorithm is used to propagate min hop from every destina
-
tion LID through neighbor switches. For Up/Down routing, a BFS from every target is used.