User Manual

Rev 2.1-1.0.6
Mellanox Technologies
Two things are notable about this master spanning tree. First, assuming the x dateline was
between x=5 and x=0, this spanning tree has a branch that crosses the dateline. However, just as
for unicast, crossing a dateline on a 1D ring (here, the ring for y=2) that is broken by a failure
cannot contribute to a torus credit loop. Second, this spanning tree is no longer optimal even for
multicast groups that encompass the entire fabric. That, unfortunately, is a compromise that must
be made to retain the other desirable properties of torus-2QoS routing.
In the event that a single switch fails, torus-2QoS will generate a master spanning tree that has no
"extra" turns by appropriately selecting a root switch. In the 2D 6x5 torus example, assume now
that the switch at (3,2), i.e. the root for a pristine fabric, fails. Torus-2QoS will generate the
following master spanning tree for that case:
Assuming the y dateline was between y=4 and y=0, this spanning tree has a branch that crosses a
dateline. However, again this cannot contribute to credit loops as it occurs on a 1D ring (the ring
for x=3) that is broken by a failure, as in the above example.
8.5.7.3 Torus Topology Discovery
The algorithm used by torus-2QoS to construct the torus topology from the undirected graph
representing the fabric requires that the radix of each dimension be configured via torus-2QoS.conf.
It also requires that the torus topology be "seeded"; for a 3D torus this requires configuring four
switches that define the three coordinate directions of the torus. Given this starting information,
the algorithm examines the cube formed by the eight switch locations bounded by the corners
(x,y,z) and (x+1,y+1,z+1). Based on switches already placed into the torus topology at some of
these locations, the algorithm examines 4-loops of interswitch links to find the one that is
consistent with a face of the cube of switch locations, and adds its switches to the discovered
topology in the correct locations.
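The face-completion idea can be illustrated with a minimal Python sketch. It is a 2D simplification written for this manual, not the torus-2QoS implementation: the helper names build_torus and complete_face are hypothetical, and switches are represented by integer IDs rather than GUIDs. Given a switch already placed at (x,y) and its placed neighbors at (x+1,y) and (x,y+1), the switch that closes the 4-loop of links is the one that belongs at (x+1,y+1).

```python
from itertools import product


def build_torus(rx, ry):
    """Build a pristine 2D torus as an undirected graph (illustrative helper).

    Returns (ids, adj): ids maps (x, y) to an integer switch ID, and adj maps
    each switch ID to the set of switch IDs it is linked to.
    """
    ids = {(x, y): x * ry + y for x, y in product(range(rx), range(ry))}
    adj = {i: set() for i in ids.values()}
    for (x, y), i in ids.items():
        # Link each switch to its +x and +y neighbors, wrapping at the edges.
        for nbr in (((x + 1) % rx, y), (x, (y + 1) % ry)):
            j = ids[nbr]
            adj[i].add(j)
            adj[j].add(i)
    return ids, adj


def complete_face(adj, corner, x_nbr, y_nbr):
    """Return the switches that close a 4-loop corner -> x_nbr -> ? -> y_nbr -> corner.

    The candidates are the common neighbors of x_nbr and y_nbr, excluding the
    corner switch itself.
    """
    return (adj[x_nbr] & adj[y_nbr]) - {corner}


# Usage on the 6x5 torus used in the examples above.
ids, adj = build_torus(6, 5)
candidates = complete_face(adj, ids[(0, 0)], ids[(1, 0)], ids[(0, 1)])
print(candidates == {ids[(1, 1)]})
```

In this pristine 6x5 fabric exactly one candidate remains, so the switch can be placed at (1,1) without ambiguity. A real fabric with missing links or switches may yield no candidate for a given face.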
Because the algorithm is based on examining the topology of 4-loops of links, a torus with one or
more radix-4 dimensions requires extra initial seed configuration. See torus-2QoS.conf(5) for
details. Torus-2QoS will detect and report when it has insufficient configuration for a torus with
radix-4 dimensions.
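One way to see why radix 4 is a special case: the four switches of a radix-4 ring themselves form a 4-loop of links, so not every 4-loop corresponds to a face of the cube of switch locations. The Python sketch below illustrates this by counting the 4-loops that pass through a single x-direction link in a 2D torus. It is an illustration only, not the torus-2QoS implementation; the helper names build_torus and count_4_loops_through_link are hypothetical.

```python
from itertools import product


def build_torus(rx, ry):
    """Build a pristine 2D torus as an undirected graph (illustrative helper)."""
    ids = {(x, y): x * ry + y for x, y in product(range(rx), range(ry))}
    adj = {i: set() for i in ids.values()}
    for (x, y), i in ids.items():
        # Link each switch to its +x and +y neighbors, wrapping at the edges.
        for nbr in (((x + 1) % rx, y), (x, (y + 1) % ry)):
            j = ids[nbr]
            adj[i].add(j)
            adj[j].add(i)
    return ids, adj


def count_4_loops_through_link(adj, u, v):
    """Count simple 4-loops u -> v -> w -> t -> u that use the link (u, v)."""
    count = 0
    for w in adj[v] - {u}:
        # t must be linked to both w and u, and must differ from v.
        count += len((adj[w] & adj[u]) - {v})
    return count


# Compare an x-direction link in a radix-6 ring with one in a radix-4 ring.
for rx in (6, 4):
    ids, adj = build_torus(rx, 5)
    n = count_4_loops_through_link(adj, ids[(0, 0)], ids[(1, 0)])
    print(f"x radix {rx}: {n} 4-loops through the link (0,0)-(1,0)")
```

With an x radix of 6, the link lies on two 4-loops, one for each adjacent face. With an x radix of 4, the x ring itself adds a third 4-loop, which is the kind of ambiguity that extra seed configuration would need to resolve.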
If the torus is significantly degraded, i.e., many switches or links are missing, torus-2QoS may
be unable to place into the torus some switches and/or links that were discovered in the fabric;
it generates a warning in that case. A similar condition