Specifications

High Performance Trading/Algo Speed with Wombat Design and Implementation Guide
OL-15617-01
Appendix B—Building and Configuring Switches
Focus on your out-of-band (Ethernet) network first. Verify that all of your hosts and switches are
available on the out-of-band network before you bring up the InfiniBand network.
Note Do not try to bring up the cluster using the in-band IPoIB management interfaces.
Break any given cluster into segments or “pods.” Bringing up a “pod” means bringing up all hosts
connected to a leaf switch that is not logically connected to any core switches. This document
describes the bring-up process in more detail below.
Keep things in perspective: this process will probably take longer than you anticipate. This cluster
involves numerous devices and two overlapping networks (in-band and out-of-band). Remember
Murphy’s Law: if anything can go wrong, it will. Break the process into smaller milestones and
approach the network one piece at a time.
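The pod-by-pod check on the out-of-band network can be sketched as a short script. This is a minimal illustration, not part of the documented procedure: the inventory format, the function names, and the choice of a TCP connection to port 22 as the reachability test are all assumptions made for the example.

```python
import socket
from collections import defaultdict


def is_reachable(address, port=22, timeout=2.0):
    """Return True if a TCP connection to address:port succeeds.

    Assumption: devices accept TCP connections on this port over the
    out-of-band Ethernet network. Substitute whatever check your site uses.
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def pods_ready(inventory, probe=is_reachable):
    """Group devices into pods by leaf switch and list unreachable devices.

    inventory: iterable of (leaf_switch, device_name, oob_address) tuples
               (a hypothetical format chosen for this sketch).
    Returns a dict mapping each leaf switch to the names of devices that did
    not answer on the out-of-band network. An empty list means every device
    in that pod answered.
    """
    unreachable = defaultdict(list)
    for leaf, name, address in inventory:
        pod = unreachable[leaf]  # creates the pod entry even if nothing fails
        if not probe(address):
            pod.append(name)
    return dict(unreachable)
```

Calling pods_ready() with your own inventory returns one entry per leaf switch. A pod whose list is empty is a candidate for InfiniBand bring-up; any other pod still has out-of-band problems to resolve first. The probe argument lets you swap in a different reachability check without changing the grouping logic.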
Installation Task and Timing Overview
The time and manpower required for installation vary directly with the size of the cluster.
As an example, bringing up the InfiniBand fabric of a cluster with more than 4500 nodes took
approximately 8 to 10 man-weeks. However, this example was an unusually challenging scenario because of the following factors:
All racks were densely populated, with most using 41 of 42 U of space.
The leaf switches were installed after the racks had been populated with nodes and internal
management cabling, leaving very little free working space.
The leaf switches had to be installed in a manner that greatly limited accessibility, both for
racking the switches and for connecting cables.
As a general guideline, 85 percent of the installation time is spent performing tasks associated with cable
management, including the following:
Cable labeling
Connecting all InfiniBand cables to nodes and switches
Debugging and replacing cables throughout the bring-up process
Note Unexpected issues are certain to arise, and installation complexity is certain to grow with the size of a
cluster, regardless of previous experience or expectations.
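The figures in this section can be turned into a rough planning estimate. The sketch below assumes strictly linear scaling from the reference cluster (one reading of "vary directly") and applies the 85 percent cable-management guideline; the function and constant names are hypothetical. Because the reference installation was unusually challenging, treat the result as a coarse starting point rather than a prediction.

```python
REFERENCE_NODES = 4500         # size of the example cluster (stated as more than 4500)
REFERENCE_MAN_WEEKS = (8, 10)  # low and high effort reported for that cluster
CABLE_SHARE = 0.85             # guideline share of time spent on cable management


def estimate_effort(nodes):
    """Scale the reference effort linearly to a cluster of the given size.

    Returns man-week ranges as (low, high) tuples for the whole bring-up
    and for the cable-management portion of it.
    """
    scale = nodes / REFERENCE_NODES
    low, high = (weeks * scale for weeks in REFERENCE_MAN_WEEKS)
    return {
        "total": (low, high),
        "cabling": (low * CABLE_SHARE, high * CABLE_SHARE),
    }
```

For the reference cluster itself, the cable-management share works out to roughly 6.8 to 8.5 of the 8 to 10 man-weeks.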
The Very First Thing That You Do: Plan
To plan for your cluster bring-up, know everything in Table 11 before you take any action:
Table 11 Planning Requirements
Issue Requirement
Will the fabric be oversubscribed (“blocking”) or not
(non-blocking)?
The physical layout of the cluster depends on the
subscription attribute of the fabric, so you must answer this
question before you begin any physical installation.