Specifications
High Performance Trading/Algo Speed with Wombat Design and Implementation Guide
OL-15617-01
Appendix B—Building and Configuring Switches
• Focus on your out-of-band (Ethernet) network first. Verify that all of your hosts and switches are
available on the out-of-band network before you bring up the InfiniBand network.
Note: Do not try to bring up the cluster using the in-band IPoIB management interfaces.
• Break any given cluster into segments, or “pods.” Bringing up a pod means bringing up all hosts
connected to one leaf switch while that leaf switch is not logically connected to any core switches.
This document describes the bring-up process in more detail below.
• Keep things in perspective: this process will probably take longer than you anticipate. This cluster
involves numerous devices and two overlapping networks (in-band and out-of-band). Remember
Murphy’s Law: if anything can go wrong, it will. Break the process up into smaller milestones and
approach the network one piece at a time.
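The first two guidelines above (verify the out-of-band network first, then work pod by pod) can be sketched as a small script. This is a minimal illustration, not a tool from this guide: the inventory format, the host and leaf switch names, and the choice of a TCP connection to port 22 as the reachability probe are all assumptions that you would replace with your own site data and checks.

```python
import socket
from collections import defaultdict

# Hypothetical inventory: (out-of-band address, leaf switch the device hangs off).
# These names are placeholders and do not come from the guide.
INVENTORY = [
    ("node001-oob.example.com", "leaf-01"),
    ("node002-oob.example.com", "leaf-01"),
    ("node003-oob.example.com", "leaf-02"),
]


def group_into_pods(inventory):
    """Group devices by leaf switch; each group is treated as one pod."""
    pods = defaultdict(list)
    for host, leaf in inventory:
        pods[leaf].append(host)
    return dict(pods)


def is_reachable(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: the device accepts TCP connections on this port over the
    out-of-band Ethernet network. Use whatever probe fits your site.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def unreachable_by_pod(pods, probe=is_reachable):
    """Return {leaf: [devices that failed the probe]} for pods with failures."""
    report = {}
    for leaf, hosts in pods.items():
        failed = [host for host in hosts if not probe(host)]
        if failed:
            report[leaf] = failed
    return report
```

Calling `unreachable_by_pod(group_into_pods(INVENTORY))` returns an empty dictionary when every device answers on the out-of-band network. A non-empty result names the pods that still need attention before any InfiniBand bring-up begins.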
Installation Task and Timing Overview
The time and manpower required for installation vary directly with the size of the cluster.
As an example, bringing up the InfiniBand fabric for a cluster of more than 4500 nodes took
approximately 8 to 10 man-weeks. However, this example was an unusually challenging scenario
because of the following factors:
• All racks were densely populated, with most using 41 of 42 U of space.
• The leaf switches were installed after the racks had been populated with nodes and internal
management cabling, leaving very little free working space.
• The leaf switches had to be installed in a manner that greatly limited accessibility, both for
mounting the switches in the rack and for connecting cables.
As a general guideline, 85 percent of the installation time is spent performing tasks associated with cable
management, including the following:
• Cable labeling
• Connecting all InfiniBand cables to nodes and switches
• Debugging and replacing cables throughout the bring-up process
Note: Unexpected issues are certain to arise, and installation complexity is certain to grow with the
size of a cluster, regardless of previous experience or expectations.
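The figures in this section can be turned into a rough planning calculation. The sketch below scales the example (8 to 10 man-weeks for roughly 4500 nodes) linearly to another cluster size and applies the 85 percent cable-management guideline. Strict linear scaling and the function names are assumptions made for illustration, and the reference example was unusually challenging, so treat the result as a coarse starting point rather than a prediction.

```python
# Reference figures from the example in this section. The example cluster had
# more than 4500 nodes; 4500 is used here as an approximation.
REFERENCE_NODES = 4500
REFERENCE_MAN_WEEKS = (8.0, 10.0)  # (low, high)
CABLE_SHARE = 0.85  # guideline share of time spent on cable management


def estimate_man_weeks(nodes):
    """Scale the reference range linearly to a cluster of `nodes` nodes.

    Assumption: effort is strictly proportional to node count.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    scale = nodes / REFERENCE_NODES
    low, high = REFERENCE_MAN_WEEKS
    return low * scale, high * scale


def split_effort(man_weeks):
    """Split an effort figure into (cable management, everything else)."""
    cable = man_weeks * CABLE_SHARE
    return cable, man_weeks - cable
```

For example, `estimate_man_weeks(900)` gives the scaled range for a 900-node cluster, and `split_effort` applied to either end of that range shows how much of it to budget for labeling, connecting, and debugging cables.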
The Very First Thing That You Do: Plan
To plan for your cluster bring-up, know everything in Table 11 before you take any action:
Table 11 Planning Requirements
Issue Requirement
Will the fabric be oversubscribed (“blocking”) or not
(non-blocking)?
The physical layout of the cluster depends on the
subscription attribute of the fabric, so you must answer this
question before you begin any physical installation.