
Changes to BGP Multipath
When the system becomes active after a fast-boot restart, the BGP multipath and ECMP behavior changes: the system delays the computation and installation of additional paths to a destination into the BGP routing information base (RIB) and forwarding table. Additional paths, if any, are automatically computed and installed, without any manual intervention, under any of the following conditions:
- After 30 seconds of the system returning online after a restart
- After all established peers have synchronized with the restarting system
- A combination of the previous two conditions
One possible impact of this behavior change is that, if the volume of traffic to a destination is higher than what a single path can carry, a portion of that traffic might be dropped for a short duration (30-60 seconds) after the system comes up.
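As a rough illustration of when the delayed computation unblocks, the trigger logic can be modeled as a simple predicate. This is a minimal Python sketch, not Dell Networking OS code; the function and variable names are hypothetical:

```python
import time

ADDITIONAL_PATH_DELAY_S = 30  # 30-second delay cited above

def should_compute_additional_paths(online_since: float,
                                    all_peers_synced: bool) -> bool:
    """Hypothetical predicate mirroring the conditions above: the
    30-second timer, peer synchronization, or a combination of the
    two unblocks the computation of additional paths."""
    timer_expired = time.monotonic() - online_since >= ADDITIONAL_PATH_DELAY_S
    return timer_expired or all_peers_synced
```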
Delayed Installation of ECMP Routes Into BGP
The current FIB component of Dell Networking OS has some inherent inefficiencies when handling a large number of ECMP
routes (that is, routes with multiple equal-cost next hops). To work around this when fast boot is configured, changes are made
in BGP to delay the installation of ECMP routes. This is done only if the system comes up through a fast-boot reload. The BGP
route selection algorithm selects only one best path to each destination and delays the installation of additional ECMP paths until a
minimum of 30 seconds has elapsed from the time the first BGP peer is established. Once this time has elapsed, all routes in
the BGP RIB are processed for additional paths.
While this change ensures that at least one path to each destination gets into the FIB as quickly as possible, it does
prevent additional paths from being used even when they are available. This downside has been deemed acceptable.
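The two-phase behavior can be sketched as follows, assuming a toy RIB in which each prefix maps to a best-first list of (next-hop, cost) tuples. This illustrates the delay logic only, not the actual Dell Networking OS implementation:

```python
import time

ECMP_INSTALL_DELAY_S = 30  # minimum delay after the first peer is established

def install_routes(rib: dict, fib: dict, first_peer_established: float) -> None:
    """Toy model: rib maps prefix -> [(next_hop, cost), ...] sorted
    best-first; fib maps prefix -> list of installed next hops."""
    # Phase 1: install only the single best path per destination so
    # that at least one path reaches the FIB as quickly as possible.
    for prefix, paths in rib.items():
        fib[prefix] = [paths[0][0]]

    # Phase 2: wait out the delay, then reprocess the whole RIB and
    # add the remaining equal-cost (ECMP) paths.
    remaining = ECMP_INSTALL_DELAY_S - (time.monotonic() - first_peer_established)
    if remaining > 0:
        time.sleep(remaining)
    for prefix, paths in rib.items():
        best_cost = paths[0][1]
        fib[prefix] = [nh for nh, cost in paths if cost == best_cost]
```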
RDMA Over Converged Ethernet (RoCE) Overview
This functionality is supported on the platform.
RDMA is a technology that a virtual machine (VM) uses to transfer information directly to the memory of another VM, thus
enabling VMs to be connected to storage networks. With RoCE, RDMA enables data to be forwarded without passing through
the CPU and the main memory path of TCP/IP. In a deployment that keeps the RoCE network and the normal IP
network on two different networks, RRoCE combines the RoCE and the IP networks and sends the RoCE frames over the IP
network. This method of transmission, called RRoCE, encapsulates RoCE packets in IP packets; in effect, RRoCE sends
InfiniBand (IB) packets over IP. IB supports input and output connectivity for the internet infrastructure. InfiniBand enables the
expansion of network topologies over large geographical boundaries and the creation of next-generation I/O interconnect
standards in servers.
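The manual does not give the encapsulation details, but routable RoCE (RoCEv2) conventionally carries the IB packet inside UDP/IP using the IANA-registered UDP port 4791. A minimal Scapy sketch of such a frame, with illustrative addresses and a placeholder IB payload, might look like this:

```python
from scapy.all import Ether, IP, UDP, Raw

ROCEV2_UDP_PORT = 4791  # IANA-registered UDP port for RoCEv2

# Placeholder for an InfiniBand transport packet (BTH header plus
# payload); a real RRoCE frame carries the full IB packet here.
ib_packet = Raw(load=b"\x00" * 32)

# Encapsulate the IB packet in UDP/IP so it can traverse the IP network.
rroce_frame = (
    Ether()
    / IP(src="10.0.0.1", dst="10.0.0.2")  # illustrative addresses
    / UDP(sport=49152, dport=ROCEV2_UDP_PORT)
    / ib_packet
)
rroce_frame.show()
```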
When a storage area network (SAN) is connected over an IP network, the following conditions must be satisfied:
- Faster connectivity: QoS for RRoCE enables fast, lossless disk input and output services.
- Lossless connectivity: VMs require connectivity to the storage network to be lossless at all times. When a planned upgrade
of the network nodes happens, especially with top-of-rack (ToR) nodes where there is a single point of failure for the VMs,
disk I/O operations are expected to complete within 20 seconds. If the disk is not accessible within 20 seconds, unexpected
and undefined behavior of the VMs occurs. You can optimize the booting time of ToR nodes that present a single point of
failure to reduce the outage in traffic-handling operations, as sketched after this list.
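To make the constraint concrete, the following minimal sketch checks whether a planned ToR reload outage stays inside the window VMs tolerate. It assumes outage start and end timestamps are available; the names are hypothetical:

```python
DISK_IO_DEADLINE_S = 20  # VMs tolerate at most 20 s without disk access

def outage_within_deadline(tor_down_at: float, tor_up_at: float) -> bool:
    """Hypothetical check: a ToR reload outage must end before the
    20-second disk I/O deadline, or VM behavior becomes undefined."""
    return (tor_up_at - tor_down_at) < DISK_IO_DEADLINE_S
```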
RRoCE is bursty and uses the entire 10-Gigabit Ethernet interface. Although RRoCE and normal data traffic are propagated in
separate portions of the network, it may be necessary in certain topologies to combine both the RRoCE and the data traffic in a single
network structure. RRoCE traffic is marked with dot1p priorities 3 and 4 (code points 011 and 100, respectively), and these
queues are strict and lossless. DSCP code points are not tagged for RRoCE. Both ECN and PFC are enabled for RRoCE traffic.
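As an illustration of the marking described above, the sketch below tags an RRoCE frame with dot1p priority 3 using Scapy and records the stated queue properties as data. The VLAN ID and the queue-profile structure are assumptions for illustration, not platform configuration:

```python
from scapy.all import Ether, Dot1Q, IP, UDP, Raw

# dot1p priorities assigned to RRoCE traffic (binary code points).
RROCE_PRIORITIES = {3: 0b011, 4: 0b100}

# Properties stated above for the RRoCE queues: strict scheduling,
# lossless via PFC, and ECN marking enabled.
queue_profile = {
    prio: {"scheduler": "strict", "pfc": True, "ecn": True}
    for prio in RROCE_PRIORITIES
}

# Tag a frame with dot1p priority 3; no DSCP value is set, matching
# the statement that DSCP code points are not tagged for RRoCE.
frame = (
    Ether()
    / Dot1Q(vlan=100, prio=3)             # VLAN ID is illustrative
    / IP(src="10.0.0.1", dst="10.0.0.2")
    / UDP(dport=4791)                     # RoCEv2 UDP port
    / Raw(load=b"\x00" * 32)
)
```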