Users Guide

best path to each destination and delays installation of additional ECMP paths until a minimum of 30 seconds
has elapsed from the time the first BGP peer is established. Once this time has elapsed, all routes in the BGP
RIB are processed for additional paths.
While the above change ensures that at least one path to each destination gets into the FIB as quickly as
possible, it does prevent additional paths from being used during the delay, even if they are available. This
downside is considered acceptable.
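The delayed-installation behavior described above can be pictured with a small model. This is only an illustrative sketch, not the switch's implementation: the class `DelayedEcmpInstaller`, its method names, and the use of "first path learned" as a stand-in for the BGP best path are all assumptions made for the example. Only the 30-second minimum comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ECMP_DELAY_SECONDS = 30.0  # minimum delay stated in the text


@dataclass
class DelayedEcmpInstaller:
    """Hypothetical model: one path per prefix first, extra ECMP paths later."""

    first_peer_up_at: Optional[float] = None
    # prefix -> next hops; the first entry stands in for the best path (assumption)
    rib: Dict[str, List[str]] = field(default_factory=dict)
    # prefix -> next hops actually installed for forwarding
    fib: Dict[str, List[str]] = field(default_factory=dict)

    def peer_established(self, now: float) -> None:
        # Only the first established peer starts the delay timer.
        if self.first_peer_up_at is None:
            self.first_peer_up_at = now

    def delay_elapsed(self, now: float) -> bool:
        return (
            self.first_peer_up_at is not None
            and now - self.first_peer_up_at >= ECMP_DELAY_SECONDS
        )

    def learn_route(self, prefix: str, next_hop: str, now: float) -> None:
        paths = self.rib.setdefault(prefix, [])
        if next_hop not in paths:
            paths.append(next_hop)
        # Before the delay expires, install a single path only.
        self.fib[prefix] = list(paths) if self.delay_elapsed(now) else paths[:1]

    def tick(self, now: float) -> None:
        # Once the delay has elapsed, reprocess the whole RIB for extra paths.
        if self.delay_elapsed(now):
            for prefix, paths in self.rib.items():
                self.fib[prefix] = list(paths)
```

In this model, a second next hop learned two seconds after the first peer comes up stays in the RIB only, and a call to `tick()` at or after the 30-second mark installs it alongside the first.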
RDMA Over Converged Ethernet (RoCE)
Overview
This functionality is supported on the platform.
RDMA is a technology that a virtual machine (VM) uses to directly transfer information to the memory of
another VM, thus enabling VMs to be connected to storage networks. With RoCE, RDMA enables data to be
forwarded without passing through the CPU and the main memory path of TCP/IP. In a deployment that
contains both the RoCE network and the normal IP network on two different networks, RRoCE combines the
RoCE and the IP networks and sends the RoCE frames over the IP network. This method of transmission,
called RRoCE, results in the encapsulation of RoCE packets in IP packets. RRoCE sends InfiniBand (IB)
packets over IP. IB supports input and output connectivity for the internet infrastructure. InfiniBand enables
the expansion of network topologies over large geographical boundaries and the creation of next-generation
I/O interconnect standards in servers.
When a storage area network (SAN) is connected over an IP network, the following conditions must be
satisfied:
Faster connectivity: QoS for RRoCE provides faster, lossless disk input and output services.
Lossless connectivity: VMs require connectivity to the storage network to be lossless at all times. When a
planned upgrade of the network nodes happens, especially with top-of-rack (ToR) nodes where there is
a single point of failure for the VMs, disk I/O operations are expected to occur within 20 seconds. If the disk is
not accessible within 20 seconds, unexpected and undefined behavior of the VMs occurs. You can optimize
the booting time of the ToR nodes that experience a single point of failure to reduce the outage in
traffic-handling operations.
RRoCE is bursty and uses the entire 10-Gigabit Ethernet interface. Although RRoCE and normal data traffic
are propagated in separate network portions, it may be necessary in certain topologies to combine both the
RRoCE and the data traffic in a single network structure. RRoCE traffic is marked with dot1p priorities 3 and 4
(code points 011 and 100, respectively) and these queues are strict and lossless. DSCP code points are not
tagged for RRoCE. Both ECN and PFC are enabled for RRoCE traffic. Normal IP or data traffic that is not
RRoCE-enabled consists of TCP and UDP packets, which can be marked with DSCP code
points. Multicast is not supported in that network.
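To make the dot1p marking concrete, the sketch below reads the 3-bit priority from the 802.1Q tag of an Ethernet frame and checks whether it is one of the two RRoCE priorities named above (3 and 4). It is an illustration only: the function names and the `build_tagged_header` helper are hypothetical, and the code reflects the standard 802.1Q tag layout, not the switch's internal classification logic.

```python
import struct
from typing import Optional

RROCE_PRIORITIES = {3, 4}  # dot1p values from the text (binary 011 and 100)
TPID_8021Q = 0x8100        # EtherType value that identifies an 802.1Q tag


def dot1p_priority(frame: bytes) -> Optional[int]:
    """Return the 802.1Q priority (0-7), or None if the frame is untagged."""
    # Destination MAC (6) + source MAC (6) + TPID (2) + TCI (2) + EtherType (2)
    if len(frame) < 18:
        return None
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    return tci >> 13  # the priority is the top 3 bits of the 16-bit TCI


def is_rroce_priority(frame: bytes) -> bool:
    """True if the frame carries one of the priorities used for RRoCE."""
    return dot1p_priority(frame) in RROCE_PRIORITIES


def build_tagged_header(priority: int, vlan_id: int, ethertype: int = 0x0800) -> bytes:
    """Hypothetical helper: build a tagged Ethernet header with zeroed MACs."""
    tci = (priority << 13) | (vlan_id & 0x0FFF)
    return bytes(12) + struct.pack("!HHH", TPID_8021Q, tci, ethertype)
```

For example, `is_rroce_priority(build_tagged_header(3, 100))` is true, while a frame tagged with priority 0, or an untagged frame, is treated as normal data traffic.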
RRoCE packets are received and transmitted on specific interfaces called lite-subinterfaces. These interfaces
are similar to the normal Layer 3 physical interfaces except for the extra provisioning that they offer to enable
the VLAN ID for encapsulation.