Administrator Guide

Graceful Restart

Graceful restart (also known as non-stop forwarding) is a protocol-based mechanism that preserves the forwarding table of the restarting router and its neighbors for a specified period to minimize the loss of packets.

A graceful-restart router does not immediately assume that a neighbor is permanently down and so does not trigger a topology change.

Dell Networking OS supports graceful restart for the following protocols:

• Border gateway protocol

• Open shortest path rst

• Protocol independent multicast — sparse mode

• Intermediate system to intermediate system
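The retention behavior described above can be sketched as a small model of a helper (neighbor) router. This is a minimal illustration, not Dell Networking OS code: the class names, the `restart_period` parameter, and the use of plain numeric timestamps are all assumptions made for the example. It shows only the core idea, that routes learned from a restarting neighbor stay in the forwarding table until a configured period expires.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class NeighborRoutes:
    """Routes learned from one neighbor, plus graceful-restart state."""
    prefixes: Set[str] = field(default_factory=set)
    stale_since: Optional[float] = None  # None means the session is up


class HelperRouter:
    """Hypothetical helper-side model of graceful restart."""

    def __init__(self, restart_period):
        self.restart_period = restart_period  # seconds to keep stale routes
        self.neighbors = {}

    def learn(self, neighbor, prefix):
        self.neighbors.setdefault(neighbor, NeighborRoutes()).prefixes.add(prefix)

    def session_lost(self, neighbor, now):
        # Do not flush: keep the forwarding entries and start the timer.
        self.neighbors[neighbor].stale_since = now

    def session_restored(self, neighbor):
        # The neighbor came back in time, so its routes are valid again.
        self.neighbors[neighbor].stale_since = None

    def tick(self, now):
        """Flush neighbors whose restart period expired; return their names."""
        expired = [
            name for name, state in self.neighbors.items()
            if state.stale_since is not None
            and now - state.stale_since >= self.restart_period
        ]
        for name in expired:
            del self.neighbors[name]
        return expired

    def forwarding_table(self):
        return {p for state in self.neighbors.values() for p in state.prefixes}
```

In this sketch, calling `session_lost` followed by `tick` before the period ends leaves the forwarding table unchanged, which corresponds to the neighbor not being treated as permanently down. Only a `tick` at or after the deadline removes the routes.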

Software Resiliency

During normal operations, Dell Networking OS monitors the health of both hardware and software components in the background to identify potential failures, even before these failures manifest.

System Health Monitoring

Dell Networking OS also monitors the overall health of the system.

Key parameters such as CPU utilization, free memory, and error counters (for example, CRC failures and packet loss) are measured. When a parameter crosses its threshold, the measurement can be used to initiate a recovery mechanism.
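As a rough illustration of threshold-based monitoring, the following sketch compares one sample of measurements against limits and calls a recovery hook for each breach. The parameter names, threshold values, and the `recover` callback are hypothetical. The guide does not specify the actual parameters, limits, or recovery actions that Dell Networking OS uses.

```python
# Hypothetical upper limits; a parameter is breached when it rises above them.
UPPER_LIMITS = {
    "cpu_utilization_pct": 90.0,
    "crc_errors": 100,
    "packets_lost": 1000,
}

# Free memory is breached when it falls below this assumed floor.
MIN_FREE_MEMORY_MB = 64.0


def check_health(sample):
    """Return the sorted names of parameters that crossed their thresholds."""
    breached = [
        name for name, limit in UPPER_LIMITS.items()
        if sample.get(name, 0) > limit
    ]
    if sample.get("free_memory_mb", float("inf")) < MIN_FREE_MEMORY_MB:
        breached.append("free_memory_mb")
    return sorted(breached)


def monitor(sample, recover):
    """Call the recovery hook once for each breached parameter."""
    for name in check_health(sample):
        recover(name)
```

A caller would pass a dictionary of current measurements and a function that starts whatever recovery applies to the named parameter. Missing measurements are treated as healthy in this sketch, which is a design choice and not documented behavior.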

Failure and Event Logging

Dell Networking systems provide multiple options for logging failures and events.

Trace Log

Developers interlace messages with software code to track the execution of a program.

These messages are called trace messages. They are primarily used for debugging and provide lower-level information than event messages, which system administrators primarily use. Dell Networking OS retains executed trace messages for hardware and software and stores them in files (logs) on the internal flash.

• Trace Log: contains trace messages related to software and hardware events, state, and errors. Trace Logs are stored in internal flash under the directory TRACE_LOG_DIR.

• Crash Log: contains trace messages related to IPC and IRC timeouts and task crashes on line cards and is stored under the directory CRASH_LOG_DIR.
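The distinction between trace messages and event messages can be illustrated with Python's standard `logging` module. This is a generic sketch, not how Dell Networking OS implements its logs: the `TRACE` level number, the logger name, the message text, and the in-memory streams standing in for files on flash are all assumptions. It shows low-level trace messages going only to a trace log, while administrator-facing event messages reach both destinations.

```python
import io
import logging

# Assumed custom level, placed below logging.DEBUG (10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


def build_logger(trace_stream, event_stream):
    """Create a logger with a trace destination and an event destination."""
    logger = logging.getLogger("ha_sketch")
    logger.setLevel(TRACE)
    logger.propagate = False
    logger.handlers.clear()

    trace_handler = logging.StreamHandler(trace_stream)
    trace_handler.setLevel(TRACE)        # records everything, including traces

    event_handler = logging.StreamHandler(event_stream)
    event_handler.setLevel(logging.INFO)  # records event-level messages only

    formatter = logging.Formatter("%(levelname)s %(message)s")
    for handler in (trace_handler, event_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# In-memory streams stand in for log files in this sketch.
trace_log, event_log = io.StringIO(), io.StringIO()
log = build_logger(trace_log, event_log)

log.log(TRACE, "ipc: request sent to task %s", "demo_task")  # developer detail
log.info("interface state changed to up")                   # administrator event
```

After these two calls, the trace stream holds both lines and the event stream holds only the `INFO` line, so an administrator reading the event log is not shown the developer-level detail.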

Core Dumps

A core dump is the contents of RAM a program uses at the time of a software exception and is used to identify the cause of the exception.

There are two types of core dumps: application and kernel.

• Kernel core dump: the kernel is the central component of an operating system that manages system processors and memory allocation and makes these facilities available to applications. A kernel core dump is the contents of the memory in use by the kernel at the time of an exception.

High Availability (HA)