Administrator Guide

Prefailure alerts provided by Dell EMC PowerEdge server systems management | ID 426
1 Introduction
The acronym PFA stands for prefailure alert or predictive failure analysis. Originally, PFAs focused on
hard drives. The goal was, and still is, to avoid unplanned downtime. Over the years, PFAs have grown
beyond hard drives and now cover many other server components. This expanded coverage has
become increasingly important with the rise of virtualization. Today, multiple virtual servers can
depend on the same underlying physical hardware, which makes the health of the physical server more
important than ever.
Not all failures are predictable, but characteristics related to gradual mechanical wear and tear
can be tracked and monitored. At a certain point, a failure becomes likely, and iDRAC triggers a
warning alert. These alerts and parameters do not address every possible failure mode, but many customers
find them useful.
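The idea of tracking a wear-related characteristic until a failure becomes likely can be sketched as a simple threshold check. The following Python is an illustrative sketch only and is not iDRAC's actual algorithm. The `WearReading` type, the `evaluate_wear` function, the attribute name, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WearReading:
    """One sample of a wear-related attribute (all names are hypothetical)."""
    component: str       # for example "Disk 0"
    attribute: str       # for example "reallocated_sectors"
    value: int           # current counter value
    warn_threshold: int  # assumed level at which a failure is considered likely


def evaluate_wear(reading: WearReading) -> str:
    """Return "warning" once the tracked value reaches the threshold, else "ok"."""
    if reading.value >= reading.warn_threshold:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Made-up sample data, used only to exercise the check.
    samples = [
        WearReading("Disk 0", "reallocated_sectors", 3, 50),
        WearReading("Disk 1", "reallocated_sectors", 72, 50),
    ]
    for sample in samples:
        print(sample.component, evaluate_wear(sample))
```

A check of this kind only covers the gradual, measurable wear described above. A sudden failure with no measurable precursor never crosses the threshold, which is why such alerts cannot address every failure mode.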
By the end of the 1990s, the various analysis and alerting technologies began to come together under the
Self-Monitoring, Analysis and Reporting Technology (SMART) standard. Predictive failure analysis now covers an
entire category of techniques for predicting impending failures of components such as drives, memory, and
processors. PFA can also mean prefailure alert or predictive failure alert. This paper describes how IT
administrators can use the information that iDRAC provides to best meet the needs of their organization.
1.1 The Integrated Dell Remote Access Controller (iDRAC)
The Integrated Dell Remote Access Controller (iDRAC) is designed to make server administrators more
productive and improve the overall availability of Dell EMC PowerEdge servers. The iDRAC sends alerts,
helps perform remote server management, and reduces the need for physical access to a server.
The iDRAC is part of a larger data center solution that helps keep business-critical applications and workloads
available. The technology allows IT administrators to deploy, monitor, manage, configure, update,
troubleshoot, and remediate Dell servers from any location, and without the use of software agents. Because
the iDRAC is embedded in the server, it can accomplish the above tasks regardless of operating system or
hypervisor presence or state.
The iDRAC processor polls each subsystem approximately every five seconds. Using advanced heuristic
algorithms, the iDRAC identifies component performance and internal fault conditions that might lead to
unscheduled downtime and provides local and remote warnings to IT staff and consoles. By monitoring alerts
from the iDRAC, IT administrators can benefit from higher server availability and reduced total cost of
ownership.
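The polling behavior described above can be pictured as a fixed-interval loop that checks each subsystem and raises a warning for any that looks unhealthy. The Python below is a minimal sketch under stated assumptions, not the iDRAC firmware logic. The `poll_once` and `poll_loop` functions, the `HealthCheck` type, and the boolean health checks are hypothetical stand-ins for the heuristics the text mentions.

```python
import time
from typing import Callable, Dict, List

# Hypothetical check type: returns True when the subsystem looks healthy.
HealthCheck = Callable[[], bool]

# "Approximately every five seconds" according to the text.
POLL_INTERVAL_SECONDS = 5.0


def poll_once(checks: Dict[str, HealthCheck]) -> List[str]:
    """Run every health check once and return the names that need a warning."""
    return [name for name, check in checks.items() if not check()]


def poll_loop(
    checks: Dict[str, HealthCheck],
    cycles: int,
    notify: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll all subsystems for a fixed number of cycles, notifying on each fault."""
    for cycle in range(cycles):
        for name in poll_once(checks):
            notify(name)
        if cycle < cycles - 1:
            # Wait between cycles, but not after the last one.
            sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    # Made-up checks; a no-op sleep keeps the demonstration fast.
    demo_checks = {"Fans": lambda: True, "Power supplies": lambda: False}
    poll_loop(demo_checks, cycles=2, notify=print, sleep=lambda _seconds: None)
```

Passing `sleep` and `notify` as parameters keeps the loop testable without real delays and leaves the choice of alert destination (a local log or a remote console) open.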
1.2 iDRAC monitoring and alerting
The iDRAC monitors and alerts on the following PowerEdge server subsystems:
Hard drives and SSDs
CPU
System memory
System temperature
Fans
Power supplies
Monitoring and alerting topics include: