Appendix C: Glossary of Useful RAS Terms
RAS
Reliability, Availability and Serviceability
Reliability
Assurance that computational results are correct. Errors are detected and corrected when possible and reported if they cannot be
corrected.
Availability
Assurance that the system is up and running to support an organization’s computing needs.
Serviceability
Assurance that errors are reported and that faulty components can be identified and replaced.
Error Detection/Correction
Ability to detect and correct hard and soft errors to increase reliability and availability.
Soft Errors
A transient error that can be corrected by overwriting with the correct data.
Hard Errors
A persistent error that cannot be fixed by overwriting with the correct data (e.g., a faulty logic gate).
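The distinction between soft and hard errors can be illustrated with a small simulation. This is only a sketch: the Cell class, its stuck_at parameter and the classify_error helper are hypothetical names invented for the illustration, not part of any platform interface.

```python
class Cell:
    """Toy storage cell. A 'stuck_at' value models a persistent hardware fault."""

    def __init__(self, value, stuck_at=None):
        self._value = value
        self.stuck_at = stuck_at  # None means the cell is healthy

    def write(self, value):
        self._value = value

    def read(self):
        # A faulty cell always returns its stuck value, whatever was written.
        return self.stuck_at if self.stuck_at is not None else self._value


def classify_error(cell, correct_value):
    """Overwrite the cell with the correct data and read it back.

    If the read-back matches, the error was transient (soft).
    If it still differs, the error persists (hard).
    """
    cell.write(correct_value)
    return "soft" if cell.read() == correct_value else "hard"


# A healthy cell that currently holds a corrupted value (transient upset).
transient = Cell(value=0)
# A cell whose output is stuck at 0 regardless of what is written.
stuck = Cell(value=0, stuck_at=0)

transient_kind = classify_error(transient, correct_value=1)
stuck_kind = classify_error(stuck, correct_value=1)
```

Here transient_kind is "soft" because the rewrite succeeds, while stuck_kind is "hard" because the rewrite has no effect.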
Error Correction Code (ECC), Cyclic Redundancy Check (CRC), Parity
Widely used mechanisms for detecting hard and soft errors. ECC can also correct some of the errors it detects.
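As a rough illustration of how these checks detect errors, the sketch below applies a single even-parity bit and a CRC-32 to a short message and then flips bits in it. The helper names (even_parity_bit, flip_bit) and the sample message are assumptions made for the example; zlib.crc32 is the standard-library CRC-32. ECC is not shown here.

```python
import zlib


def even_parity_bit(data: bytes) -> int:
    """Return the bit (0 or 1) that makes the total number of 1 bits even."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (a simulated bit error)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


original = b"RAS glossary"
stored_parity = even_parity_bit(original)  # check bit saved with the data
stored_crc = zlib.crc32(original)          # 32-bit checksum saved with the data

single_error = flip_bit(original, 3)       # one flipped bit
double_error = flip_bit(single_error, 10)  # a second, different flipped bit

parity_detects_single = even_parity_bit(single_error) != stored_parity
parity_detects_double = even_parity_bit(double_error) != stored_parity
crc_detects_single = zlib.crc32(single_error) != stored_crc
crc_detects_double = zlib.crc32(double_error) != stored_crc
```

A single parity bit catches the one-bit error but misses the two-bit error, because two flips restore the original parity. The CRC catches both, which is one reason CRCs are preferred where multi-bit errors are plausible.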
Domain Partitioning
Ability to divide a system into a number of smaller systems, each booting its own OS and operating independently of the others.
Physical (Hard)
Each partition is completely isolated, so software and hardware errors in one partition will not impact another.
Dynamic
Partitioning can be implemented and partition boundaries modified without shutting down the system or rebooting affected operating
systems.
Static
Partitioning can be implemented and configured only at boot time or if impacted operating systems are not running.
Field Replaceable Unit (FRU)
A system component that can be physically added, removed or replaced in the field, such as a processor board, an I/O Hub, a DIMM or a PCI
card. An FRU may or may not be hot pluggable.
Hot Plug
A generic term for hot add and hot remove operations.
Hot Add/Remove
The physical addition/removal of an FRU without shutting down the system or stopping the OS. Typically requires OS support.
Hot Replace
A hot remove followed by a hot add, typically prompted by components that are defective or starting to show signs of defects (e.g., error
frequency exceeds a configurable threshold).
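One way to picture the "error frequency exceeds a configurable threshold" trigger is a sliding-window counter, sketched below. The ErrorRateMonitor class, its parameters and the sample numbers are hypothetical; actual platforms implement this kind of policy in firmware or OS-specific tooling that this glossary does not describe.

```python
from collections import deque


class ErrorRateMonitor:
    """Flags a component once it logs more than `threshold` corrected
    errors within the last `window_seconds` seconds."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of recent errors, oldest first

    def record_error(self, timestamp: float) -> bool:
        """Log one error and return True if replacement should be scheduled."""
        self._events.append(timestamp)
        # Discard errors that have aged out of the observation window.
        while self._events and self._events[0] <= timestamp - self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.threshold


monitor = ErrorRateMonitor(threshold=3, window_seconds=60)
early = [monitor.record_error(t) for t in (0, 10, 20)]  # three errors: within limit
fourth = monitor.record_error(30)                       # fourth error inside 60 s
later = monitor.record_error(200)                       # isolated error much later
```

In this run the first three errors stay within the limit, the fourth crosses it and would flag the component as a hot replace candidate, and the isolated later error does not.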
Hot Swap
A logical replacement of a component with another component that has already been installed in the system (e.g., a spare processor,
DIMM or I/O Hub). A hot swap can be OS transparent (accomplished completely via hardware and firmware) or OS assisted.
Onlining/Offlining
The logical (non-physical) addition/removal of a component in a running system. An offlined component can remain in the system as a
spare, be reallocated to another partition, or be removed at a later time.
Processor: Socket, Core, Logical Processor
Each Intel® Itanium® processor fits into a single socket on the system motherboard and can contain up to four cores (complete execution
units). Each core can be recognized as multiple logical processors by the OS. A logical processor is a collection of processing resources
capable of running a thread of software code.
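The socket, core and logical processor hierarchy multiplies out to the number of processors the OS sees. The sketch below shows the arithmetic. The Topology class is a hypothetical helper, and the sample figures (four sockets, two logical processors per core) are assumptions chosen for illustration; the text above states only "up to four cores" and "multiple logical processors".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    sockets: int           # physical processor packages in the system
    cores_per_socket: int  # complete execution units per package
    threads_per_core: int  # logical processors exposed by each core

    def logical_processors(self) -> int:
        """Total logical processors visible to the operating system."""
        return self.sockets * self.cores_per_socket * self.threads_per_core


# Assumed example configuration: 4 sockets, 4 cores each, 2 threads per core.
system = Topology(sockets=4, cores_per_socket=4, threads_per_core=2)
total = system.logical_processors()  # 4 * 4 * 2 = 32
```

On a running machine, Python's os.cpu_count() reports the number of logical processors in the system, which should match this product when all of them are online.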
White Paper: The Intel® Itanium® Processor 9300 Series