Applications paper


Improve Uptime for Your Most Critical Applications

In addition to providing major performance gains, this new processor family delivers substantial improvements in reliability, availability, and serviceability (RAS) to support even higher levels of data integrity and system uptime. Hardware-based error prevention, detection, and correction are enhanced and extended throughout the platform. Improvements in firmware add to these advantages, providing expanded coverage of potential error events, along with improved logging for higher availability, faster recovery, and better support for predictive failure. These capabilities work in conjunction with Intel Itanium processors’ complete machine check architecture, which coordinates error handling across hardware, firmware, and operating systems to enable extremely high availability and data integrity.

Advanced Error Correction throughout the Platform

All silicon-based computer chips are vulnerable to ordinary background radiation. An alpha particle can change the value of data in a register or array. Electrical noise and variations in power supplies can have similar impacts (although they rarely do). The longer the data is held, the greater the chance that it will be modified by one of these transient events, resulting in a “soft error.” There are many possible design strategies for dealing with soft errors. The best hardware designs automatically detect and correct for common classes of soft errors to improve data integrity and system availability without requiring firmware, operating system (OS), or application intervention.
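
As a rough illustration of why hold time matters, assume (this model is an assumption for illustration, not something stated in this paper) that upsets strike a given storage cell independently at a constant average rate λ. The probability that data held for a time t is hit at least once is then:

```latex
P(\text{at least one upset in time } t) \;=\; 1 - e^{-\lambda t} \;\approx\; \lambda t \quad \text{for } \lambda t \ll 1
```

Under this simplified model the exposure grows roughly in proportion to how long the data sits in a register or array, which is consistent with the observation above that longer-held data is more likely to be modified.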

The Intel Itanium processor family incorporates extensive features for automatically detecting and correcting soft errors at the hardware level. For example:

• Errors in large caches and arrays are automatically detected and corrected using error correcting code (ECC).

• Errors in smaller caches and various buffers and arrays are detected using parity bits. These transient errors can then be corrected using various forms of “trying again,” which simply means returning to a state prior to the error event and then proceeding as if the error had not occurred.

• Errors in pipelines are detected using residues, which are calculated during mathematical operations, or using parity bits, which move along with data and instructions in the pipeline. When transient errors are detected in a pipeline, they can also be corrected by trying again. The mechanisms are similar to those used for correcting errors in smaller caches, buffers, and arrays.
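
The three mechanisms in the list above can be sketched in software. The following Python is a minimal illustration only: the hardware does this in circuitry, the paper does not say which codes or moduli the processor uses, and every name here (`hamming74_encode`, `make_flaky_source`, `read_with_retry`, `residue_checked_add`) is hypothetical. A textbook Hamming(7,4) code stands in for ECC, a single even-parity bit for parity protection, and a modulo-3 residue for arithmetic checking.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """ECC: locate and flip a single bad bit, then return (data, syndrome)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 0 = clean, else 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


def parity(word):
    """Even-parity bit: 1 when the word has an odd number of set bits."""
    return bin(word).count("1") & 1


def make_flaky_source(word, flip_bit, faulty_reads=1):
    """Hypothetical buffer whose first `faulty_reads` reads have one bit flipped."""
    stored_parity = parity(word)
    state = {"reads": 0}

    def fetch():
        state["reads"] += 1
        transient = state["reads"] <= faulty_reads
        value = word ^ (1 << flip_bit) if transient else word
        return value, stored_parity

    return fetch


def read_with_retry(fetch, max_attempts=3):
    """Parity detects the error; 'trying again' means simply re-reading."""
    for attempt in range(1, max_attempts + 1):
        word, stored_parity = fetch()
        if parity(word) == stored_parity:
            return word, attempt
    raise RuntimeError("error persists after retries; escalate to firmware/OS")


def residue_checked_add(a, b, inject_fault=False, modulus=3):
    """Check an addition by comparing residues of the inputs and the result."""
    result = a + b
    if inject_fault:
        result ^= 1 << 4              # simulate a single-bit upset in the adder
    expected = (a % modulus + b % modulus) % modulus
    return result, result % modulus == expected
```

For example, `hamming74_correct` applied to `hamming74_encode([1, 0, 1, 1])` with one bit flipped recovers the original four data bits and reports which position was repaired, while `read_with_retry` succeeds on the second read once the simulated transient has passed. Note the difference in capability: the ECC path corrects in place, whereas parity and residues only detect, so recovery depends on being able to redo the operation from a known-good state.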

Next-Generation RAS with Intel® Instruction Replay Technology

The next-generation Intel Itanium processor family provides enhanced support for soft error detection and correction throughout the platform. One of the most important new RAS features is Intel Instruction Replay Technology. This technology provides exceptionally fast recovery from soft errors in one of the most performance-critical areas of the processor: the instruction pipeline. To understand how Intel Instruction Replay functions, it is first necessary to understand how the pipeline itself works.

Understanding normal (error-free) pipeline execution

Intel Itanium processors have a memory hierarchy of caches, buffers, and registers that hold the data waiting to be processed and the program instructions waiting to be executed. Software programs held in main memory are executed by bringing the needed portions of the program into the processor’s caches. From there, the instructions are moved into buffers and sent down pipelines to be executed. Data moves in a similar fashion, from main memory, to caches, to buffers, and finally to registers, at which point specific instructions act on specific data.