System information

Appendix

B.8 Proactive Data Protection

The most fundamental requirement for a storage system is to protect data from all kinds of failures. The RAID controller firmware supports versatile RAID configurations for different reliability requirements, including RAID 6 to tolerate double-drive failures and Triple Parity for extreme data availability. It provides online utilities for proactive data protection that monitor disk health, minimize the risk of data loss, and avoid RAID degradation. RAID configurations can be recovered and imported even if the RAID is corrupted.

• Online disk scrubbing

Bad sectors on a hard disk can be detected only when they are accessed, so they may go undetected for a long time if the disk access pattern is uneven and the sectors reside in seldom-accessed areas. During a disk rebuild, all data on the surviving hard disks is needed to regenerate the data of the failed disk; if there are bad sectors on the surviving disks, the data cannot be regenerated and is lost permanently. As the number of sectors per disk increases, this becomes a common issue for any disk-based storage system. The firmware provides an online disk scrubbing utility that tests the entire disk surface as a background task and recovers any bad sectors it detects.
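The scrubbing idea can be illustrated with a minimal Python sketch. It is not the firmware's implementation: it assumes a single-parity (RAID 5 style) XOR layout, models disks as in-memory lists of blocks, and uses hypothetical names (`xor_blocks`, `scrub_stripe`, `scrub`) and a hypothetical `bad` set to stand in for read errors.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def scrub_stripe(disks, stripe, bad):
    """Check one stripe on every disk and regenerate an unreadable block.

    disks: list of per-disk block lists (hypothetical in-memory model)
    bad:   set of (disk_index, stripe) pairs that fail to read
    Returns the (disk_index, stripe) pairs that were repaired.
    """
    failed = [d for d in range(len(disks)) if (d, stripe) in bad]
    if len(failed) > 1:
        # Assumption: single parity can regenerate only one block per stripe.
        raise IOError("more than one bad block in the stripe")
    repaired = []
    for d in failed:
        survivors = [disks[i][stripe] for i in range(len(disks)) if i != d]
        disks[d][stripe] = xor_blocks(survivors)  # rewrite regenerated data
        bad.discard((d, stripe))
        repaired.append((d, stripe))
    return repaired


def scrub(disks, bad):
    """Walk every stripe, as a background scrubbing pass would."""
    repaired = []
    for stripe in range(len(disks[0])):
        repaired += scrub_stripe(disks, stripe, bad)
    return repaired
```

Because every stripe is visited regardless of the host access pattern, a latent bad sector is found and repaired while the redundancy needed to regenerate it is still intact.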

• Online parity consistency check and recovery

The ability to protect data in parity-based RAID relies on the correctness of the parity information. Parity consistency can be corrupted under certain conditions, such as internal hard drive errors or an abnormal system power-off while the hard drives' cache is enabled. To ensure higher data reliability, the administrator can instruct the controller to perform parity check and recovery during disk scrubbing.
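As a rough illustration of a parity consistency check, the sketch below recomputes XOR parity for one stripe and compares it with the stored value. The function names are hypothetical, and the recovery policy shown (trust the data blocks and rewrite the parity) is an assumption; the manual does not say how the firmware resolves a mismatch.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def check_and_recover_parity(data_blocks, stored_parity):
    """Return (consistent, parity_to_store) for one stripe.

    Assumption: on a mismatch the data blocks are trusted and the parity
    is recomputed from them.
    """
    expected = xor_blocks(data_blocks)
    if expected == stored_parity:
        return True, stored_parity
    return False, expected
```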

• S.M.A.R.T. drive health monitoring and self-test

S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology, by which a hard disk continuously monitors its key components and collects statistics as indicators of its health. The controller polls the hard disks periodically, and when a disk reports a warning it alerts the administrator and starts disk cloning. The firmware can also instruct the disk drives to execute their embedded device self-test routines, which helps users identify defective disk drives.
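One polling pass of the kind described above might look like the following sketch. All names here are hypothetical: `read_smart_status`, `alert`, and `start_clone` are callbacks supplied by the caller, and the `"OK"` status string is an assumed convention, not a value defined by the firmware or by the S.M.A.R.T. specification.

```python
def poll_disks(disks, read_smart_status, alert, start_clone):
    """Run one polling pass and return the disks that reported a warning."""
    warned = []
    for disk in disks:
        if read_smart_status(disk) != "OK":  # assumed status convention
            alert(f"SMART warning on {disk}")
            start_clone(disk)  # protect the data before the disk fails
            warned.append(disk)
    return warned
```

A real controller would call such a routine on a timer; the callbacks keep the sketch independent of any particular drive interface.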

• Online bad sector reallocation and recovery with over-threshold alert

Hard disks tend to develop more and more bad sectors over their service life. When a host computer accesses a bad sector, the controller rebuilds the data and responds to the host. In addition to relying on the disks' own reserved space for bad block reallocation, the controller uses reserved space on the hard disks to reallocate the data of bad sectors. If the number of bad sectors exceeds the threshold specified by the administrator, an alert is sent to the administrator and disk cloning starts automatically.
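The reallocation and over-threshold logic can be sketched with a small remapping table. This is an illustrative model only: the class name `BadSectorTable`, its methods, and the in-memory dictionary are assumptions, and the actual on-disk layout of the controller's reserved space is not described in this manual.

```python
class BadSectorTable:
    """Maps bad sectors to reserved sectors and tracks the alert threshold."""

    def __init__(self, reserved_sectors, threshold):
        self.free = list(reserved_sectors)  # controller-reserved space
        self.remap = {}                     # bad sector -> reserved sector
        self.threshold = threshold          # set by the administrator

    def reallocate(self, sector):
        """Remap a bad sector; return True once the threshold is exceeded."""
        if sector not in self.remap:
            if not self.free:
                raise RuntimeError("reserved space exhausted")
            self.remap[sector] = self.free.pop(0)
        return len(self.remap) > self.threshold

    def resolve(self, sector):
        """Translate a host-visible sector to where its data now lives."""
        return self.remap.get(sector, sector)
```

When `reallocate` returns `True`, the caller would send the alert and start disk cloning, as the paragraph above describes.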

• Online SMART disk cloning

When a hard disk in a disk group fails, the RAID enters the degraded state, which means lower performance and a higher risk of data loss or RAID corruption. When a hard disk is likely to become faulty or unhealthy, for example when the number of bad sectors on a physical disk exceeds a threshold or the disk reports a SMART warning, the controller copies all data on that disk to a spare disk online. Moreover, should the source disk fail during cloning, the controller starts rebuilding onto the cloning disk, and the rebuild skips the sectors that have already been cloned. Disk cloning has proven to be the most effective solution for preventing RAID degradation.
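The handover from cloning to rebuilding can be sketched as follows. The function and its parameters are hypothetical: `regenerate` stands in for whatever the controller does to reconstruct a block from the other disks in the group, and `fail_at` simulates the source disk failing partway through the copy.

```python
def clone_then_rebuild(source, spare, regenerate, fail_at=None):
    """Copy source to spare block by block, rebuilding the rest on failure.

    fail_at: index at which the source disk fails (None means no failure).
    Returns (blocks_cloned, blocks_rebuilt).
    """
    cloned = 0
    for i in range(len(source)):
        if fail_at is not None and i >= fail_at:
            break  # source disk is gone; stop copying
        spare[i] = source[i]
        cloned += 1
    rebuilt = 0
    # The rebuild starts where cloning stopped, skipping cloned blocks.
    for i in range(cloned, len(spare)):
        spare[i] = regenerate(i)
        rebuilt += 1
    return cloned, rebuilt
```

Skipping the already-cloned range means a mid-clone failure costs only the remaining portion of a rebuild rather than a full one.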

• Transaction log and auto parity recovery

The capability to rebuild data in parity-based data protection relies on the consistency of parity and data. However, consistency may be lost if the system is shut down improperly while write commands are still incomplete. To maintain consistency, the controller logs write commands in NVRAM, and when the controller restarts, the parity affected by the incomplete writes is recovered automatically.
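A minimal sketch of this mechanism, assuming XOR parity and a log that records only which stripes have writes in flight. The `WriteLog` class and `recover_parity` function are hypothetical; a Python set stands in for NVRAM, and the real log format is not documented here.

```python
class WriteLog:
    """Records stripes with writes in flight (a set stands in for NVRAM)."""

    def __init__(self):
        self.pending = set()

    def begin(self, stripe):
        self.pending.add(stripe)      # logged before the write is issued

    def commit(self, stripe):
        self.pending.discard(stripe)  # cleared once data and parity are on disk


def recover_parity(log, data, parity):
    """On restart, recompute parity for every stripe still in the log.

    data:   list of per-disk block lists
    parity: list of parity blocks, one per stripe
    Returns the sorted list of stripes whose parity was recomputed.
    """
    fixed = sorted(log.pending)
    for s in fixed:
        out = bytearray(len(parity[s]))
        for disk in data:
            for i, byte in enumerate(disk[s]):
                out[i] ^= byte
        parity[s] = bytes(out)
    log.pending.clear()
    return fixed
```

Only the logged stripes are touched, so recovery after an improper shutdown does not require a full parity scan of the array.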

• Battery backup protection

The controller delays writes to the disk drives and caches the data in memory to optimize performance, but this also creates risk, because the data in the cache is lost permanently if the system is not powered off properly. As the amount of installed cache memory grows, such data loss could lead to unrecoverable disasters for applications. The battery backup module retains the data in the cache memory during abnormal power loss, and when the system restarts, the cached data is flushed to the disk drives.