
CHAPTER 1 What Kind of Protection Do You Need?
diminishing percentage of why servers failed. The growing majority of server outages were due to
software, meaning not only the software-based hardware drivers, but also the applications and
the OS itself. It is because of the shift in why servers were failing that data protection and
availability had to evolve.
So, let’s start by looking at what we can do to protect those hardware elements that can
cause a server failure or data loss. In such cases, when a tier-one server vendor is respected in
the datacenter space, I tend to dismiss the server hardware at first glance as the likely point of
failure. So, storage is where we should look first.
Introducing RAID
No book on data protection would be complete in its first discussions on disk without
summarizing what RAID is. Depending on when you first heard of RAID, it has stood for both:
• Redundant Array of Inexpensive Disks
• Redundant Array of Independent Disks
In Chapter 3, we will take an in-depth look at storage resiliency, including RAID models, but
for now, the key idea is that statistically, the most common physical component of a computer
to fail is a hard drive. Because of this, the concept of strapping multiple disks together in
various ways (with the assumption that multiple hard drives are unlikely to all fail at once) is now
standard practice. RAID comes in multiple configurations, depending on how the redundancy is
achieved or the disks are aligned:
Mirroring — RAID 1 The first thing we can do is to remove the single spindle (another
term for a single physical disk, referring to the axis that all the physical platters within the
disk spin on). In its simplest form, we mirror one disk or spindle with another. With
this, the disk blocks are paired up so that when disk block number 234 is being written to the
first disk, block number 234 on the second disk is receiving the same instruction at the same
time. This completely removes a single spindle from being the single point of failure (SPOF),
but it does so by consuming twice as much disk (which equates to at least twice the cost),
power, cooling, and space within the server.
RAID 5, 1+0/10, and Others Chapter 3 will take us through all of the various RAID levels
and their pros and cons, but, for now, the chief takeaway is that you are still solving a
spindle-level failure. The difference between straight mirroring (RAID 1) and parity-based RAID
variants is that you are not in a 1:1 ratio of production disk and redundant disk. Instead, in
classic RAID 5, you might be spanning four disks where, for every N-1 (3 in this case) blocks
being written, three of the disks get data and the fourth disk stores parity calculated from the
other three. If any single spindle fails, the other three have the ability to reconstitute what was
on the fourth, both in production on the fly (though performance is degraded) and in rebuilding
a replacement fourth disk.
But it is all within the same array, storage cabinet, or shelf for the same server. What if your
fancy RAID 5 disk array cabinet fails, due to two disks failing in a short timeframe, or the
power failing, or whatever?
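To make the two ideas above concrete, here is a minimal Python sketch, not how any real RAID controller is implemented. It assumes that parity is a byte-wise XOR of the data blocks (the usual approach for RAID 5) and, for simplicity, it keeps parity on a fixed fourth disk, whereas real RAID 5 arrays rotate parity across all of the disks. The function names (mirrored_write, make_stripe, rebuild) and the sample data are hypothetical and exist only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def mirrored_write(disk_a, disk_b, block_no, data):
    """RAID 1 idea: the same block number on both disks gets the same data."""
    disk_a[block_no] = data
    disk_b[block_no] = data


def make_stripe(data_blocks):
    """RAID 5 idea (simplified): N-1 data blocks plus one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recover the block from a failed disk by XOR-ing the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Mirroring: block 234 lands on both spindles.
disk_a, disk_b = {}, {}
mirrored_write(disk_a, disk_b, 234, b"payroll!")

# Parity: three data disks and one parity disk; lose disk 1 and rebuild it.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
recovered = rebuild(stripe, failed_index=1)
```

Note the trade-off this illustrates: the mirror spends one whole extra disk per production disk, while the four-disk stripe spends only one disk in four on redundancy, at the price of having to read every surviving disk to reconstruct a lost block.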
In principle, mirroring (also known as RAID 1) and most of the other RAID topologies are
all attempts to keep a single hard drive failure from affecting the production server. Whether
the strategy is applied at the hardware layer or within the OS, the result is that two or more disk
drives act together to improve performance and/or mitigate outages. In large enterprises,