Hardware manual

15 Group monitoring
It is best practice to regularly monitor a PS Series group, so you can address issues before service is interrupted.
About monitoring best practices
Dell recommends that you set up event notification to inform you automatically of events and operations in a
group. See Event notification methods on page 14-2.
If you configured SNMP trap notification, you can examine the traps using an SNMP console. See About SNMP
traps on page 14-6.
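
As an illustration of triaging collected events, the following Python sketch keeps only the severities that Table 15-1 flags for investigation. The `Event` record, its field names, and the sample messages are hypothetical and exist only for this example; they do not represent the PS Series event format or any Dell API.

```python
from dataclasses import dataclass

# Severities that Table 15-1 says to investigate.
ACTIONABLE = {"WARNING", "ERROR", "FATAL"}


@dataclass
class Event:
    """Hypothetical event record; the field names are assumptions."""
    severity: str
    message: str


def actionable_events(events):
    """Return the events whose severity calls for investigation."""
    return [e for e in events if e.severity.upper() in ACTIONABLE]


# "INFO" is a placeholder for any severity outside the three listed above.
sample = [
    Event("INFO", "illustrative informational message"),
    Event("WARNING", "illustrative warning message"),
    Event("FATAL", "illustrative fatal message"),
]

for event in actionable_events(sample):
    print(event.severity, event.message)
```

In practice the events would come from whichever notification method the group is configured to use; only the filtering step is shown here.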
Table 15-1 describes general best practices for monitoring a group. The first column lists the monitoring condition, the second column describes it, and the third column provides a reference for more information about addressing the issue.
Table 15-1: General Monitoring Best Practices

| Monitor | Description | Reference |
|---|---|---|
| Events of WARNING, ERROR, and FATAL severity | Investigate significant events and take the appropriate action to prevent or resolve problems. | See Monitoring events on page 15-2 |
| Hardware failures | Replace failed hardware promptly. Multiple hardware failures might result in an offline member or lost data. Do not remove a failed hardware component until you are ready to replace it. | See Monitoring alarms and operations on page 15-10 and Monitoring group members on page 15-15 |
| Degraded RAID set | If a RAID 1 or RAID 5 set is degraded, another disk drive failure in the RAID set might result in lost data. If a RAID 6 set is degraded, one or two additional drive failures might result in lost data. Immediately replace failed drives. | See Monitoring a specific member on page 15-16 |
| Offline volumes | An offline volume might indicate a problem. For example, a member might be offline. | See Monitoring volumes and snapshots on page 15-25 |
| Low pool space | Do not let pool space fall below a recommended value. | See Monitoring storage pool free space on page 15-15 |
| Low free volume space | If there is insufficient free volume space, writes to the volume fail. This can affect application performance. If a write to a volume exceeds the reported size, it fails. This can affect applications that use the volume. If a thin-provisioned volume reaches its maximum in-use space limit (and the limit is less than 100%), the group sets the volume offline. | See Monitoring volumes, collections, and snapshots on page 15-24 |
| Incomplete pool move operations | Moving a volume or a member from one pool to another can take a long time. Monitor the progression of the move operation and make sure that it completes. | See Monitoring group operations on page 15-14 |
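
The thin-provisioning rule in the "Low free volume space" row can be expressed as a simple predicate. The Python sketch below is only an illustration of that rule as worded in the table: the function and parameter names are hypothetical, and it assumes that in-use space and the maximum in-use limit are given as percentages on the same scale. It is not how the group firmware evaluates the condition.

```python
def thin_volume_set_offline(in_use_pct: float, max_in_use_pct: float) -> bool:
    """Return True when, per the table's rule, the group would set a
    thin-provisioned volume offline.

    Both arguments are assumed to be percentages on the same scale.
    """
    # The volume has reached its maximum in-use space limit.
    limit_reached = in_use_pct >= max_in_use_pct
    # The offline action applies only when the limit is below 100%.
    return limit_reached and max_in_use_pct < 100


print(thin_volume_set_offline(90, 90))    # True: limit reached, limit < 100%
print(thin_volume_set_offline(100, 100))  # False: limit is not below 100%
print(thin_volume_set_offline(50, 90))    # False: limit not reached
```

A check like this could feed an early warning when a volume approaches its limit, so that space can be added before the volume goes offline.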