White Papers
10 Using OpenManage Enterprise Power Manager for Chargeback
3 Finding Zombie Servers inside a datacenter
There may be devices that keep consuming power regardless of the amount of workload they process. Such servers are referred to as zombie servers, and they can significantly increase the power costs incurred in a datacenter. To detect such servers, compare CPU, I/O, and memory utilization (CUPs) against power consumption. If power consumption does not drop even when CUPs utilization is minimal, the server likely has a faulty component or another issue that keeps drawing excess power. After debugging the issue, replace the server or its faulty parts to improve efficiency.
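The detection rule above can be sketched as a comparison of average power at low and high CUPs utilization. This is a minimal Python sketch with several assumptions. The samples are assumed to be (CUPs utilization %, power in watts) pairs already exported from Power Manager. The function name `power_drops_at_low_load` is hypothetical. The 10%, 50%, and 20% thresholds are illustrative choices, not values defined by OpenManage Enterprise.

```python
from statistics import mean


def power_drops_at_low_load(samples, low_pct=10.0, high_pct=50.0, min_drop=0.2):
    """Return True if power at low CUPs utilization is clearly lower than at high utilization.

    samples: iterable of (cups_utilization_percent, power_watts) pairs.
    low_pct, high_pct, min_drop: hypothetical thresholds chosen for illustration;
    they are not OpenManage Enterprise Power Manager settings.
    """
    samples = list(samples)  # accept generators, since the samples are scanned twice
    low = [p for u, p in samples if u <= low_pct]
    high = [p for u, p in samples if u >= high_pct]
    if not low or not high:
        raise ValueError("need samples at both low and high utilization")
    # A regular server is expected to draw at least `min_drop` (20% by default)
    # less power when it is nearly idle than when it is busy.
    return mean(low) <= (1.0 - min_drop) * mean(high)


healthy = [(5, 120), (8, 125), (60, 300), (80, 340)]
zombie = [(5, 310), (8, 320), (60, 330), (80, 340)]
print(power_drops_at_low_load(healthy))  # True
print(power_drops_at_low_load(zombie))   # False: candidate zombie server
```

A `False` result is only a hint. The thresholds would need tuning for the hardware and workloads in a given datacenter.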
To debug the issue, observe the metric values over a long duration, for example 12 months of data. For a regular server, power consumption rises and falls with CUPs utilization. As the workload on a server increases, the CPU raises its clock speed and executes more cycles per unit time, so it draws more power and generates more heat. The extra heat requires more cooling, which increases the power drawn by the fans. The same applies to memory and I/O devices: when their heat increases, additional power is consumed for cooling to keep the components and the server safe.
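The relationship between power consumption and CUPs utilization described above can be checked numerically with the Pearson correlation coefficient. This is a minimal Python sketch, assuming two equal-length series of CUPs utilization and power readings taken at the same times. The function name `pearson` is hypothetical. Correlation measures linear association rather than strict proportionality, so the coefficient that counts as "healthy" is a judgment call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # one series is flat, so no linear relationship can be measured
    return cov / (sx * sy)


cups = [10, 20, 30, 40, 50]
healthy_power = [150, 180, 210, 240, 270]
zombie_power = [300, 298, 301, 299, 300]
print(round(pearson(cups, healthy_power), 3))  # 1.0
print(round(pearson(cups, zombie_power), 3))   # 0.139
```

A regular server is expected to show a strongly positive coefficient. A value near zero means power barely follows the workload, which is consistent with zombie behavior.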
After exporting data for a long duration (for example, 12 months), compare the trends in a spreadsheet such as Microsoft Excel. For a regular server, power consumption decreases when CUPs utilization is minimal. If it does not, the server is likely a zombie server, and if this behavior appears consistently, the probability is high that it is one.
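The spreadsheet comparison above can also be scripted, grouping the exported samples by month and checking how consistently the low-load power fails to drop. This is a minimal Python sketch, and the CSV layout is an assumption. The column names `timestamp` (ISO 8601), `cups_utilization`, and `power_watts` are hypothetical, and the actual Power Manager export format may differ. The function names and all thresholds, including the 75% consistency fraction, are also hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def monthly_zombie_flags(csv_text, low_pct=10.0, high_pct=50.0, min_drop=0.2):
    """Flag each month where power at low CUPs utilization is NOT at least
    `min_drop` below power at high utilization.

    Assumes hypothetical columns: timestamp, cups_utilization, power_watts.
    Months without both low- and high-utilization samples are skipped.
    """
    by_month = defaultdict(lambda: {"low": [], "high": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = datetime.fromisoformat(row["timestamp"]).strftime("%Y-%m")
        util = float(row["cups_utilization"])
        power = float(row["power_watts"])
        if util <= low_pct:
            by_month[month]["low"].append(power)
        elif util >= high_pct:
            by_month[month]["high"].append(power)
    flags = {}
    for month, bucket in sorted(by_month.items()):
        if bucket["low"] and bucket["high"]:
            low_avg = sum(bucket["low"]) / len(bucket["low"])
            high_avg = sum(bucket["high"]) / len(bucket["high"])
            flags[month] = low_avg > (1.0 - min_drop) * high_avg
    return flags


def is_likely_zombie(flags, min_fraction=0.75):
    """Consistency check: most analyzed months must show zombie-like behavior."""
    return bool(flags) and sum(flags.values()) / len(flags) >= min_fraction


export = """timestamp,cups_utilization,power_watts
2023-01-05T10:00:00,5,310
2023-01-06T10:00:00,70,330
2023-02-05T10:00:00,4,305
2023-02-06T10:00:00,75,335
2023-03-05T10:00:00,3,120
2023-03-06T10:00:00,80,330
"""
flags = monthly_zombie_flags(export)
print(flags)  # {'2023-01': True, '2023-02': True, '2023-03': False}
print(is_likely_zombie(flags))  # False: only 2 of 3 months are flagged
```

Requiring most months to be flagged, rather than any single month, reflects the point that only consistent behavior makes a zombie server probable.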
To check this at the device level, view the metric graphs in the Power Management and Monitoring section for the device.
For example, the following graph shows the trend for a zombie server. In this example, power consumption keeps increasing even when there is no load on the CPU. For a single instance this could be normal, because multiple factors can contribute to such behavior. However, if the trend continues over a long duration such as 9 months or 1 year, power consumption is consistently high without any context switches on the CPU and without any utilization of RAM.
Power consumption versus CPU utilization trend for a zombie server
After detecting a zombie server, debug it to find the reason for the excess power consumption. The cause could be a faulty component such as a power supply unit (PSU), or server fans that keep cooling the server even when cooling is not needed. This scan helps you find such defective parts or devices; fixing or replacing them reduces power consumption.
[Figure: Power Consumption vs CPU utilization. Series: CPU utilization, Power Consumption.]
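The zombie-server graph in this section shows power consumption rising while there is no load on the CPU. One way to quantify whether that trend persists is a least-squares slope of power over time, computed only over idle samples. This is a minimal Python sketch, assuming a time-ordered list of (CPU utilization %, power in watts) samples. The name `idle_power_slope` and the 5% idle threshold are hypothetical choices for illustration.

```python
def idle_power_slope(samples, idle_pct=5.0):
    """Least-squares slope of power (watts per sample step) over idle samples.

    samples: time-ordered list of (cpu_utilization_percent, power_watts).
    Each idle sample keeps its position in the full series as its x-value,
    so gaps left by busy samples are preserved.
    """
    pts = [(i, p) for i, (u, p) in enumerate(samples) if u <= idle_pct]
    if len(pts) < 2:
        raise ValueError("need at least two idle samples")
    n = len(pts)
    mx = sum(i for i, _ in pts) / n
    my = sum(p for _, p in pts) / n
    num = sum((i - mx) * (p - my) for i, p in pts)
    den = sum((i - mx) ** 2 for i, _ in pts)  # > 0 because indices are distinct
    return num / den


idle_rising = [(0, 200), (0, 210), (0, 220), (0, 230)]
idle_flat = [(0, 200), (90, 400), (0, 200), (0, 200)]
print(idle_power_slope(idle_rising))  # 10.0
print(idle_power_slope(idle_flat))    # 0.0 (the busy sample is ignored)
```

A single positive slope over a short window can have many causes. A slope that stays positive across months of idle samples matches the zombie pattern described in this section.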