Administrator Guide
Copyright © 2019 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries
Efficiency
Performance efficiency is the throughput achieved, normalized by the power consumed. As we saw
in Fig 4, doubling the number of threads from 32 (i.e., SS) to 64 (i.e., DS) has less impact
than going from HS to SS, increasing throughput by only 11%. The price of this sublinear
performance increase is reduced performance efficiency, by a factor of 0.4 (Fig 7). The measured
system power during CPU execution reached a maximum of 416 W. With a measured board power of
50 W on the PAC, the FPGA achieves a much higher performance efficiency than the CPU, by a
factor of 6x, while adding only 20% to the server's power budget.
Scale-up
We scaled up the number of PACs from 1 to 4. As shown in Fig 8, the FPGAs achieve a linear
speedup, up to 1,251 FPS, with a consistent Perf/Watt. Although increasing the thread count beyond
64 would yield little gain in FPS (as we saw in Fig 4), we were curious to see the CPU
performance at the same scale factor as the FPGA. For this scenario, we migrated our experiment
from the dual-socket PowerEdge R740 server to a quad-socket (QS) PowerEdge R840 server
with 4 CPUs. The QS configuration achieved a mere 330 FPS at a 60% reduction in Perf/Watt
efficiency. The aggregate system power was 615 W for the QS configuration and 405 W for the
4xFPGA (with SS) configuration, indicating that the FPGAs yield the best performance in terms of
both Perf/Watt and overall system-power budget.
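
The Perf/Watt comparison can be reproduced from the figures quoted above. The following is a minimal sketch, assuming that performance efficiency is simply FPS divided by the aggregate system power stated in this section; the function name perf_per_watt is illustrative and not taken from the original study, and the power basis used for Fig 7 and Fig 8 may differ.

```python
def perf_per_watt(fps: float, watts: float) -> float:
    """Throughput normalized by power, in FPS per watt."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return fps / watts


# Figures quoted in the text: throughput (FPS) and aggregate system power (W).
# Assumption: both power values are measured on the same basis.
qs_cpu = perf_per_watt(330, 615)     # quad-socket (QS) CPU configuration
fpga_x4 = perf_per_watt(1251, 405)   # 4xFPGA (with SS) configuration

print(f"QS CPU : {qs_cpu:.2f} FPS/W")
print(f"4xFPGA : {fpga_x4:.2f} FPS/W")
print(f"ratio  : {fpga_x4 / qs_cpu:.1f}x")
```

Under these assumptions the 4xFPGA configuration comes out at roughly 3.09 FPS/W against roughly 0.54 FPS/W for the QS configuration, while also drawing less total power.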
Figure 7: Performance efficiency
Figure 8: The FPGAs scale linearly with a consistent Perf/Watt.
*In QS, the batch size was increased to 96 to be commensurate with the
increased core count.