Table 2. Software details

Software                              Details
OS                                    RHEL 7.3 x86_64
Linux Kernel                          3.10.0-514.el7.x86_64
BIOS                                  1.2.11
Intel Caffe                           1.0.4
Intel MLSL (for multi-node tests)     2017.1.016
Caffe Model                           intel_optimized/multinode/resnet_50_256_nodes_8k_batch (with batch size modified)

Performance tests were conducted on three generations of servers supporting different Intel CPU technologies. The system configuration
of these test beds is shown in Table 1, and the software configuration is listed in Table 2. This space is new and rapidly evolving, with
frameworks being continuously updated and optimized. We expect performance to continue to improve with subsequent releases; as
such, the results are intended to provide insight and should not be taken as absolute.
As shown in Table 2, we used the Intel Caffe optimized multi-node version for all tests. Intel’s implementations of the single-node and
multi-node Caffe models differ, so using the multi-node model across all configurations allows an accurate comparison between
single-node and multi-node scaling results. Unless otherwise stated, all tests were run using the compressed ILSVRC 2012 (ImageNet)
database, which contains 1,281,167 images. The dataset is loaded into /dev/shm before the start of the test.
For each data point, a parameter sweep was performed across three parameters: batch size, prefetch size, and thread count. Batch size
is the number of training examples fed into the model at one time, prefetch size is the number of batches buffered in memory,
and thread count is the number of threads used per node. The results shown are the best from the parameter sweep for each
test case. The metric used for comparison is images per second, calculated as the total number of images the model
has seen (batch_size * iterations * nodes) divided by the total training time. Training time does not include Caffe startup time.
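The metric and the sweep described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness used for these tests: the function names, the `run_benchmark` callable, and the numbers in the usage note are hypothetical.

```python
from itertools import product


def images_per_second(batch_size, iterations, nodes, training_seconds):
    """Total images seen divided by training time (Caffe startup excluded)."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return batch_size * iterations * nodes / training_seconds


def best_of_sweep(run_benchmark, batch_sizes, prefetch_sizes, thread_counts):
    """Try every parameter combination and keep the highest throughput.

    run_benchmark is a hypothetical callable supplied by the caller. It takes
    (batch_size, prefetch, threads) and returns images per second.
    Returns a tuple (rate, batch_size, prefetch, threads), or None if any
    of the parameter lists is empty.
    """
    best = None
    for batch_size, prefetch, threads in product(
        batch_sizes, prefetch_sizes, thread_counts
    ):
        rate = run_benchmark(batch_size, prefetch, threads)
        if best is None or rate > best[0]:
            best = (rate, batch_size, prefetch, threads)
    return best
```

For example, with made-up inputs, `images_per_second(64, 1000, 4, 512.0)` evaluates to 500.0: 256,000 images seen over 512 seconds of training.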
Single Node Performance
To determine which processors might be best suited for these workloads, we tested a variety of SKUs, including the Intel Xeon E5-2697 v4
(Broadwell, BDW); Silver, Gold, and Platinum Intel Xeon Scalable Processor Family CPUs (Skylake, SKL); and an Intel Xeon
Phi CPU (KNL). The single-node results are plotted in Figure 1, with the line graph showing results relative to the performance of the
E5-2697 v4 BDW system.
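The relative-performance line in such a plot is each system's throughput divided by the baseline system's throughput. A minimal Python sketch follows, assuming results are held in a dictionary keyed by system name. The function name and dictionary layout are illustrative choices and are not taken from the test tooling.

```python
def relative_performance(results, baseline):
    """Normalize each system's images/sec to the baseline system's value.

    results:  dict mapping a system label to its measured images per second.
    baseline: label of the reference system (here, the BDW system).
    """
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline throughput must be positive")
    return {name: rate / base for name, rate in results.items()}
```

With placeholder values (not measurements), `relative_performance({"bdw": 100.0, "skl": 150.0}, "bdw")` gives 1.0 for the baseline and 1.5 for the second system.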