cached into the system memory on one compute node. The cache was not cleared after the benchmark stopped, which is why used memory drops close to zero while the cached amount does not decrease. Although the memory cannot cache the whole dataset, the network and file system are fast enough to feed data to the GPUs, keeping them at high utilization and sustaining a training speed of 2,940 images/sec. The average GPU utilization was around 380% summed across the four GPUs, or roughly 95% per GPU. This high utilization indicates that the other components of the compute node (CPU, memory, network, etc.) are not the performance bottleneck.
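The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the 380% aggregate utilization, the four GPUs, and the 2,940 images/sec rate come from the text above, while the average image size is a hypothetical placeholder (the guide does not state it here) and the helper names are invented for this example.

```python
def per_gpu_utilization(aggregate_percent: float, num_gpus: int) -> float:
    """Average utilization of one GPU, given the figure summed over all GPUs."""
    return aggregate_percent / num_gpus


def required_read_mb_per_sec(images_per_sec: float, avg_image_kb: float) -> float:
    """Sustained read rate needed to feed training, in MB/s (1 MB = 1000 KB)."""
    return images_per_sec * avg_image_kb / 1000.0


# Figures reported in the text.
AGGREGATE_GPU_UTIL = 380.0   # percent, summed across the GPUs
NUM_GPUS = 4
IMAGES_PER_SEC = 2940.0

# ASSUMPTION: hypothetical average size of one stored image, for illustration
# only. Replace it with a value measured from the actual TFRecords files.
ASSUMED_AVG_IMAGE_KB = 110.0

if __name__ == "__main__":
    util = per_gpu_utilization(AGGREGATE_GPU_UTIL, NUM_GPUS)
    rate = required_read_mb_per_sec(IMAGES_PER_SEC, ASSUMED_AVG_IMAGE_KB)
    print(f"Average utilization per GPU: {util:.1f}%")
    print(f"Estimated read rate per node (assumed image size): {rate:.1f} MB/s")
```

Comparing such an estimate with the measured network and disk throughput on the storage side is one way to sanity-check that storage is keeping up with the GPUs. The result is only as good as the assumed image size.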

Figure 13: The CPU utilization, memory usage and GPU utilization on one compute node, and network and disk throughput on Isilon when running Resnet50 with 10x ILSVRC2012 TFRecords dataset