cached into the system memory on one compute node. The cache was not cleared after the benchmark
stopped, which is why the used memory falls close to zero while the cached memory does not decrease. Although the memory
cannot cache the whole dataset, the network and file system are fast enough to feed data to the GPUs, keeping
them at high utilization and sustaining a training speed of 2,940 images/sec. The average GPU utilization
was around 380% summed across four GPUs. This high utilization indicates that the other components (CPU, memory, network,
etc.) of the compute node are not the performance bottleneck.
Figure 13: CPU utilization, memory usage, and GPU utilization on one compute node, and network
and disk throughput on Isilon, when running Resnet50 with the 10x ILSVRC2012 TFRecords dataset
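
The 380% figure is easier to interpret as a per-GPU average. The short sketch below works through that arithmetic, assuming the reported value is the sum of the four per-GPU utilization percentages, each of which can reach at most 100%. The function name `per_gpu_average` is hypothetical and is used here only for illustration.

```python
def per_gpu_average(aggregate_percent: float, num_gpus: int) -> float:
    """Convert a utilization figure summed over GPUs into a per-GPU average.

    Assumption: aggregate_percent is the sum of the individual GPU
    utilization percentages on one compute node.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return aggregate_percent / num_gpus


# Figures taken from the text: about 380% summed across four GPUs.
average = per_gpu_average(380.0, 4)
print(f"{average:.0f}% per GPU")  # prints "95% per GPU"
```

Under this assumption, each GPU is busy roughly 95% of the time on average, which is consistent with the conclusion that data loading is not limiting the GPUs.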