Administrator Guide

19 Deep Learning Performance Scale-Out
Citation
@article{sergeev2018horovod,
  author  = {Alexander Sergeev and Mike Del Balso},
  journal = {arXiv preprint arXiv:1802.05799},
  title   = {Horovod: fast and easy distributed deep learning in {TensorFlow}},
  year    = {2018}
}
References
[0] https://downloads.dell.com/manuals/all-products/esuprt_solutions_int/esuprt_solutions_int_solutions_resources/servers-solution-resources_white-papers52_en-us.pdf
[1] Horovod GitHub, “Horovod Distributed Deep Learning Training Framework” [Online]. Available: https://github.com/horovod/horovod
[2] Mellanox Community, “How to Create a Docker Container with RDMA Accelerated Applications Over 100Gb InfiniBand Network” [Online]. Available: https://community.mellanox.com/docs/DOC-2971
[3] TensorFlow, “XLA: Optimizing Compiler for Machine Learning” [Online]. Available: https://www.tensorflow.org/xla
[4] NCCL 2.5, “NCCL Environment Variables” [Online]. Available: https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/env.html#nccl-ib-cuda-support
[5] TensorFlow, “Pushing the limits of GPU performance with XLA” [Online]. Available: https://medium.com/tensorflow/pushing-the-limits-of-gpu-performance-with-xla-53559db8e473