Administrator Guide

19 Deep Learning Performance Scale-Out
Citation
@article{sergeev2018horovod,
  author  = {Alexander Sergeev and Mike Del Balso},
  journal = {arXiv preprint arXiv:1802.05799},
  title   = {Horovod: fast and easy distributed deep learning in {TensorFlow}},
  year    = {2018}
}
References
[0] https://downloads.dell.com/manuals/all-products/esuprt_solutions_int/esuprt_solutions_int_solutions_resources/servers-solution-resources_white-papers52_en-us.pdf
[1] Horovod GitHub, “Horovod Distributed Deep Learning Training Framework” [Online]. Available: https://github.com/horovod/horovod
[2] Mellanox Community, “How to Create a Docker Container with RDMA Accelerated Applications Over 100Gb InfiniBand Network” [Online]. Available: https://community.mellanox.com/docs/DOC-2971
[3] TensorFlow, “XLA: Optimizing Compiler for Machine Learning” [Online]. Available: https://www.tensorflow.org/xla
[4] NCCL 2.5, “NCCL Environment Variables” [Online]. Available: https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/env.html#nccl-ib-cuda-support
[5] TensorFlow, “Pushing the limits of GPU performance with XLA” [Online]. Available: https://medium.com/tensorflow/pushing-the-limits-of-gpu-performance-with-xla-53559db8e473