White Papers

Deep Learning Performance: Scale-up vs Scale-out

Architectures & Technologies Dell EMC | Infrastructure Solutions Group

1. System bandwidth performance, i.e. the PCIe links connected to the GPUs – p2pbandwidth & latency tests

2. GPU hardware performance without any deep learning framework – Baidu DeepBench

3. End-to-end system performance with GPUs – TensorFlow benchmarks

3.1 Criteria

1. To bound our testing, we picked TensorFlow as the framework of choice because it is well supported and models for it are readily available.

2. For distributed training, we selected Uber's Horovod implementation, since it is one of the best-performing distributed implementations [2].
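
The paper does not describe Horovod's internals. As background only, Horovod's public documentation describes it as built around ring-allreduce for exchanging gradients between workers. The sketch below simulates that communication pattern in a single process, assuming every worker holds a gradient list of the same length. The function name `ring_allreduce` and the list-based data layout are illustrative assumptions, not Horovod's API or implementation.

```python
def ring_allreduce(worker_grads):
    """Simulate a sum ring-allreduce over equal-length per-worker gradient lists.

    Hypothetical helper for illustration; real Horovod delegates this to MPI/NCCL.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    # Split each gradient vector into n contiguous chunks (some may be empty).
    bounds = [(k * length // n, (k + 1) * length // n) for k in range(n)]
    data = [list(g) for g in worker_grads]  # copy so inputs stay untouched

    # Phase 1 (scatter-reduce): after n-1 steps, worker i holds the full
    # sum for chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends happen "simultaneously".
        sends = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[i][lo:hi]))
        for i in range(n):
            dst = (i + 1) % n  # each worker sends only to its ring neighbour
            c, payload = sends[i]
            lo, _ = bounds[c]
            for j, v in enumerate(payload):
                data[dst][lo + j] += v

    # Phase 2 (allgather): circulate the completed chunks around the ring.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[i][lo:hi]))
        for i in range(n):
            dst = (i + 1) % n
            c, payload = sends[i]
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload

    return data


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, summed in enumerate(ring_allreduce(grads)):
        print(rank, summed)
```

In this pattern each worker talks only to its ring neighbour, and every worker ends up with the same summed gradients. Dividing the result by the number of workers gives the averaged gradient used in data-parallel training.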

3.2 Why TensorFlow as the framework of choice?

We selected TensorFlow because it is the most widely used framework for machine learning and deep learning. It has broad support within the open source community, pre-trained models are readily available, and it is well supported by the TensorFlow team.

TensorFlow is also widely used within the Dell EMC customer base and is one of the top choices when developing new machine learning projects. Figure 6 shows how TensorFlow compares in terms of GitHub commits, stars, and number of forks, which is a good indicator of its widespread adoption.