White Papers
Increased Productivity across AI Development, Training and Deployment
IT regains the ability to assign GPU resources according to the organization's business priorities, to pool resources
remotely, and to attach them to workloads in real time with a known schedule and utilization plan. For example, GPU
resources from Department A, which has completed an intensive training and development schedule, can be reassigned to
Department B, which now faces peak demand for GPUs for an urgent AI project.
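The reassignment scenario above can be sketched as simple bookkeeping over a shared pool. This is a minimal illustration only: the `GpuPool` class, its method names and the GPU counts are hypothetical and do not represent the Bitfusion FlexDirect interface.

```python
from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """Toy model of a shared GPU pool (hypothetical, not the FlexDirect API)."""

    total_gpus: int
    # Maps a department name to the number of GPUs it currently holds.
    allocations: dict = field(default_factory=dict)

    def free_gpus(self) -> int:
        """Number of GPUs not assigned to any department."""
        return self.total_gpus - sum(self.allocations.values())

    def assign(self, department: str, count: int) -> None:
        """Attach `count` free GPUs to a department."""
        if count <= 0:
            raise ValueError("count must be positive")
        if count > self.free_gpus():
            raise ValueError("not enough free GPUs in the pool")
        self.allocations[department] = self.allocations.get(department, 0) + count

    def release(self, department: str, count: int) -> None:
        """Return `count` GPUs from a department to the pool."""
        held = self.allocations.get(department, 0)
        if count <= 0 or count > held:
            raise ValueError("department does not hold that many GPUs")
        if held == count:
            del self.allocations[department]
        else:
            self.allocations[department] = held - count

    def reassign(self, source: str, target: str, count: int) -> None:
        """Move GPUs from one department to another."""
        self.release(source, count)
        self.assign(target, count)


# Illustrative numbers only: Department A finishes its training schedule
# and four of its GPUs move to Department B for an urgent project.
pool = GpuPool(total_gpus=8)
pool.assign("Department A", 6)
pool.reassign("Department A", "Department B", 4)
print(pool.allocations)  # {'Department A': 2, 'Department B': 4}
print(pool.free_gpus())  # 2
```

A real deployment would also track schedules and utilization plans; the sketch covers only the pooling and reassignment step.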
Figure 2: Dell EMC PowerEdge with FlexDirect Elastic AI Infrastructure Reference Architecture
Dell EMC and Bitfusion designed and validated the reference configuration shown in Figure 2 to help customers deploy
Elastic AI Infrastructure in their datacenters. Integrating Bitfusion FlexDirect does not require any changes to the OS,
drivers or AI frameworks. The intent of the tests was to show that the AI developer experience when executing CUDA
calls on a remote GPU (or GPUs) is the same as if the GPUs were attached locally to the servers where the workloads
run. Standard AI benchmarks were used with a variety of frameworks, models, batch sizes and network configurations,
namely 10G TCP, 10G RoCE, 40G RoCE and InfiniBand EDR, to simulate a range of customer environments. The results
for AI model training using TensorFlow in a 40GbE RoCE environment are shown in Figures 3 and 4 on the next page.
Figure 3 compares the performance of remotely attached GPUs (on PE-C4140) over the network against running the
same workload locally on the GPU system. Figure 4 shows the performance of fractional (shareable) GPUs and how their
aggregate performance is similar to that of a full physical GPU. Across models, batch sizes and tests, Dell EMC
PowerEdge with Bitfusion FlexDirect demonstrated that network-attached full and fractional GPUs achieve near-native
performance across the suite of benchmarks. Please contact Dell or Bitfusion for additional details on the performance
benchmarking shared in this brief.
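One way to express "near-native" is as a ratio of measured throughput to the local baseline, with fractional GPU results summed before comparison. The sketch below assumes throughput is reported in images per second; the function names and all numbers are placeholders for illustration, not measurements from this brief.

```python
def relative_performance(measured: float, baseline: float) -> float:
    """Return measured throughput as a fraction of the local baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return measured / baseline


def aggregate_throughput(fractional_results: list) -> float:
    """Sum the throughput of workloads sharing one physical GPU."""
    return sum(fractional_results)


# Placeholder values (images/sec), NOT results from the benchmarks.
local_gpu = 1000.0                    # workload run on a locally attached GPU
remote_gpu = 950.0                    # same workload on a network-attached GPU
quarter_gpus = [240.0, 240.0, 240.0, 240.0]  # four workloads sharing one GPU

remote_ratio = relative_performance(remote_gpu, local_gpu)
shared_ratio = relative_performance(aggregate_throughput(quarter_gpus), local_gpu)

print(f"remote vs. local: {remote_ratio:.0%}")
print(f"aggregate fractional vs. full GPU: {shared_ratio:.0%}")
```

A ratio close to 1.0 in either comparison corresponds to the near-native behavior described above.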
© 2018 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries