Administrator Guide
RAPIDS Scaling on Dell EMC PowerEdge Servers
Executive Summary
Traditional machine learning workflows often involve iterative and lengthy steps in data preparation,
model training, result validation, and model tuning before the final solution can be deployed to
production. This cycle can consume significant resources, slowing the development team's progress
toward business transformation. To accelerate this cycle, NVIDIA released the Accelerated Data
Science pipeline with RAPIDS. It is a complete ecosystem that integrates multiple Python libraries
with CUDA at the core and is built on CUDA-X AI libraries and other key open-source projects,
including Apache Arrow. This ecosystem provides GPU-accelerated software for data science
workflows that maximizes productivity, performance, and ROI at the lowest infrastructure total cost of
ownership (TCO).
In this paper we tested the NYC-Taxi sample notebook (included in the NGC RAPIDS container) and the
NYC Taxi dataset [1] (available from a public Google Cloud Storage bucket) on Dell EMC PowerEdge
C4140-M and R940xa servers with NVIDIA GPUs. We ran multiple tests covering several configurations,
such as single-node and multi-node setups, as well as storing data on local disk versus NFS (Network
File System). We also investigated how NVIDIA's RAPIDS Memory Manager implementation contributes
to the overall speedup [2].
The main objective is to demonstrate how to speed up machine learning workflows with the RAPIDS
accelerated software stack, improving productivity and accuracy while lowering infrastructure cost.