Administrator Guide
RAPIDS Scaling on Dell EMC PowerEdge Servers
Executive Summary
Traditional machine learning workflows often involve iterative and lengthy steps in data preparation,
model training, result validation, and model tuning before the final solution can be deployed to
production. This cycle can consume significant resources, slowing the development team's progress
toward business transformation. To accelerate this cycle, NVIDIA released the Accelerated Data
Science pipeline with RAPIDS. It is a complete ecosystem that integrates multiple Python libraries
with CUDA at the core and is built on CUDA-X AI libraries and other key open-source projects,
including Apache Arrow. This ecosystem provides GPU-accelerated software for data science
workflows that maximizes productivity, performance, and ROI at the lowest infrastructure total cost of
ownership (TCO).
In this paper we tested the NYC-Taxi sample notebook (included in the NGC RAPIDS container) and the
NYC Taxi dataset [1] (available from a public Google Cloud Storage bucket) on Dell EMC PowerEdge
C4140-M and R940xa servers with NVIDIA GPUs. We ran multiple tests covering several configurations,
such as single-node and multi-node setups, as well as storing data on local disk versus NFS (Network
File System). We also investigated how NVIDIA's RAPIDS Memory Manager implementation contributes
to the overall speedup [2].
The main objective is to demonstrate how to speed up machine learning workflows with the RAPIDS
accelerated software stack, improving productivity and accuracy while lowering infrastructure cost.