Data Processing Evolution:
In a benchmark that aggregates data, the CPU becomes the bottleneck because too much data moves
between the CPU and the GPU. RAPIDS therefore focuses on the full data science workflow and on
keeping data on the GPU (using the same memory format as Apache Arrow). Reducing data movement
between the CPU and the GPU leads to faster data processing, as shown in Figure 2.
Figure 2. Data Processing Evolution. Source: Nvidia
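To make this concrete, the following is a minimal cuDF sketch of an aggregation that stays in GPU memory end to end; the file name and column names (transactions.csv, customer_id, amount) are illustrative assumptions, not part of the benchmark above.

import cudf

# Read directly into GPU memory; cuDF columns use the Apache Arrow
# memory format, so no host-side conversion is needed afterwards.
gdf = cudf.read_csv("transactions.csv")  # hypothetical input file

# The groupby/aggregation runs entirely on the GPU.
summary = gdf.groupby("customer_id").agg({"amount": "sum"})

# Only the small final result is copied back to the CPU.
print(summary.to_pandas().head())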
Pillars of RAPIDS Performance:
CUDA Architecture: Massively parallel processing
NVLink/NVSwitch: High-speed interconnect between GPUs for distributed algorithms
Memory Architecture: Large virtual GPU memory and high-speed memory
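For the distributed algorithms mentioned above, a minimal multi-GPU sketch using Dask with dask_cudf is shown here, assuming a single server with several NVLink-connected GPUs; the cluster defaults and the file pattern are illustrative assumptions, not a tuned configuration.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

# One Dask worker per local GPU; NVLink/NVSwitch carries the
# GPU-to-GPU traffic generated by the distributed aggregation below.
cluster = LocalCUDACluster()
client = Client(cluster)

# Partitioned read: each worker loads its share into its own GPU.
ddf = dask_cudf.read_csv("transactions-*.csv")  # hypothetical inputs

# Partial results are computed per GPU, then combined.
result = ddf.groupby("customer_id")["amount"].mean().compute()
print(result.head())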
1.1 XGBoost
XGBoost is one of the most popular machine learning packages for training gradient boosted decision trees.
Native cuDF support allows data to be passed directly to XGBoost while remaining in GPU memory. Its
popularity rests on a strong history of success across a wide range of problems and on winning several
competitions, which increases stakeholder confidence in its predictions. However, it has known
limitations, such as the trade-off between scaling out and accuracy, and its considerable number of
hyperparameters can make finding the best configuration time-consuming. Figure 3 shows the average
ranking of ML algorithms, and XGBoost is among the leading ones.
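As a concrete illustration of the cuDF-to-XGBoost path described above, the sketch below trains a model on GPU-resident data; the input file, column names, and hyperparameter values are illustrative assumptions rather than recommendations.

import cudf
import xgboost as xgb

gdf = cudf.read_csv("train.csv")  # hypothetical training data
X = gdf.drop(columns=["label"])   # hypothetical feature columns
y = gdf["label"]                  # hypothetical target column

# DMatrix accepts cuDF objects directly, so the data stays in GPU
# memory instead of round-tripping through host (pandas) memory.
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "gpu_hist",      # GPU-accelerated histogram algorithm
    "objective": "binary:logistic",
    "max_depth": 6,                 # one of the many hyperparameters to tune
}
model = xgb.train(params, dtrain, num_boost_round=100)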