Data Processing Evolution:
In a benchmark that aggregates data, the CPU becomes the bottleneck because too much data moves
between the CPU and the GPU. RAPIDS therefore focuses on the full data science workflow and on
keeping data on the GPU (using the same memory format as Apache Arrow). Reducing data movement
between the CPU and the GPU leads to faster data processing, as shown in Figure 2.
Figure 2. Data Processing Evolution. Source: Nvidia
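To make this concrete, the following is a minimal cuDF sketch of an aggregation that stays in GPU memory end to end; the file name and column names (transactions.csv, customer_id, amount) are illustrative assumptions, not part of the benchmark above.

import cudf

# Read directly into GPU memory; cuDF columns use the Apache Arrow
# memory format, so no host-side conversion is needed afterwards.
gdf = cudf.read_csv("transactions.csv")  # hypothetical input file

# The groupby/aggregation runs entirely on the GPU.
summary = gdf.groupby("customer_id").agg({"amount": "sum"})

# Only the small final result is copied back to the CPU.
print(summary.to_pandas().head())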
Pillars of RAPIDS Performance:
CUDA Architecture: Massively parallel processing
NVLink/NVSwitch: High-speed interconnect between GPUs for distributed algorithms
Memory Architecture: Large virtual GPU memory and high-speed memory
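For the distributed algorithms mentioned above, a minimal multi-GPU sketch using Dask with dask_cudf is shown here, assuming a single server with several NVLink-connected GPUs; the cluster defaults and the file pattern are illustrative assumptions, not a tuned configuration.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

# One Dask worker per local GPU; NVLink/NVSwitch carries the
# GPU-to-GPU traffic generated by the distributed aggregation below.
cluster = LocalCUDACluster()
client = Client(cluster)

# Partitioned read: each worker loads its share into its own GPU.
ddf = dask_cudf.read_csv("transactions-*.csv")  # hypothetical inputs

# Partial results are computed per GPU, then combined.
result = ddf.groupby("customer_id")["amount"].mean().compute()
print(result.head())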
1.1 XGBoost
XGBoost is one of the most popular machine learning packages for training gradient boosted decision trees.
Native cuDF support allows data to be passed directly to XGBoost while remaining in GPU memory. Its
popularity rests on a strong history of success across a wide range of problems and on winning several
competitions, which increases stakeholder confidence in its predictions. However, it has known
limitations, such as the trade-off between scaling out and accuracy, and its considerable number of
hyperparameters can make finding the best configuration time-consuming. Figure 3 shows the average
ranking of ML algorithms, and XGBoost is among the leading ones.
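As a concrete illustration of the cuDF-to-XGBoost path described above, the sketch below trains a model on GPU-resident data; the input file, column names, and hyperparameter values are illustrative assumptions rather than recommendations.

import cudf
import xgboost as xgb

gdf = cudf.read_csv("train.csv")  # hypothetical training data
X = gdf.drop(columns=["label"])   # hypothetical feature columns
y = gdf["label"]                  # hypothetical target column

# DMatrix accepts cuDF objects directly, so the data stays in GPU
# memory instead of round-tripping through host (pandas) memory.
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "gpu_hist",      # GPU-accelerated histogram algorithm
    "objective": "binary:logistic",
    "max_depth": 6,                 # one of the many hyperparameters to tune
}
model = xgb.train(params, dtrain, num_boost_round=100)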