Administrator Guide

5 RAPIDS Scaling on Dell EMC PowerEdge Servers
1 RAPIDS Overview
RAPIDS is a GPU-accelerated data science pipeline. It consists of open-source, Python-based software
libraries that accelerate the complete workflow, from data ingestion and manipulation to machine learning
training. It does this by:
1. Adopting a columnar data structure, the GPU DataFrame, as the common data format across all
GPU-accelerated libraries.
2. Accelerating data science building blocks (such as data manipulation routines and machine learning
algorithms) by processing data and retaining the results in GPU memory.
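The idea behind a columnar format can be illustrated without a GPU. The sketch below is plain standard-library Python, not RAPIDS code; the names `rows` and `to_columns` and the sample values are hypothetical. It pivots row-oriented records into one contiguous typed array per column, which is the kind of layout that lets a library process a whole column in one pass.

```python
from array import array

# Row-oriented layout: one record per entry, fields interleaved.
# (Sample data is made up for illustration.)
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]


def to_columns(records):
    """Pivot a list of row dicts into one contiguous typed array per column."""
    columns = {}
    for name in records[0]:
        values = [record[name] for record in records]
        # 'q' = signed 64-bit integer, 'd' = 64-bit float
        typecode = "q" if all(isinstance(v, int) for v in values) else "d"
        columns[name] = array(typecode, values)
    return columns


columns = to_columns(rows)

# A column-wise operation touches only the values it needs.
total_price = sum(columns["price"])
```

In this layout, each column holds values of a single type stored contiguously, so an operation on `price` never has to step over `id` values.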
Figure 1 shows the main software libraries that are part of RAPIDS:
cuDF: The GPU DataFrame library, with a Pandas-like API for data cleaning and transformation.
It is a single repository containing both the low-level implementation and C/C++ API (LibGDF) and the
high-level wrappers and APIs (PyGDF). It supports conversion between Pandas DataFrames and GPU
DataFrames (Pandas ↔ PyGDF).
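The Pandas-to-GPU round trip can be sketched as follows. This is a minimal example that assumes the current `cudf` Python package (the name under which the PyGDF wrappers are now distributed), an NVIDIA GPU, and a working RAPIDS install. The function name `pandas_to_gpu_and_back` and the column names `price`, `qty`, and `total` are hypothetical.

```python
def pandas_to_gpu_and_back(pdf):
    """Copy a pandas DataFrame to GPU memory, transform it there, copy it back."""
    # Imported lazily: cudf needs an NVIDIA GPU and a RAPIDS install.
    import cudf

    gdf = cudf.DataFrame.from_pandas(pdf)      # host -> device copy
    gdf["total"] = gdf["price"] * gdf["qty"]   # computed on the GPU
    return gdf.to_pandas()                     # device -> host copy


if __name__ == "__main__":
    try:
        import pandas as pd

        # Illustrative data only.
        pdf = pd.DataFrame({"price": [9.5, 12.0], "qty": [2, 3]})
        print(pandas_to_gpu_and_back(pdf))
    except ImportError as exc:
        print(f"pandas or cudf is not available: {exc}")
```

Each conversion copies data between host and GPU memory, so a pipeline generally benefits from converting once and keeping intermediate results as GPU DataFrames.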
cuML: A suite of libraries that implement machine learning algorithms compatible with the
RAPIDS ecosystem, including Clustering, Principal Components Analysis, Linear Regression, Logistic
Regression, XGBoost GBDT, XGBoost Random Forest, K-Nearest Neighbors (KNN), GLM (including
Logistic), and Support Vector Machines, among others.
cuGraph: A library for graph analytics.
Figure 1. RAPIDS open-source software. Source: NVIDIA