RAPIDS Multi Node Set Up
1. Run as Docker container on each node
On each node, start the RAPIDS Docker container and then follow the multi-node configuration
steps described below. An example command to start the container:
docker run --runtime=nvidia \
  --rm -it --net=host \
  -p 8888:8888 \
  -p 8787:8787 \
  -p 8786:8786 \
  -v /home/rapids/notebooks-contrib/:/rapids/notebooks/contrib/ \
  -v /home/rapids/data/:/home/dell/rapids/data/ \
  nvcr.io/nvidia/rapidsai/rapidsai:0.10-cuda10.1-runtime-ubuntu18.04
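Optionally, once inside the container, a quick sanity check can confirm that RAPIDS is importable and the GPUs are visible. The snippet below is a minimal sketch; it assumes the pynvml package (a dask-cuda dependency) is present in the container:
# Sanity check inside the RAPIDS container: import cuDF and count visible GPUs.
# Assumes pynvml is available in this image (it ships as a dask-cuda dependency).
import cudf
import pynvml

pynvml.nvmlInit()
print("cuDF version:", cudf.__version__)
print("Visible GPUs:", pynvml.nvmlDeviceGetCount())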
2. Launch the dask-scheduler on the primary compute node
$ dask-scheduler --port=8888 --bokeh-port 8786
output:
distributed.scheduler - INFO - Receive client connection: Client-9ad22140-83bd-11e9-823c-246e96b3e316
distributed.core - INFO - Starting established connection
3. Launch dask-cuda-worker on the primary compute node
This step starts workers on the same primary compute node where the scheduler was started (dask-cuda-worker launches one worker per GPU by default).
$ dask-cuda-worker tcp://<ip_primary_node>:8888
output: ... messages indicating that the workers connected to the scheduler successfully
4. Launch dask-cuda-worker on the secondary compute node
This step starts additional workers on the secondary compute node, pointing them at the scheduler running on the primary node.
$ dask-cuda-worker tcp://<ip_primary_node>:8888
output: ... messages indicating that the workers connected to the scheduler successfully
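If the workers on the secondary node fail to connect, a quick way to verify that the scheduler port on the primary node is reachable over the network is a short test with Python's standard socket module. This is a minimal sketch; replace the <ip_primary_node> placeholder with the actual IP:
# Connectivity check from the secondary node: confirm the scheduler port is reachable.
import socket

sock = socket.create_connection(('<ip_primary_node>', 8888), timeout=5)
print("Scheduler port 8888 is reachable")
sock.close()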
5. Start Jupyter and run the notebook (client python API) on the primary compute node
In this case, the NYC-Taxi notebook is the client Python API. It attaches to the scheduler running
on the primary compute node, so the workload runs on the GPUs of all compute nodes in distributed
mode. To do so, modify the notebook to start the client with the primary node IP and the port the
scheduler is listening on, as shown below:
from dask.distributed import Client

client = Client('tcp://<ip_primary_node>:8888')  # connect to the cluster
output:
Client
Scheduler: tcp://<ip_primary_node>:8888
Dashboard: http://<ip_primary_node>:8786/status
Cluster
Workers: 8 # total workers in distributed mode
Cores: 8
Memory: 67.47 GB
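As a minimal illustration of driving the cluster from the notebook, the sketch below recaps the client connection, confirms how many workers registered, and runs a small dask_cudf aggregation across all GPUs. This is not the actual NYC-Taxi notebook code: the CSV glob and the column names (passenger_count, fare_amount) are placeholders, and the path refers to the data volume mounted in step 1.
# Minimal sketch: connect to the scheduler, check the workers, and run a small
# distributed aggregation with dask_cudf. Paths and column names are placeholders.
from dask.distributed import Client
import dask_cudf

client = Client('tcp://<ip_primary_node>:8888')
print("Workers registered:", len(client.scheduler_info()['workers']))

# Read CSV data that is reachable from every worker (e.g. the mounted data volume).
df = dask_cudf.read_csv('/home/dell/rapids/data/*.csv')

# Example aggregation executed on all GPUs in the cluster.
result = df.groupby('passenger_count').fare_amount.mean().compute()
print(result)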