Deployment Guide

Pre-deployment requirements and introduction to new features
VMware vSphere Bitfusion on Dell EMC PowerEdge servers | Deployment guide
5.6 Client virtual machine
The client cluster hosts a CentOS virtual machine with the required NVIDIA tools and drivers installed.
Use this virtual machine to access the GPUs remotely.
Install the following components on the CentOS virtual machine to set up the client cluster:
Python 3 and pip3 package manager
Compute Unified Device Architecture (CUDA) Toolkit 10.0 for Red Hat Enterprise Linux 7
cuDNN 7 library
TensorFlow v1.13.1 GPU framework
TensorFlow benchmark toolkit compatible with TensorFlow v1.13 framework
This client virtual machine is connected to the management network and, through PVRDMA, to the RDMA network.
Note: For instructions on deploying the above prerequisites on a client virtual machine, see the Running
TensorFlow on vSphere Bitfusion guide.
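Once the components listed above are installed, a quick inventory can confirm that the client virtual machine is ready. The following is a minimal sketch, not part of the referenced guide. It assumes that pip3 and the CUDA Toolkit compiler (nvcc) are on the PATH and that TensorFlow is importable under the module name tensorflow. The function names are hypothetical.

```python
import importlib
import shutil
import sys


def module_version(name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def client_report():
    """Collect a quick inventory of the components this section lists."""
    return {
        "python3": sys.version_info.major == 3,
        "pip3_on_path": shutil.which("pip3") is not None,
        # Assumption: the CUDA Toolkit's nvcc compiler is on PATH.
        "nvcc_on_path": shutil.which("nvcc") is not None,
        # Expected to start with "1.13" on a client built as described here.
        "tensorflow": module_version("tensorflow"),
    }
```

Calling client_report() on the client virtual machine returns a dictionary. A value of False or None points to a component that is missing or not on the PATH. The sketch does not check cuDNN or the benchmark toolkit.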
5.7 Connectivity
The Bitfusion server appliances, the client virtual machine that uses the remote GPUs, and vCenter are connected over
a dedicated management network. In addition, vSAN, vMotion and Hardware Acceleration
traffic must all be connected to the client cluster.
To carry the GPU traffic, a dedicated RDMA (RoCE) connection is established between the GPU hosts and
the client cluster hosts.
The Dell EMC PowerSwitch ToR is configured with VLANs to accommodate vSAN, vMotion and GPU data
traffic management. Two switches are set up with Virtual Link Trunking (VLT) for redundancy.
Make the Bitfusion appliance management network subnet routable to the internet so that the NVIDIA driver
can be downloaded.
5.8 Network services
Domain Name System (DNS) is required for both forward and reverse name resolution. The IP
addresses of name servers, search domains, and hostnames of all the Bitfusion appliance virtual machines
should be tested and verified for both forward and reverse lookups. Test the DNS entries using their Fully
Qualified Domain Name (FQDN) and their short name or hostname.
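The forward and reverse lookup test described above can be sketched with the Python standard library. This is an illustrative check under stated assumptions, not a procedure from this guide. The function name check_dns is hypothetical, the resolver functions are passed as parameters so that they can be swapped out, and a name counts as consistent when its first label matches the first label of a name returned by the reverse lookup.

```python
import socket


def check_dns(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name to an IP, resolve the IP back, and compare the two."""
    result = {"name": name, "ip": None, "reverse": None, "ok": False}
    try:
        ip = forward(name)                  # forward lookup: name -> IP
        result["ip"] = ip
        primary, aliases, _ = reverse(ip)   # reverse lookup: IP -> names
        result["reverse"] = primary
    except OSError as exc:                  # gaierror and herror are OSError subclasses
        result["error"] = str(exc)
        return result
    # Compare case-insensitively on the first label so that an FQDN and its
    # short name both match the same reverse record.
    returned = {n.split(".")[0].lower() for n in [primary, *aliases]}
    result["ok"] = name.split(".")[0].lower() in returned
    return result
```

Run check_dns once with the FQDN and once with the short name of each Bitfusion appliance virtual machine. An "ok" value of False, or an "error" entry in the result, indicates a missing or mismatched record. Because only the first label is compared, the sketch does not catch a reverse record that points to the right hostname in the wrong domain.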
Time synchronization is critical to the Bitfusion server appliances. All the GPU hosts, client clusters and
Bitfusion appliance virtual machines must be synchronized to a common reference time source. Network Time Protocol
(NTP) traffic can either be routed from the client to the time source or travel over the same L2 network.
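One way to confirm that a host can reach the reference time source and is roughly in sync is a single SNTP query. The sketch below is an assumption-laden illustration, not part of this guide. It sends a minimal client request over UDP port 123 and compares the server's transmit timestamp with the midpoint of the local send and receive times. The function names are hypothetical, and the estimate is cruder than what a real NTP client computes.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """Return a 48-byte SNTP request: LI=0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\0"


def transmit_time(packet):
    """Extract the server transmit timestamp as Unix time in seconds."""
    if len(packet) < 48:
        raise ValueError("NTP packet is shorter than 48 bytes")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server, port=123, timeout=2.0):
    """Estimate (server time - local time) in seconds from one SNTP exchange."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), (server, port))
        packet, _ = sock.recvfrom(512)
        received = time.time()
    # Crude estimate: compare against the midpoint of the round trip.
    return transmit_time(packet) - (sent + received) / 2
```

Calling clock_offset with the address of the site's time source from a GPU host, a client cluster host, or an appliance virtual machine returns the estimated offset in seconds. A socket timeout suggests that NTP traffic is not reaching the source over either the routed path or the shared L2 network.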