
When designing the optimal platform for a neural network, how that network is constructed is crucial in determining
which options are best at the other layers of the stack. In general, the platform designer's goal is to understand how
data moves into, out of, and around the system, and to tune platform features so that data choke points or bottlenecks
are eliminated as efficiently as possible.
For example, a small neural network that can be computed relatively quickly may place tremendous demand on
dataset ingest bandwidth, whether from local storage or remote data pools, and can therefore be bottlenecked by slow
storage devices or narrow I/O paths. Pairing this type of model with a high-performance accelerator platform that lacks
sufficient I/O bandwidth would leave the compute hardware under-utilized.
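To make the point concrete, the sketch below shows one common way of keeping a small, fast model fed with data so that storage and I/O, rather than compute, do not become the limiting factor. PyTorch is used purely as an illustrative framework (this paper does not prescribe one), and the synthetic dataset, batch size, and worker count are placeholder assumptions standing in for a real on-disk data pool.

```python
# Minimal sketch, assuming PyTorch: overlap data ingest with accelerator compute
# so a small, fast model is not starved by slow storage or narrow I/O.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Stand-in for a real on-disk dataset; in practice this is where slow
    # storage or a narrow I/O path would throttle a small, fast model.
    features = torch.randn(10_000, 128)
    labels = torch.randint(0, 10, (10_000,))
    dataset = TensorDataset(features, labels)

    # num_workers overlaps host-side data loading with compute on the device;
    # pin_memory speeds up host-to-accelerator copies when a GPU is present.
    use_gpu = torch.cuda.is_available()
    loader = DataLoader(dataset, batch_size=256, shuffle=True,
                        num_workers=4, pin_memory=use_gpu)

    device = "cuda" if use_gpu else "cpu"
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for x, y in loader:
        # non_blocking copies let the host-to-device transfer overlap with compute
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    main()
```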
As another example, a very large neural network with many input features and/or activation layers may not fit
comfortably in a single accelerator's onboard memory, or may need to swap weight data in and out of the page file
during each iteration. This type of model may run most efficiently when the stored weights can be partitioned,
exchanged, and multiplied across multiple accelerators, so a hardware platform that offers multiple accelerators would
be the right choice in this case. Note, however, that the distribution of operations across multiple accelerators is handled
differently by different hardware offerings and frameworks, so the efficiency of that distribution varies accordingly. Also
note that not every neural network benefits equally from multiple accelerators, or at least not with the same scaling
efficiency. (See the following sections.)
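As a rough illustration of this idea, the sketch below splits one model across two accelerators so that neither device has to hold all of the weights. It assumes PyTorch and two CUDA devices; the device names, layer sizes, and batch size are illustrative, and real frameworks and hardware distribute this work in different (and differently efficient) ways.

```python
# Minimal sketch, assuming PyTorch and two GPUs: simple model parallelism,
# placing half of the weights on each accelerator.
import torch
from torch import nn

class TwoDeviceNet(nn.Module):
    def __init__(self, dev0="cuda:0", dev1="cuda:1"):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # First portion of the weights lives on the first accelerator...
        self.stage1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to(dev0)
        # ...and the rest on the second, so neither device has to swap
        # weights in and out of host memory on every iteration.
        self.stage2 = nn.Linear(4096, 1000).to(dev1)

    def forward(self, x):
        x = self.stage1(x.to(self.dev0))
        # Activations cross the device-to-device link here; this hop is where
        # the platform's accelerator interconnect bandwidth matters most.
        return self.stage2(x.to(self.dev1))

if torch.cuda.device_count() >= 2:
    model = TwoDeviceNet()
    out = model(torch.randn(32, 4096))
```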
Framework layer
Neural network models run on deep learning software frameworks. The proliferation of frameworks, most of them
open source, has largely stemmed from academia and from hyperscale service providers, each advancing its own
code base. Virtually any neural network can run on any deep learning framework, but the frameworks are not all
created equal: the way each one uses the underlying hardware differs. While end users often choose a framework
based on coding familiarity, several factors affect neural network performance and are worth considering:
• How a framework makes math library calls (and which libraries it uses), how it decomposes tensor
multiplication operations, and how it maps those operations onto the physical hardware are all unique to that
framework.
• Some frameworks are better than others at scaling beyond a single server to use multiple servers working
together, and some are not capable of scaling out at all.
• Some frameworks are well suited to orchestrating neural network mathematics across a large number of
parallel compute devices (i.e., GPUs) within a single server, while others scale very poorly across multiple
accelerators (see the sketch after this list).
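As a simple example of the last point, the sketch below shows one way a framework can spread a model's math across every GPU in a single server. PyTorch is again used only as an illustration; the model and batch sizes are assumptions, and other frameworks expose different mechanisms with different scaling efficiency.

```python
# Minimal sketch, assuming PyTorch: replicate a model across all GPUs in one
# server and split each batch among them.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

if torch.cuda.device_count() > 1:
    # DataParallel copies the model to each GPU and splits every batch across
    # them; DistributedDataParallel is usually more efficient and can also
    # scale out to multiple servers, which not every framework supports.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
out = model(torch.randn(64, 1024, device=device))
```

How well such a construct scales with the number of GPUs depends on the framework's orchestration and on the platform's interconnect, which is why the same model can behave very differently on different hardware offerings.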
Each of these points needs to be considered in light of the characteristics of the specific neural network. They may
ultimately influence the choice of framework and the accelerator options.