
When designing the optimal platform for a neural network, how that network is constructed is crucial in determining
which options are best at the other layers of the stack. In general, the platform designer's goal is to understand how
data moves into, out of, and around the system, and to tune platform features so that data choke points or bottlenecks
are eliminated as efficiently as possible.
For example, a small neural network that can be computed relatively quickly may place tremendous demand on
dataset ingest bandwidth, whether from local storage or remote data pools, and can therefore be bottlenecked by slow
storage devices or narrow I/O paths. Pairing this type of model with a high-performance accelerator platform that lacks
sufficient I/O bandwidth would leave the compute hardware under-utilized.
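To make the point concrete, the sketch below shows one common way of keeping a small, fast model fed with data so that storage and I/O, rather than compute, do not become the limiting factor. PyTorch is used purely as an illustrative framework (this paper does not prescribe one), and the synthetic dataset, batch size, and worker count are placeholder assumptions standing in for a real on-disk data pool.

```python
# Minimal sketch, assuming PyTorch: overlap data ingest with accelerator compute
# so a small, fast model is not starved by slow storage or narrow I/O.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Stand-in for a real on-disk dataset; in practice this is where slow
    # storage or a narrow I/O path would throttle a small, fast model.
    features = torch.randn(10_000, 128)
    labels = torch.randint(0, 10, (10_000,))
    dataset = TensorDataset(features, labels)

    # num_workers overlaps host-side data loading with compute on the device;
    # pin_memory speeds up host-to-accelerator copies when a GPU is present.
    use_gpu = torch.cuda.is_available()
    loader = DataLoader(dataset, batch_size=256, shuffle=True,
                        num_workers=4, pin_memory=use_gpu)

    device = "cuda" if use_gpu else "cpu"
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for x, y in loader:
        # non_blocking copies let the host-to-device transfer overlap with compute
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    main()
```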
As another example, a very large neural network with many input features and/or activation layers may not fit
comfortably in a single accelerator's onboard memory, or may need to swap weight data in and out of the page file
during each iteration. This type of model may run most efficiently when the stored weights can be partitioned,
exchanged, and multiplied across multiple accelerators, so a hardware platform that offers multiple accelerators would
be the right choice in this case. Note, however, that the distribution of operations across multiple accelerators is handled
differently by different hardware offerings and frameworks, so the efficiency of that distribution varies accordingly. Also
note that not every neural network benefits equally from multiple accelerators, or at least not with the same scaling
efficiency. (See the following sections.)
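As a rough illustration of this idea, the sketch below splits one model across two accelerators so that neither device has to hold all of the weights. It assumes PyTorch and two CUDA devices; the device names, layer sizes, and batch size are illustrative, and real frameworks and hardware distribute this work in different (and differently efficient) ways.

```python
# Minimal sketch, assuming PyTorch and two GPUs: simple model parallelism,
# placing half of the weights on each accelerator.
import torch
from torch import nn

class TwoDeviceNet(nn.Module):
    def __init__(self, dev0="cuda:0", dev1="cuda:1"):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # First portion of the weights lives on the first accelerator...
        self.stage1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to(dev0)
        # ...and the rest on the second, so neither device has to swap
        # weights in and out of host memory on every iteration.
        self.stage2 = nn.Linear(4096, 1000).to(dev1)

    def forward(self, x):
        x = self.stage1(x.to(self.dev0))
        # Activations cross the device-to-device link here; this hop is where
        # the platform's accelerator interconnect bandwidth matters most.
        return self.stage2(x.to(self.dev1))

if torch.cuda.device_count() >= 2:
    model = TwoDeviceNet()
    out = model(torch.randn(32, 4096))
```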
Framework layer
Neural network models run on deep learning software frameworks. The proliferation of frameworks, most of them
open source, has largely stemmed from academia and from hyperscale service providers, each advancing its own
code base. Virtually any neural network can run on any deep learning framework, but the frameworks are not all
created equal: the way each one uses the underlying hardware differs. While end users often choose a framework
based on coding familiarity, several factors affect neural network performance and are worth considering:
• How a framework makes math library calls (and which libraries it uses), how it decomposes tensor
multiplication operations, and how it maps those operations onto the physical hardware are all unique to that
framework.
• Some frameworks are better than others at scaling beyond a single server to use multiple servers working
together, and some are not capable of scaling out at all.
• Some frameworks are well suited to orchestrating neural network mathematics across a large number of
parallel compute devices (i.e., GPUs) within a single server, while others scale very poorly across multiple
accelerators (see the sketch after this list).
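As a simple example of the last point, the sketch below shows one way a framework can spread a model's math across every GPU in a single server. PyTorch is again used only as an illustration; the model and batch sizes are assumptions, and other frameworks expose different mechanisms with different scaling efficiency.

```python
# Minimal sketch, assuming PyTorch: replicate a model across all GPUs in one
# server and split each batch among them.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

if torch.cuda.device_count() > 1:
    # DataParallel copies the model to each GPU and splits every batch across
    # them; DistributedDataParallel is usually more efficient and can also
    # scale out to multiple servers, which not every framework supports.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
out = model(torch.randn(64, 1024, device=device))
```

How well such a construct scales with the number of GPUs depends on the framework's orchestration and on the platform's interconnect, which is why the same model can behave very differently on different hardware offerings.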
Each of these points needs to be considered in light of the characteristics of the specific neural network. They may
ultimately influence the choice of framework and the accelerator options.