Abaqus Performance
Dell EMC Ready Solution for HPC Digital Manufacturing: Dassault Systèmes’ Simulia Abaqus Performance
For each benchmark, the wall clock time (in seconds) is shown. For the GPU-enabled runs, all 40 Xeon cores were
used. Benchmarks were carried out using the base system with no GPU acceleration, and using 1, 2, and 4 GPUs.
With Abaqus, the number of GPUs must be the same for each MPI domain, so on a single server it is
possible to use the GPUs in a variety of ways. As an example, with four GPUs, the benchmarks can be
run with a single MPI domain using 0, 1, 2, or 4 GPUs. With two MPI domains, each domain could use 0, 1, or 2
GPUs, and with four MPI domains, each domain could use 0 or 1 GPU. Only certain code sections have been
enabled to take advantage of GPUs, and as shown in Figure 5 above, Abaqus typically runs better with
more domains per node. As a result, when MPI mode is possible (S2a, S4b, S4d, S6), the best results are
obtained when running with each GPU in its own MPI domain, allowing for more domains per node. With the
modal analysis S3d case, all GPUs were used in the single domain. For these benchmarks, GPUs did not
improve performance over the existing Xeon processors. For certain other simulations, however,
GPUs may offer significant performance advantages, so care should be taken when choosing systems with
GPUs to determine whether utilizing GPUs would be appropriate for the intended simulations.
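The GPU-per-domain combinations described above can be enumerated with a short sketch. It assumes that the only total GPU counts of interest are the benchmarked ones (0, 1, 2, and 4) and that every MPI domain must receive the same number of GPUs. The function name `gpus_per_domain_options` and its parameters are illustrative and are not part of Abaqus.

```python
def gpus_per_domain_options(domains, total_gpu_counts=(0, 1, 2, 4)):
    """Return the per-domain GPU counts that use one of the benchmarked totals.

    Assumption: each MPI domain gets the same number of GPUs, so a total
    is usable only if it divides evenly across the domains.
    """
    options = []
    for total in sorted(total_gpu_counts):
        if total % domains == 0:
            options.append(total // domains)
    return options


# Enumerate the cases discussed in the text for a four-GPU server.
for domains in (1, 2, 4):
    print(f"{domains} MPI domain(s): GPUs per domain = {gpus_per_domain_options(domains)}")
```

For one, two, and four MPI domains this reproduces the per-domain choices listed in the text: 0, 1, 2, or 4 GPUs; 0, 1, or 2 GPUs; and 0 or 1 GPU.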
Figure 10 demonstrates the performance of a Windows-based R840 Basic Building Block on the larger
benchmark models.
Figure 9: Abaqus GPU Performance
[Bar chart. Vertical axis: Solver Elapsed Time (sec), 0 to 1150. Horizontal axis: benchmarks S2A, S3D, S4B, S4D, S6. Series: GPU-0, GPU-1, GPU-2, GPU-4.]