White Papers

Abaqus Performance

Dell EMC Ready Solution for HPC Digital Manufacturing: Dassault Systèmes’ Simulia Abaqus Performance

For each benchmark, the wall clock time (in seconds) is shown. All 40 Xeon cores were used for the GPU-enabled runs. Benchmarks were carried out on the base system with no GPU acceleration and with 1, 2, and 4 GPUs. With Abaqus, the number of GPUs must be the same for each MPI domain, so on a single server the GPUs can be used in a variety of ways. For example, with four GPUs, a benchmark can be run with a single MPI domain using 0, 1, 2, or 4 GPUs. With two MPI domains, each domain could use 0, 1, or 2 GPUs, and with four MPI domains, each domain could use 0 or 1 GPU. Only certain code sections have been enabled to take advantage of GPUs, and as shown in Figure 5 above, Abaqus typically runs better with more domains per node. As a result, when MPI mode is possible (S2a, S4b, S4d, S6), the best results are obtained by running each GPU in its own MPI domain, which allows more domains per node. For the modal analysis case (S3d), all GPUs were used in a single domain. For these benchmarks, GPUs did not improve performance over the existing Xeon processors. GPUs may still offer significant performance advantages for certain simulations, so care should be taken when choosing systems with GPUs to determine whether using GPUs would be appropriate for the intended simulations.
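The GPU layouts described above for a four-GPU server can be enumerated with a short sketch. This is only an illustration of the stated constraint (every MPI domain gets the same number of GPUs, and the total cannot exceed what the server has); the function name `gpu_layouts` and its parameters are hypothetical and are not part of Abaqus or of the benchmark scripts.

```python
def gpu_layouts(total_gpus=4, domain_counts=(1, 2, 4), gpus_per_domain_options=(0, 1, 2, 4)):
    """Map each MPI domain count to the per-domain GPU counts that fit on one server.

    Assumption (from the text): every MPI domain must use the same number of
    GPUs, so a layout is valid only if domains * gpus_per_domain <= total_gpus.
    The candidate per-domain counts default to the GPU counts that were benchmarked.
    """
    layouts = {}
    for domains in domain_counts:
        layouts[domains] = [
            gpus for gpus in gpus_per_domain_options
            if domains * gpus <= total_gpus
        ]
    return layouts


if __name__ == "__main__":
    for domains, options in gpu_layouts().items():
        print(f"{domains} MPI domain(s): {options} GPU(s) per domain")
```

With the default arguments this reproduces the cases listed in the text: one domain with 0, 1, 2, or 4 GPUs; two domains with 0, 1, or 2 GPUs each; four domains with 0 or 1 GPU each.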

Figure 10 demonstrates the performance of a Windows-based R840 Basic Building Block on the larger benchmark models.

Figure 9: Abaqus GPU Performance

[Bar chart: Solver Elapsed Time (sec) for benchmarks S2A, S3D, S4B, S4D, and S6; series GPU-0, GPU-1, GPU-2, GPU-4]