Reference Guide
Altair AcuSolve Performance
Dell EMC Ready Solutions for HPC Digital Manufacturing with AMD EPYC™ Processors—Altair Performance
Again, the Riser (R) model initially shows better overall parallel scaling than the larger Windmill (W) and
Nozzle (N) models, primarily because of cache effects. All models behave similarly as the number of
shared-memory threads is varied. There is little benefit from using multiple threads until four nodes are used. At
eight nodes, the benefit of multiple shared-memory threads is noticeable for the smaller Riser (R) and
Windmill (W) models, where more threads are typically better, up to a point. The larger Nozzle (N) model
shows excellent parallel speedup over the range of threads per domain tested. A good rule of thumb for
thread parallelism appears to be one thread per domain for each node used in the run (e.g.,
4 threads for a 4-node run). This may not be optimal in every situation but should give reasonable
performance.
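The rule of thumb above can be sketched as a small helper. This is an illustrative sketch only, not part of AcuSolve or its launcher: the function name hybrid_layout and its parameters are hypothetical, and it assumes that every core is used and that the total core count equals the number of domains multiplied by the threads per domain.

```python
from typing import Dict, Optional


def hybrid_layout(nodes: int,
                  cores_per_node: int = 64,
                  threads_per_domain: Optional[int] = None) -> Dict[str, int]:
    """Suggest a hybrid (domains x threads) layout for a run.

    Hypothetical helper. By default it applies the rule of thumb from the
    text: threads per domain equals the number of nodes in the run.
    """
    if nodes < 1 or cores_per_node < 1:
        raise ValueError("nodes and cores_per_node must be positive")

    total_cores = nodes * cores_per_node

    # Rule of thumb: one thread per domain for each node in the run.
    threads = nodes if threads_per_domain is None else threads_per_domain
    if threads < 1:
        raise ValueError("threads_per_domain must be positive")

    # Assumption: all cores are used, so total_cores = domains * threads.
    if total_cores % threads != 0:
        raise ValueError("threads_per_domain must divide the total core count")

    return {
        "total_cores": total_cores,
        "threads_per_domain": threads,
        "domains": total_cores // threads,
    }


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, hybrid_layout(n))
```

Under these assumptions the number of domains always equals the cores per node (64 here), so the rule effectively adds threads rather than domains as nodes are added.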
Figure 6: AcuSolve Hybrid Parallel Scaling
[Chart. Y-axis: performance relative to one node (64 cores), scale 1.00 to 8.00. X-axis: number of cores (number of nodes): 64 (1), 128 (2), 256 (4), 512 (8). Series: R-1, R-2, R-4, R-8; W-1, W-2, W-4, W-8; N-1, N-2, N-4, N-8.]
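The quantity on the vertical axis, performance relative to one node, can be computed from run times as sketched below. The sketch assumes that performance is the inverse of elapsed time; the function name relative_performance is hypothetical, and the timings in the usage example are placeholders chosen for easy arithmetic, not measurements from this guide.

```python
from typing import Dict, Tuple


def relative_performance(elapsed_by_nodes: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map node count -> (performance relative to one node, parallel efficiency).

    Assumes performance is 1 / elapsed time, so relative performance is
    t(1 node) / t(n nodes), and efficiency is that ratio divided by n.
    """
    if 1 not in elapsed_by_nodes:
        raise ValueError("a single-node timing is required as the baseline")

    base = elapsed_by_nodes[1]
    result = {}
    for nodes, elapsed in sorted(elapsed_by_nodes.items()):
        speedup = base / elapsed
        result[nodes] = (speedup, speedup / nodes)
    return result


if __name__ == "__main__":
    # Placeholder timings in seconds (illustrative only, not measured data).
    timings = {1: 800.0, 2: 400.0, 4: 200.0, 8: 125.0}
    for nodes, (speedup, efficiency) in relative_performance(timings).items():
        print(f"{nodes} node(s): {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency near 1.0 corresponds to a point lying on the ideal line in a plot of this kind; values above 1.0 indicate super-linear scaling, which is consistent with the cache effects mentioned in the text.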