HP Compilers for HP Integrity Servers (September 2011)

The loop optimizer also performs some new optimizations:
Automatic parallelization. This optimization lets applications exploit otherwise
idle resources on multicore or multiprocessor systems by automatically transforming
serial loops into multithreaded parallel code. When the +Oautopar option is used
at optimization level +O3 or higher, the compiler automatically parallelizes
the loops that the loop transformer judges to be both safe and profitable. An
application built with +Oautopar uses all available processors, or the number of
processors specified by the environment variable OMP_NUM_THREADS if it is set.
The default is +Onoautopar, which disables automatic parallelization of loops.
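The difference between a loop the transformer can treat as safe and one it cannot can be sketched in C. This is an illustrative sketch under stated assumptions, not HP's dependence analysis: the function names are hypothetical, the `cc` driver name in the comment is an assumption, and the compiler's profitability heuristics are not modeled at all.

```c
/* Assumed build command (the driver name "cc" is an assumption):
 *   cc +O3 +Oautopar loops.c
 */
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i] and c[i], so the
 * iterations could run on different threads in any order. The restrict
 * qualifiers promise that the arrays do not overlap, which helps the
 * compiler prove that splitting the loop is safe. */
void vec_mul(double *restrict a, const double *restrict b,
             const double *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Loop-carried dependence: iteration i reads the value written by
 * iteration i - 1, so dividing the iterations among threads would change
 * the result. A loop like this is not safe to parallelize as written. */
void prefix_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

Only the first loop is a plausible candidate; whether the compiler actually parallelizes it also depends on whether the trip count makes threading worthwhile.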
Automatic parallelization can be combined with manual parallelization using
OpenMP directives and the +Oopenmp option. When both +Oopenmp and +Oautopar
are specified, existing OpenMP directives take precedence: the compiler considers
auto-parallelizing only those loops that are not controlled by a directive.
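A minimal C sketch of mixing the two modes, assuming the file is compiled with both options. The function names are hypothetical and the `cc` driver name is an assumption; `omp_get_max_threads` is the standard OpenMP runtime call, guarded by `_OPENMP` so that the file also builds without OpenMP support.

```c
/* Assumed build command (the driver name "cc" is an assumption):
 *   cc +O3 +Oopenmp +Oautopar mix.c
 */
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Explicitly parallelized by the programmer: the OpenMP directive
 * controls this loop, so the auto-parallelizer leaves it alone.
 * A signed loop variable keeps the loop valid for older OpenMP versions. */
void scale(double *x, size_t n, double s)
{
    long i;
#pragma omp parallel for
    for (i = 0; i < (long)n; i++)
        x[i] *= s;
}

/* No directive: this loop is only a candidate that the
 * auto-parallelizer may analyze and possibly transform. */
void add(double *restrict z, const double *restrict x,
         const double *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

/* Thread count the next parallel region would use. With OpenMP enabled
 * this reflects OMP_NUM_THREADS if set; without OpenMP it reports 1. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Under the precedence rule described above, `scale` always runs as the programmer's OpenMP loop, while `add` may or may not be parallelized depending on the compiler's analysis. Running the program with, for example, `OMP_NUM_THREADS=4` in the environment requests four threads.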
[Figure 2: Build model for interprocedural optimization]
Loop multiversioning. Some loops can be optimized more aggressively if certain
conditions hold, but not all of those conditions can be verified at compile time.
The loop optimizer can clone such a loop, optimize the clone aggressively under the
assumed conditions, and guard the two versions with runtime checks. At run time,
the checks test the assumed conditions and the appropriate version of the loop is executed.
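Loop multiversioning happens inside the compiler, but its effect can be approximated by hand. In this sketch the hypothetical function `add_into` assumes the condition the optimizer would like to rely on is that the two arrays do not overlap; the runtime checks HP's optimizer actually generates are not described in this paper and may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-written analogue of loop multiversioning (illustrative only).
 * Assumed condition: dst and src do not overlap. */
void add_into(double *dst, const double *src, size_t n)
{
    /* Runtime check on the address ranges. Comparing addresses of
     * unrelated objects via uintptr_t is implementation-defined in C,
     * but it works on common flat-address platforms. */
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    size_t bytes = n * sizeof(double);

    if (d + bytes <= s || s + bytes <= d) {
        /* Aggressive version: no overlap, stated via restrict, so the
         * compiler may unroll, vectorize, or pipeline freely. */
        double *restrict dp = dst;
        const double *restrict sp = src;
        for (size_t i = 0; i < n; i++)
            dp[i] += sp[i];
    } else {
        /* Conservative version: preserves the original sequential
         * semantics when the arrays may overlap. */
        for (size_t i = 0; i < n; i++)
            dst[i] += src[i];
    }
}
```

When the arrays are disjoint both versions produce the same result; when they overlap, only the conservative version is correct, which is exactly why the check must run before the aggressive clone is chosen.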
malloc combining. The optimizer can combine several small memory allocations into
a single larger allocation. This improves data locality and reduces the overhead of
calling the allocation routine, since it is called once instead of several times.
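A hand-written C analogue of malloc combining, merging a header and its payload into one block. The `record` type and its functions are hypothetical, and the compiler applies its own legality analysis before merging allocations, which this sketch does not model.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: a header plus a variable-length payload. */
struct record {
    size_t len;
    char *data;
};

/* Separate allocations: two malloc calls, and the header and payload
 * may end up far apart in memory. */
struct record *record_new_separate(const char *s)
{
    size_t len = strlen(s);
    struct record *r = malloc(sizeof *r);
    if (!r)
        return NULL;
    r->data = malloc(len + 1);
    if (!r->data) {
        free(r);
        return NULL;
    }
    memcpy(r->data, s, len + 1);   /* copy including the terminator */
    r->len = len;
    return r;
}

void record_free_separate(struct record *r)
{
    if (r) {
        free(r->data);
        free(r);
    }
}

/* Combined allocation: one malloc call holds both the header and the
 * payload, which is placed immediately after the header. */
struct record *record_new_combined(const char *s)
{
    size_t len = strlen(s);
    struct record *r = malloc(sizeof *r + len + 1);
    if (!r)
        return NULL;
    r->data = (char *)(r + 1);     /* payload starts right after header */
    memcpy(r->data, s, len + 1);
    r->len = len;
    return r;
}

void record_free_combined(struct record *r)
{
    free(r);                       /* one free releases both parts */
}
```

The combined form gives up some flexibility: the payload can no longer be freed or reallocated independently of the header, which is one reason a compiler must prove such a merge is safe before applying it.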
Understanding key features of the HP compilers 19