HP aC++/HP C A.06.20 Release Notes
+O[no]autopar Option (New)
+O[no]autopar
This release adds support on the Itanium platform for a new optimization,
auto-parallelization, which is enabled by adding the +Oautopar option to the
command line. This optimization allows applications to exploit otherwise idle resources
on multicore or multiprocessor systems by automatically transforming serial loops into
multithreaded parallel code.
When the +Oautopar option is used at optimization levels +O3 and above, the compiler
automatically parallelizes those loops that the loop transformer deems both safe and
profitable to run in parallel.
The default is +Onoautopar, which disables automatic parallelization of loops.
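As a rough illustration of what "safe" can mean here, the sketch below contrasts a
loop whose iterations are independent with one that carries a dependence from one
iteration to the next. The function names, the file name, and the build line are
hypothetical, and the loop transformer's actual safety and profitability criteria
are not described in these notes, so whether a given loop is parallelized remains
the compiler's decision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build line (file name is an assumption):
 *   cc +O3 +Oautopar scale_add.c
 */

/* Independent iterations: y[i] depends only on x[i] and the old y[i].
 * The restrict qualifiers (C99) state that x and y do not overlap.
 * A loop of this shape is a plausible candidate for parallelization. */
void scale_add(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependence: iteration i reads the value that iteration
 * i - 1 just wrote, so the iterations cannot simply run concurrently. */
void prefix_sum(size_t n, double *v)
{
    assert(v != NULL);
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```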
Automatic parallelization can be combined with manual parallelization through the
use of OpenMP directives and the +Oopenmp option. When both +Oopenmp and
+Oautopar options are specified, the compiler honors the OpenMP directives first,
and then looks for loops that have not been parallelized manually with OpenMP
directives. For these loops, the compiler automatically parallelizes each loop that is
both safe and likely to have improved performance when executed in parallel.
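The following sketch shows how the two mechanisms might coexist in one function.
The first loop carries an OpenMP directive and is therefore parallelized manually;
the second has no directive and is left as a candidate for automatic
parallelization. The function name, file name, and build line are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build line (file name is an assumption):
 *   cc +O3 +Oopenmp +Oautopar mixed.c
 */
void square_then_offset(int n, double *a, double *b)
{
    int i;

    assert(a != NULL && b != NULL);

    /* Parallelized manually: the OpenMP directive is honored first. */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = a[i] * a[i];

    /* No directive: with +Oautopar the compiler may parallelize this
     * loop if it judges it safe and profitable. */
    for (i = 0; i < n; i++)
        b[i] = a[i] + 1.0;
}
```

When compiled without OpenMP support, the directive is ignored and both loops run
serially with the same result.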
Programs compiled with the +Oautopar option require the libcps, libomp, and
libpthreads runtime support libraries to be present at both compilation and runtime.
When linking with the HP-UX B.11.61 linker (patch PHSS_36342 or PHSS_36349),
compiling with the +Oautopar option causes these libraries to be included
automatically. Older linkers require the libraries to be specified explicitly or
pulled in by compiling with +Oopenmp.
At present, +Oautopar is supported only when compiling C or Fortran files, and not
C++ files. If you use +Oautopar with C or Fortran code in a mixed-language application
that also includes C++ files, you must use -mt when compiling and linking the C++
files, similar to the current requirements for +Oopenmp. Please refer to the
documentation of the aCC compiler's -mt option for additional information and
restrictions.
+O[no]loop_block Option (New)
+O[no]loop_block
Loop blocking is a combination of strip mining and interchange that improves data
cache locality. It is provided primarily to deal with nested loops that manipulate arrays
that are too large to fit into the data cache. Under certain circumstances, loop blocking
allows reuse of these arrays by transforming the loops that manipulate them so that
they manipulate strips of the arrays that fit into the cache.
At optimization levels 3 and 4, using +Oloop_block (the default) allows automatic
loop blocking. Specifying +Onoloop_block disables loop blocking.
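To make the transformation concrete, the sketch below applies strip mining and
interchange by hand to a matrix transpose. It is only an illustration of the
general technique: the function names and the strip length BLOCK are hypothetical,
and the strip sizes and loop order the compiler actually chooses are not described
in these notes.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK = 32 };  /* hypothetical strip length, chosen for illustration */

/* Original nest: dst is written with stride n, so for large n each
 * inner-loop store lands on a different cache line. */
void transpose_plain(size_t n, const double *src, double *dst)
{
    assert(src != NULL && dst != NULL && src != dst);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked nest: both loops are strip mined, and the strip loops (ii, jj)
 * are interchanged outward so that the inner loops work on one
 * BLOCK x BLOCK tile of src and dst at a time. */
void transpose_blocked(size_t n, const double *src, double *dst)
{
    assert(src != NULL && dst != NULL && src != dst);
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Clamp the tile bounds when n is not a multiple of BLOCK. */
            size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
            size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce the same result; only the order in which the elements are
visited differs.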