Optimizing Itanium-Based Applications (May 2011)
When the +Oautopar option is used at optimization levels +O3 and above, the compiler will
automatically parallelize those loops which are deemed safe and profitable by the loop transformer.
This optimization allows the compiled program to take advantage of more than one processor (or core)
when executing loops determined to be parallelizable. Most programs which spend a significant
percentage of their execution time in such loops will improve their performance by using this technique
– occasionally dramatically. By contrast, some programs may experience performance degradations
when parallelized, and all parallelized programs will increase their use of system resources, which may
slow down other programs running alongside them.
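As a rough illustration of what "safe" can mean here, the C99 sketch below contrasts a loop whose iterations are independent with one that carries a dependence from each iteration to the next. The function names are hypothetical and do not come from this paper. Whether the loop transformer actually parallelizes a given loop remains its own decision, based on its safety and profitability analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: every iteration writes a distinct element of dst
 * and reads only from a and b. The restrict qualifiers promise the compiler
 * that the arrays do not overlap, so the iterations can run in any order.
 * Loops of this shape are typical candidates for automatic parallelization. */
void scale_add(double *restrict dst, const double *restrict a,
               const double *restrict b, double k, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + k * b[i];
    }
}

/* Hypothetical counter-example: iteration i reads the value written by
 * iteration i - 1 (a loop-carried dependence), so the iterations cannot
 * simply be divided among processors as written. */
void prefix_sum(double *data, size_t n)
{
    assert(data != NULL);
    for (size_t i = 1; i < n; i++) {
        data[i] += data[i - 1];
    }
}
```

Even for a loop like the one in scale_add, a small trip count may make parallel execution unprofitable, which is one way a parallelized program can end up slower.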
profile-based optimization
Profile-based optimization (PBO) is a set of performance-improving code transformations that make use of
an execution profile gathered for an application. There are three steps involved in performing this
optimization:
1. Instrumentation – Recompile the program to prepare it for execution profile collection.
2. Data Collection – Run the program with representative data to collect execution profile statistics.
3. Optimization – Generate optimized code based on the profile data.
Invoke profile-based optimization by using the HP compiler +Oprofile=collect and
+Oprofile=use command line options, as described below.
instrumenting the code
To instrument your program, use the +Oprofile=collect option as follows:
cc -Aa +Oprofile=collect -c sample.c           Compile for instrumentation.
cc -o sample.exe +Oprofile=collect sample.o    Link to make instrumented executable.
The first command line compiles the code; the +Oprofile=collect option requests that the compiler
prepare the module for profile collection. The -c option suppresses linking, so this step produces an
intermediate object file called sample.o. The second command line links sample.o into an executable,
which the -o option names sample.exe. Specifying +Oprofile=collect at link time adds the data
collection code to sample.exe.
Note: Instrumented programs run slower than non-instrumented programs. Use instrumented code only
to collect statistics for profile-based optimization.
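Conceptually, the slowdown comes from extra bookkeeping that runs alongside the program's own work. The following C sketch imitates that by hand with a pair of counters on the two arms of a branch. It is only an illustration under that assumption: it does not show the code the HP compiler actually inserts, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of how often each arm of one branch executed. */
struct branch_counts {
    unsigned long negative;      /* the "values[i] < 0" arm */
    unsigned long non_negative;  /* the other arm */
};

/* Sums the absolute values of the input while recording how often each
 * arm of the branch runs. The counter updates are extra work that a
 * version without the bookkeeping would not perform. */
long sum_abs_counted(const int *values, size_t n, struct branch_counts *counts)
{
    long total = 0;

    assert(counts != NULL && (values != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (values[i] < 0) {
            counts->negative++;       /* bookkeeping only */
            total -= values[i];
        } else {
            counts->non_negative++;   /* bookkeeping only */
            total += values[i];
        }
    }
    return total;
}
```

Counts of this kind are the sort of statistics an optimizer can use to tell frequently executed paths from rarely executed ones.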
collecting execution profile data
To collect an execution profile for your application, run the instrumented program with representative data
as follows:
sample.exe < input.file1    Collect execution profile data.
sample.exe < input.file2
This step writes the profile statistics to a file, named flow.data by default. The same data collection
file can hold the statistics from multiple test runs, including runs of different programs that you have
instrumented.
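To see why the input used for these runs should be representative, consider the following C sketch of what the core of a program such as sample.c might look like. The function and its behavior are assumptions made for illustration; the paper does not show the contents of sample.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical core of a program like sample.c: reads whitespace-separated
 * integers from `in` until end of input or the first token that is not a
 * number. Values in [0, 100] take the common path; anything else takes the
 * rejection path. Which path dominates depends entirely on the input. */
long sum_in_range(FILE *in, long *rejected)
{
    long total = 0;
    int value;

    assert(in != NULL && rejected != NULL);
    *rejected = 0;
    while (fscanf(in, "%d", &value) == 1) {
        if (value >= 0 && value <= 100) {
            total += value;      /* common path for typical input */
        } else {
            (*rejected)++;       /* rare path for typical input */
        }
    }
    return total;
}
```

A main function that calls sum_in_range(stdin, &rejected) and prints the result would let such a program be run as sample.exe < input.file1. If the collection runs used input dominated by out-of-range values, the recorded profile would describe the rejection path as the frequent one, which may not match how the program behaves in production.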
performing profile-based optimization
To optimize the program based on the previously collected run-time profile statistics, recompile the
program as follows:
cc -Aa +Oprofile=use -O -c sample.c    Compile for optimization.