HP Fortran Programmer's Guide (September 2007)

Performance and optimization
Using profilers
Chapter 6
program [program_arguments]
HP Caliper automatically runs to generate the flow-data information in a file
named flow.data in the current directory.
You can repeat this step multiple times with different program_arguments to
create aggregated profile information in the flow.data file to improve your
program’s optimization.
If the flow.data file already exists in the current directory when you run your
program, then HP Caliper merges the results into the file. If you run your program
multiple times in different directories, then HP Caliper creates a separate
flow.data file in each directory. You can combine the files using the fdm(1) utility
program, which is bundled with the HP C, HP aC++, and Fortran 90 compilers.
When collecting PBO data, the more closely your test scenarios resemble the ways
in which your program will actually be used, the better the compiler can optimize
your program for that use.
When you make changes to your source files, you should delete the flow.data file
before collecting more PBO data on your program.
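The collection procedure described above can be sketched as a small shell
function. This is only an illustration, not part of HP Caliper: the function name
collect_profile, the executable name ./program, and the argument sets in the usage
comment are hypothetical. It assumes the program was built with +Oprofile=collect,
so that each run merges its results into flow.data in the current directory.

```shell
#!/bin/sh
# Sketch: gather PBO data from several representative runs.
# collect_profile is a hypothetical helper, not an HP-supplied command.

collect_profile() {
    prog=$1
    shift
    # Start from a clean slate: data collected before the last source
    # change should not be merged with new runs.
    rm -f flow.data
    # One run per argument set; each run adds to flow.data in the
    # current directory.
    for args in "$@"; do
        # $args is deliberately unquoted so that it splits into words.
        "$prog" $args
    done
}

# Example with three hypothetical usage scenarios:
# collect_profile ./program "small.dat" "-v large.dat" "-q batch.dat"
```

Run the function from the directory in which you later recompile, so that all runs
contribute to the same flow.data file.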
Step 3. Recompile with the +Oprofile=use option. The compiler then reads the
flow.data file as input.
/opt/ansic/bin/cc -Aa +O3 -o program +Oprofile=use program.c
The compiler uses the HP Caliper information to help optimize your program.
Note that the benefit of profile-based optimization is application-dependent. Some
programs may not improve, while others may improve significantly.
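The complete collect, run, and recompile cycle can be sketched as a shell script.
This is a hedged outline, not a supported build procedure. The names pbo_cycle and
run, and the DRY_RUN variable, are hypothetical. The collect command line is an
assumption that mirrors the Step 3 command, with +Oprofile=collect in place of
+O3 and +Oprofile=use. By default the script only prints the commands, so you can
review them before running anything.

```shell
#!/bin/sh
# Sketch of one PBO cycle. Compiler path as used in this section;
# override CC if your compiler is installed elsewhere.
CC=${CC:-/opt/ansic/bin/cc}

# Print the command when DRY_RUN is 1 (the default), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# pbo_cycle NAME SOURCE [PROGRAM_ARGUMENTS...]
pbo_cycle() {
    name=$1
    src=$2
    shift 2
    # Build the instrumented executable (assumed command line).
    run "$CC" -Aa +Oprofile=collect -o "$name" "$src"
    # Run it once; the run produces flow.data.
    run "./$name" "$@"
    # Recompile using the collected profile data.
    run "$CC" -Aa +O3 -o "$name" +Oprofile=use "$src"
}

# Example (prints three command lines):
# pbo_cycle program program.c
```

Set DRY_RUN=0 to execute the commands instead of printing them.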
Comparing Program Performance
You can use HP Caliper’s other measurement features to explicitly see the results of using
PBO.
An example process for comparing performance would be:
1. Compile your program with +Oprofile=collect to generate the executable to optimize.
2. Run your program to generate the profile data file. Exercise as many usage scenarios
as possible to collect representative profile data.
3. Compile your program with +O3 to generate the baseline executable to compare against
the fully optimized executable.