Optimizing Itanium-Based Applications (May 2011)
% ./program.exe < A.input
% mv flow.data A.flow
% ./program.exe < B.input
% mv flow.data B.flow
% /opt/langtools/bin/fdm A.flow B.flow -o /tmp/program.flow
The two sequences above (implicit and explicit) will result in the same final profile, modulo sampling
effects.
Locking of profile database files
When an instrumented application completes execution and begins writing to the “flow.data” file to record
its execution profile, it attempts to lock the file in order to obtain exclusive access. This is intended to avoid
cases where two instances of an executable try to update the same file simultaneously. Lock files
take the form “<flowfile>,lock” and are written to the same directory as the flow file; the
lock persists until the flow file has been completely updated.
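The exclusive-access idea described above can be sketched in shell. This is only an illustration of lock-file semantics, not the instrumentation runtime's actual code: the helper name with_flow_lock is hypothetical, the lock file name follows the “<flowfile>,lock” form given above, and the shell's noclobber option stands in for whatever mechanism the runtime really uses.

```shell
# with_flow_lock FLOWFILE COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND only if the lock file for FLOWFILE can be created exclusively.
with_flow_lock() {
    flow=$1
    shift
    lock="$flow,lock"   # name form taken from the text above

    # With noclobber (set -C), the redirection fails if the lock file
    # already exists, so only one process can create it.
    if ( set -C; : > "$lock" ) 2>/dev/null; then
        "$@"            # update the flow file while holding the lock
        rc=$?
        rm -f "$lock"   # release the lock once the update is complete
        return $rc
    fi

    # The lock is held by another process; leave its lock file alone.
    return 1
}
```

A caller would wrap its update step, for example `with_flow_lock flow.data some_update_command`, and treat a non-zero return as "could not obtain the lock".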
If an executable is unable to obtain a lock (perhaps because many processes are all trying to update the same
file), it writes to a temporary flow file “flow.XXX”, where XXX is a pseudo-random string returned by
the “tempnam()” library function. If this happens, users can then merge the resulting temporary files back into a
single database using “fdm”.
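As a rough sketch of that merge step, the shell function below folds any number of temporary flow files into one database. The function name merge_flows and the FDM variable are hypothetical. Only the two-input form of the fdm command shown earlier in this section is used, because whether fdm accepts more than two inputs in a single invocation is not stated here; the sketch also assumes that merging is cumulative, so merging pairwise yields the same database as a single merge would.

```shell
# FDM defaults to the path used in this document; override it if needed.
FDM=${FDM:-/opt/langtools/bin/fdm}

# merge_flows OUT BASE [TMP...]   (hypothetical helper)
#   OUT  - merged database to create
#   BASE - first flow file, for example flow.data
#   TMP  - temporary flow files written by processes that could not lock
merge_flows() {
    out=$1
    base=$2
    shift 2

    cp "$base" "$out" || return 1
    for f in "$@"; do
        # Merge the running result with the next file into a scratch file,
        # then replace the running result, so fdm never reads and writes
        # the same path.
        "$FDM" "$out" "$f" -o "$out.new" || return 1
        mv "$out.new" "$out" || return 1
    done
}

# Example (file names are placeholders):
#   merge_flows /tmp/program.flow flow.data flow.XXX
```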
Itanium- versus PA-RISC profile-based optimization differences
Although the user model is the same, the underlying implementation of profile-based optimization in the
Itanium compilers is substantially different from that in the PA-RISC compilers. When transitioning from
PA-RISC to Itanium, please be aware of the following:
The PA-RISC equivalent of the +Oprofile=collect command-line option is +I, and the
PA-RISC equivalent of +Oprofile=use is +P. The PA-RISC options are still honored by
the Itanium compiler.
In the PA-RISC implementation, compiling a module with -c +I or -c +P causes an ISOM
(high-level intermediate) object file to be generated. Actual code generation is postponed until the
final link phase. This is not the case in the Itanium-based implementation, where code is generated
(with either +I or +P) during the -c compile.
Instrumented applications are optimized less aggressively than non-instrumented executables. The
PA-RISC compiler is capable of optimizing instrumented code at level +O2, whereas with the
current Itanium-based compilers, profile collection is supported at +O1 optimization (a warning is
issued indicating that the optimization level will drop to +O1 internally for +Oprofile=collect
compiles). This restriction may be lifted in a future release, however.
In the PA-RISC +I implementation, profile counters are 32 bits in size. When selecting input data
sets for runs of instrumented executables, counter saturation can occur if the training run is too
long. On Itanium, profile counters are 64 bits in size, meaning that you can use longer
training runs without concerns about counter saturation.
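To give a feel for the difference in counter width, the sketch below estimates how long a single counter could be incremented before reaching its limit. The function name saturation_times is hypothetical, the counters are assumed to be unsigned, and the default rate of one billion increments per second for one hot counter is an illustrative figure, not a measured one.

```shell
# saturation_times [INCREMENTS_PER_SECOND]   (hypothetical helper)
# Prints the time needed to count through the full range of a 32-bit
# and a 64-bit unsigned counter at the given rate.
saturation_times() {
    awk -v rate="${1:-1000000000}" 'BEGIN {
        printf "32-bit counter: %.1f seconds\n", 2^32 / rate
        printf "64-bit counter: %.1f years\n", 2^64 / rate / (365 * 86400)
    }'
}

# Example:
#   saturation_times 1000000000
```

At the assumed rate, the 32-bit limit is reached in about 4.3 seconds, while the 64-bit limit would take roughly 585 years.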