Optimizing Itanium-Based Applications (May 2011)

enabling aggressive optimizations
+Ofast or -fast (-fast is not supported by Fortran)
description:
[Alias for +O2 +Onolimit +Ofltacc=relaxed +FPD +DSnative -Wl,+pi,1M
-Wl,+pd,1M -Wl,+mergeseg]
Enables aggressive optimizations at +O2. This option is safe for the vast majority of applications, but
can result in longer compile times or, for code with strict FP accuracy requirements, incorrect output.
In addition to the optimizations performed at +O2, +Ofast does the following:
- Spends more compile time to fully optimize large procedures, which can result in non-linear
  compile times.
- Allows additional FP optimizations that might affect the accuracy of floating-point values. See
  +Onofltacc on page 11 for more information.
- Enables flush-to-zero mode on the hardware (+FPD), in which denormalized floating-point values
  are flushed to zero.
- Aggressively schedules code for the hardware on which the compiler is running. The compiler
  attempts to optimize the code for the resources of that specific processor implementation, without
  regard to the potential performance impact on other Itanium-based implementations.
- Enables large 1 MB instruction and data virtual memory page sizes, which can reduce TLB
  misses.
- Causes the dynamic loader to merge all data segments of shared libraries into one block at startup
  time. This allows the kernel to use larger page table entries, which can improve performance.
  However, for short-lived applications, this can add too much startup overhead; it can be
  disabled by adding -Wl,+nomergeseg after +Ofast.
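To illustrate why relaxed FP accuracy can change results, the following C sketch adds the same three values in two different orders. It is a generic IEEE 754 double-precision example, not a description of the specific transformations the compiler performs; the function names sum_left and sum_right are hypothetical.

```c
/* (a + b) + c: the order written in the source. */
double sum_left(double a, double b, double c) {
    /* volatile forces the intermediate sum to be rounded and stored as a
     * double, so each function keeps exactly the order shown. */
    volatile double t = a + b;
    return t + c;
}

/* a + (b + c): an algebraically equal, reassociated form. */
double sum_right(double a, double b, double c) {
    volatile double t = b + c;
    return a + t;
}
```

With round-to-nearest double arithmetic, sum_left(1.0e16, -1.0e16, 1.0) returns 1.0, but sum_right(1.0e16, -1.0e16, 1.0) returns 0.0, because -1.0e16 + 1.0 rounds back to -1.0e16. Code whose correctness depends on one particular evaluation order is the kind of code that needs strict FP accuracy.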
Use +Ofast with stable, well-behaved code that does not rely on FP corner-case values and that does
not use extremely large integer values.
+Ofast might imply +O3 in a future release, rather than +O2.
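The flush-to-zero corner case can be sketched in C as follows. This is a software emulation, assuming IEEE 754 doubles with gradual underflow enabled on the machine that runs the example; emulate_ftz and tiny_difference are hypothetical helpers, not compiler or hardware interfaces.

```c
#include <float.h>

/* Hypothetical helper: mimics flush-to-zero in software by replacing any
 * denormalized value (magnitude below DBL_MIN) with a zero of the same sign. */
double emulate_ftz(double x) {
    if (x > 0.0 && x < DBL_MIN)  return 0.0;
    if (x < 0.0 && x > -DBL_MIN) return -0.0;
    return x;
}

/* Subtracts two distinct, tiny normal values. With gradual underflow the
 * result is 0.5 * DBL_MIN, a nonzero denormalized number. */
double tiny_difference(void) {
    volatile double x = 1.5 * DBL_MIN;  /* volatile blocks constant folding */
    volatile double y = DBL_MIN;
    return x - y;
}
```

With gradual underflow, tiny_difference() is nonzero, so a guard such as x != y guarantees that x - y is nonzero. Once the result is flushed, emulate_ftz(tiny_difference()) compares equal to 0.0, and a later division by that difference would divide by zero. Code that relies on such values is an example of the FP corner-case dependence mentioned above.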
benefits:
- Safely improves performance for most applications, particularly when the application runs only on
  the type of system on which it was compiled.
- Avoids the need to specify a long list of optimization flags, because it implies a set of
  optimizations that are generally safe and can greatly improve application performance.
+Ofaster
description:
[Alias for +Ofast +O4]
Enables interprocedural optimizations in addition to the advanced optimizations described for +Ofast.
See the descriptions of +Ofast and +O4 for more information.
benefits:
Combined benefits from +Ofast and +O4
removing compilation time limits when optimizing
+O[no]limit
+Olimit = [min|default|none]
[Default: +Olimit=default]
By default, the optimizer is tuned to spend a reasonable amount of time optimizing large programs at
+O2 and above, to avoid non-linear compile times.