Optimizing Itanium-Based Applications (May 2011)

Users can remove optimization time restrictions at +O2 and above by using the +Onolimit or
+Olimit=none option. This allows full optimization of large procedures, but can incur significant
compile time increases for very large procedures, especially those with large sequences of straight-line
code. If you are willing to tolerate longer compile times, +Onolimit can result in significant
performance improvements.
Users can limit the amount of time spent optimizing code to completely avoid non-linear compile times
using +Olimit or +Olimit=min.
limiting the size of optimized code
+O[no]size (default +Onosize)
The user can disable optimizations that greatly expand code size at +O2 and above using the +Osize
option. Most optimizations improve code speed and simultaneously reduce code size. However, some
optimizations can greatly increase code size. Loop unrolling is one such optimization, and is disabled
with +Osize.
+Osize also disables inlining. This option can help reduce instruction cache misses. You can use
+Osize with other optimization controls, such as +Onolimit and +Ofast.
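To illustrate why loop unrolling trades code size for speed, here is a minimal sketch of what unrolling by four looks like when written by hand. The function names sum_rolled and sum_unrolled_by_4 are hypothetical and exist only for this illustration; the transformation the compiler actually applies at +O2 is not documented here and may differ in unroll factor and in how it handles leftover iterations.

```c
#include <stddef.h>

/* Rolled form: one add per iteration, smallest code footprint. */
long sum_rolled(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Hand-unrolled by four (illustrative only). The loop body is
   replicated, so there are fewer branches per element, but the
   generated code is larger, and a cleanup loop is needed for the
   iterations that do not fill a group of four. */
long sum_unrolled_by_4(const int *values, size_t count)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= count; i += 4) {
        total += values[i];
        total += values[i + 1];
        total += values[i + 2];
        total += values[i + 3];
    }
    for (; i < count; i++)      /* leftover 0 to 3 elements */
        total += values[i];
    return total;
}
```

Both functions return the same result. The second has roughly four times as many instructions in its main loop plus the cleanup loop, which is the kind of expansion that +Osize is described as suppressing.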
controlling the scheduling model
+DS[blended|itanium2|montecito|poulson|native] (default +DSblended)
Different Itanium-based implementations can have vastly different resource constraints, latencies, and
other scheduling criteria. The optimizing scheduler can currently schedule for several Itanium-based
implementations: Intel Itanium 2, Montecito, and Poulson. The user can schedule code to run best on
each of these implementations by using +DSitanium2, +DSmontecito, and +DSpoulson,
respectively. Additionally, use +DSmontecito to obtain the best schedule for Montvale and Tukwila.
Use +DSpoulson to schedule for the future post-Tukwila Poulson implementation.
However, users might also want to optimize code once and have it run reasonably well on different
implementations. The default setting, +DSblended, attempts to do this. Currently, +DSblended is a
combination of +DSmontecito and +DSpoulson. As new IPF implementations are released,
+DSblended scheduling will change so that code will run reasonably well on the different
implementations.
Use +DSnative to obtain the fastest schedule for the implementation on which you are
compiling. As new implementations of Itanium are released, additional scheduling targets will be
added.
controlling floating point optimization
+Ofltacc=[strict|default|limited|relaxed]
+O[no]fltacc
#pragma STDC FP_CONTRACT [ON|OFF|DEFAULT]
Controls optimizations on floating-point code, so that the expected accuracy of floating-point
computation is not violated. With +Ofltacc=strict (or its equivalent +Ofltacc), all
optimizations that can change result values are prohibited.
By default, or with +Ofltacc=default, the only value-changing optimization allowed is synthesis
of contractions. This includes the floating-point multiply-add instruction and its variants. While these
instructions might produce a different value than a two-instruction multiply-then-add sequence, the
resulting value is actually more accurate because it has not been subject to an intermediate rounding.
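The effect of the intermediate rounding can be seen in a small, portable sketch. It does not use the Itanium multiply-add instruction itself. Instead, as an assumption made for illustration, it emulates a contraction for float operands by carrying the product in double, where the product of two floats is exact. The function names mul_then_add and contracted_mul_add are hypothetical.

```c
/* Two separately rounded operations. The volatile store forces the
   product to be rounded to float before the addition, as in an
   uncontracted multiply followed by an add. */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;   /* intermediate rounding here */
    return product + c;
}

/* Emulated contraction (illustrative only): the product of two
   floats fits exactly in a double, so it reaches the addition
   without having been rounded to float first. */
float contracted_mul_add(float a, float b, float c)
{
    double unrounded_product = (double)a * (double)b;
    return (float)(unrounded_product + (double)c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13, and c = -1, the exact product is 1 - 2^-26. Rounded to float, that product becomes 1.0, so mul_then_add returns 0. The emulated contraction keeps the product unrounded and returns -2^-26, which is the mathematically exact answer. In standard C, the fma functions declared in <math.h> compute a true single-rounding multiply-add.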