HP Compilers for HP Integrity Servers (September 2011)
The general optimization strategies described above can be applied without loss of
floating-point quality. Here are some additional suggestions:
• Specific optimizations involving math library functions are done only if the source
file includes the math headers, such as <math.h>, that declare the function.
• The compiler optimizes even under controls for special floating-point semantics (see
“Precise floating-point control” (page 21)); however, these controls do restrict
optimization and may degrade the performance of code that does not require the
behavior provided. For best performance, use #pragma STDC FENV_ACCESS
ON in the smallest blocks that enclose the code for which it is needed, rather than
using the compile option +Ofenvaccess for the entire compilation unit. Similarly,
use #pragma STDC FP_CONTRACT OFF in the smallest sensitive blocks and
compile with +Ofltacc=default, rather than compiling with +Ofltacc=strict.
• +Olibmerrno is best used only if the compilation unit requires the math functions
to set errno. Consider querying exception flags instead of errno.
• The -l:libm.a option will link in an archive version of libm and result in more
efficient calling sequences. (Using the -Wl,-a,archive_shared option when
linking will have a similar effect, but may cause the linker to select other archive
libraries where shared libraries may be preferred.)
The following techniques can provide significant performance gains, but can degrade
the application’s ability to deal with unusual or unexpected inputs. They are best suited
to performance-hungry applications that are known to run correctly on systems with a
relaxed floating-point model or that can be well tested.
• +Ofltacc=limited is appropriate when the application does not depend on a
specific treatment of infinities, NaNs, or the sign of zero. This option will not
substantially improve performance of most codes.
• +Ocxlimitedrange or #pragma STDC CX_LIMITED_RANGE ON is
appropriate in a C application that uses complex multiply, divide, or a cabs()
function, provided the textbook formulas (without special consideration for overflow,
underflow, or infinite values) are acceptable for these operations. In HP-UX, the
float and double complex operations are implemented using wider internal
range and precision, alleviating problems of premature overflow or underflow.
• +Ofltacc=relaxed can provide a performance gain over +Ofltacc=limited
when the application meets the criteria for +Ofltacc=limited, the application
is known to run correctly with looser floating-point models, and reproducibility of
low-order result bits is not essential.
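To make "textbook formulas" concrete, the following sketch writes out the textbook complex multiply by hand and, next to it, a function that permits the compiler to use that formula through the standard pragma. The function names textbook_mul and limited_range_mul are hypothetical, and the code assumes only C99 <complex.h>; it is an illustration of the idea, not a description of how the HP compiler implements the operation.

```c
#include <complex.h>

/* Textbook complex multiply, written out by hand:
 * (a + bi)(c + di) = (ac - bd) + (ad + bc)i
 * No special handling of overflow, underflow, or infinite operands. */
double complex textbook_mul(double complex x, double complex y)
{
    double a = creal(x), b = cimag(x);
    double c = creal(y), d = cimag(y);
    return (a * c - b * d) + (a * d + b * c) * I;
}

/* Same product using the built-in operator. The pragma tells the
 * compiler that the textbook formula is acceptable in this block. */
double complex limited_range_mul(double complex x, double complex y)
{
    #pragma STDC CX_LIMITED_RANGE ON
    return x * y;
}
```

For ordinary finite operands the two functions should agree; for example, multiplying 1 + 2i by 3 + 4i gives -5 + 10i either way. The difference shows up only for operands where intermediate products overflow, underflow, or involve infinities, which is exactly the case the bullet says the application must not depend on.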