HP Compilers for HP Integrity Servers (September 2011)

The general optimization strategies described so far can be applied without loss of floating-point quality. Here are some additional suggestions:

• Specific optimizations involving math library functions are done only if the source file includes the math headers, such as <math.h>, that declare the function.

• The compiler optimizes even under controls for special floating-point semantics (see “Precise floating-point control” (page 21)); however, these controls do restrict optimization and may degrade the performance of code that does not require the behavior provided. For best performance, use #pragma STDC FENV_ACCESS ON in the smallest blocks that enclose the code for which it is needed, rather than using the compile option +Ofenvaccess for the entire compilation unit. Similarly, use #pragma STDC FP_CONTRACT OFF in the smallest sensitive blocks and compile with +Ofltacc=default, rather than compiling with +Ofltacc=strict.

• +Olibmerrno is best used only if the compilation unit requires the math functions to set errno. Consider querying exception flags instead of errno.

• The -l:libm.a option will link in an archive version of libm and result in more efficient calling sequences. (Using the -Wl,-a,archive_shared option when linking will have a similar effect, but may cause the linker to select other archive libraries where shared libraries may be preferred.)

The following techniques can provide significant performance gains, but can degrade the application’s ability to deal with unusual or unexpected inputs. They are best suited to performance-hungry applications that are known to run correctly on systems with a relaxed floating-point model or that can be well tested.

• +Ofltacc=limited is appropriate when the application does not depend on a specific treatment of infinities, NaNs, or the sign of zero. This option will not substantially improve performance of most codes.

• +Ocxlimitedrange or #pragma STDC CX_LIMITED_RANGE ON is appropriate in a C application that uses complex multiply, divide, or a cabs() function, provided the textbook formulas for these operations (which give no special consideration to overflow, underflow, or infinite values) are acceptable. In HP-UX, the float and double complex operations are implemented using wider internal range and precision, alleviating problems of premature overflow or underflow.

• +Ofltacc=relaxed can provide a performance gain over +Ofltacc=limited when the application meets the criteria for +Ofltacc=limited, the application is known to run correctly with looser floating-point models, and reproducibility of low-order result bits is not essential.
