HP aC++/HP C Programmer's Guide (B3901-90036; A.06.26; September 2011)

+Oprefetch_latency

+Oprefetch_latency=cycles

The +Oprefetch_latency option applies to loops for which the compiler generates
data prefetch instructions. cycles is the assumed latency, in cycles, of a data cache
miss. For a given loop, the compiler divides cycles by the estimated loop length to
arrive at the number of loop iterations ahead at which to issue prefetches.
cycles must be in the range 0 to 10000. A value of 0 instructs the compiler to use
the default value, which is 480 cycles for loops containing floating-point accesses and
150 cycles for loops that do not contain any floating-point accesses.
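
The division described above can be illustrated with a short sketch. The helper
names below are hypothetical and are not part of the compiler. The sketch assumes
that the estimated loop length is expressed in cycles per iteration and that the
quotient is truncated; the compiler's actual rounding is not stated here.

```c
#include <assert.h>

/* Default latencies quoted in the option description. */
enum { DEFAULT_LATENCY_FP = 480, DEFAULT_LATENCY_NON_FP = 150 };

/* Hypothetical helper: resolve the value given to +Oprefetch_latency.
   A value of 0 selects the default for the kind of loop. */
int effective_latency(int cycles, int loop_has_fp_access)
{
    assert(cycles >= 0 && cycles <= 10000);   /* documented range */
    if (cycles == 0)
        return loop_has_fp_access ? DEFAULT_LATENCY_FP
                                  : DEFAULT_LATENCY_NON_FP;
    return cycles;
}

/* Hypothetical helper: number of iterations ahead to prefetch.
   Assumption: est_loop_cycles is the estimated loop length in cycles per
   iteration, and the quotient is truncated (integer division). */
int prefetch_iterations_ahead(int cycles, int loop_has_fp_access,
                              int est_loop_cycles)
{
    assert(est_loop_cycles > 0);
    return effective_latency(cycles, loop_has_fp_access) / est_loop_cycles;
}
```

Under these assumptions, a floating-point loop estimated at 12 cycles per iteration
and compiled with the default latency of 480 would be prefetched 40 iterations
ahead; raising the setting to 960 would double that to 80.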

For tuning purposes, users should measure their application's performance using a
few different prefetch latency settings to determine the optimal value. Some
floating-point codes may benefit from increasing the value to 960. Parallel applications
frequently benefit from a shorter value of 150.
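
One way to carry out such a measurement is to time a representative loop and
rebuild it with each candidate setting. The kernel and timing helper below are a
minimal sketch with hypothetical names; the loop is only an example of a streaming
floating-point access pattern, and whether the compiler actually generates
prefetches for it depends on the optimization level in use.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Example streaming floating-point kernel: y[i] += a * x[i]. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Returns the CPU time, in seconds, spent running the kernel `repeats`
   times. Rebuild with, for example, +Oprefetch_latency=150, =480, and
   =960, and compare the values reported for the same n and repeats. */
double time_daxpy(size_t n, double a, const double *x, double *y, int repeats)
{
    assert(repeats > 0);
    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        daxpy(n, a, x, y);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For a meaningful comparison, the arrays should be much larger than the data cache,
and each setting should be timed several times.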

+O[no]preserved_fpregs

The +O[no]preserved_fpregs option specifies whether the compiler is allowed [not
allowed] to make use of the preserved subset of the floating-point register file as defined
by the Itanium runtime architecture.

The default is +Opreserved_fpregs.

+O[no]rotating_fpregs

The +O[no]rotating_fpregs option specifies whether the compiler is allowed [not
allowed] to make use of the rotating subset of the floating-point register file.

The default is +Orotating_fpregs.

+O[no]sumreduction

This option enables [disables] sum reduction optimization, which lets the compiler
compute partial sums so that a summation can be evaluated faster. Strictly speaking,
this transformation is not legal in C or C++, because reassociating floating-point
additions can change the result. This option is useful if an application
cannot use the +Onofltacc option but still wants sum reduction to be performed.

When sum reduction optimization is enabled, the compiler may evaluate intermediate
partial sums of float or double precision terms using (wider) extended precision, which
reduces variation in the result caused by different optimization strategies and generally
produces a more accurate result.
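
The effect of computing partial sums can be seen in a hand-written sketch. The
functions below are hypothetical illustrations, not compiler output: the first sums
in source order, and the second keeps two independent partial sums that are combined
at the end. The use of exactly two partial sums is an assumption made for brevity,
and the sketch does not model the extended-precision evaluation mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Source-order summation: one accumulator, strictly left to right. */
double sum_in_order(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-written stand-in for a sum reduction: even-indexed and odd-indexed
   terms go to separate partial sums, which are added once at the end.
   The two additions in the loop body are independent of each other. */
double sum_two_partials(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i;
    assert(a != NULL || n == 0);
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* leftover term when n is odd */
        s0 += a[i];
    return s0 + s1;
}
```

For the input {1e16, 1.0, -1e16, 1.0}, with IEEE 754 double arithmetic evaluated
strictly in double precision, sum_in_order returns 1.0 and sum_two_partials returns
2.0. The two orderings give different results, which is why the transformation is
not applied unless it has been permitted.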
