HP-UX Floating-Point Guide

170 Chapter 7

Performance Tuning

Identifying and Removing Performance Bottlenecks

floating-point-intensive code that is running unacceptably slowly. There are several reasons why a piece of floating-point-intensive code might consume a lot of execution time:

• The code generated by the compiler is inefficient.

• The program does not link in the fastest version of the BLAS library.

• The program links in shared libraries, which are slower than archive libraries.

• The data being processed involves denormalized operands or underflowing operations.

• The code evaluates mixed-precision expressions.

• The code contains highly iterative loops (for example, vector or matrix operations), or it contains loops that perform vector or matrix operations but are not being vectorized.

• The data is not optimally aligned in memory.

• The program causes adverse cache aliasing effects.

• The code contains many static variables.

• The code performs quad-precision computations.

The following sections discuss each of these problems individually.
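
One of the causes listed above, denormalized operands, can be checked for directly in a program's data before any tuning is attempted. The following is a minimal sketch, assuming a C99 compiler that provides fpclassify() and FP_SUBNORMAL in <math.h>. The function name count_denormals is hypothetical and chosen for illustration only; it is not part of any HP-UX library.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an HP-UX library routine): count the
 * elements of values[0..n-1] that are denormalized (subnormal).
 * fpclassify() and FP_SUBNORMAL are standard C99 facilities.
 */
size_t count_denormals(const double *values, size_t n)
{
    size_t count = 0;
    size_t i;

    /* A null pointer is acceptable only for an empty array. */
    assert(values != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (fpclassify(values[i]) == FP_SUBNORMAL)
            count++;
    }
    return count;
}
```

A nonzero count on the input or intermediate arrays of a slow loop suggests that denormalized operands are worth investigating as a cause. A count of zero does not rule out the other causes in the list.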