HP-UX Floating-Point Guide

Performance Tuning
Identifying and Removing Performance Bottlenecks
floating-point-intensive code that is running unacceptably slowly. There
are several reasons why a piece of floating-point-intensive code might
consume a lot of execution time:
• The code generated by the compiler is inefficient.
• The program does not link in the fastest version of the BLAS library.
• The program links in shared libraries, which are slower than archive
  libraries.
• The data being processed involves denormalized operands or
  underflowing operations.
• The data being processed contains mixed-precision expressions.
• The code contains highly iterative loops (for example, loops that
  perform vector and/or matrix operations), or it contains such loops
  but they are not being vectorized.
• The data is not optimally aligned in memory.
• The program causes adverse cache aliasing effects.
• The code contains many static variables.
• The code performs quad-precision computations.
The following sections discuss each of these problems individually.
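
Two of the data-related causes listed above can be recognized in source
code and input data before any tuning is attempted. The following C
fragment is a minimal sketch, not taken from this guide: the function names
count_subnormals, scale_mixed, and scale_single are hypothetical, and the
fragment assumes a C99 compiler that provides fpclassify in <math.h>. The
first function counts denormalized (subnormal) values in an array. The
other two show the same loop written with a mixed-precision expression and
with a single-precision expression.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: count the denormalized (subnormal) values in
 * data[0..n-1]. A nonzero count suggests that denormalized operands
 * may be contributing to slow execution. */
size_t count_subnormals(const double *data, size_t n)
{
    size_t count = 0;
    size_t i;

    assert(data != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (fpclassify(data[i]) == FP_SUBNORMAL)
            count++;
    }
    return count;
}

/* Mixed-precision version: 2.5 is a double constant, so each float
 * element is converted to double, multiplied, and converted back. */
void scale_mixed(float *x, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        x[i] = x[i] * 2.5;
}

/* Single-precision version: 2.5f is a float constant, so the
 * multiplication involves no conversions between precisions. */
void scale_single(float *x, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        x[i] = x[i] * 2.5f;
}
```

A program might call count_subnormals on its input arrays before entering
a time-critical loop and report any nonzero result. Whether the conversions
in scale_mixed cost measurable time depends on the compiler and the
processor, so the comparison is best confirmed by timing both versions.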