HP-UX Floating-Point Guide
170 Chapter 7
Performance Tuning
Identifying and Removing Performance Bottlenecks
floating-point-intensive code that is running unacceptably slowly. There
are several reasons why a piece of floating-point-intensive code might
consume a lot of execution time:
• The code generated by the compiler is inefficient.
• The program does not link in the fastest version of the BLAS library.
• The program links in shared libraries, which are slower than archive
libraries.
• The data being processed involves denormalized operands or
underflowing operations.
• The code contains mixed-precision expressions.
• The code contains highly iterative loops (for example, loops that
perform vector or matrix operations), or the code contains such
loops but they are not being vectorized.
• The data is not optimally aligned in memory.
• The program causes adverse cache aliasing effects.
• The code contains many static variables.
• The code performs quad-precision computations.
The following sections discuss each of these problems individually.
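As a quick first check for one of the causes listed above, denormalized
operands, the input data can be scanned before any timing work begins.
The following is only a minimal sketch. It assumes a C99-conforming
compiler and math library that provide the fpclassify macro and
FP_SUBNORMAL in <math.h>; the helper name count_denormals is
hypothetical and is not part of any HP-UX library.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an HP-UX routine): return the number of
 * denormalized (subnormal) values in the first n elements of data.
 * Assumes C99 fpclassify() and FP_SUBNORMAL are available.
 */
size_t count_denormals(const double *data, size_t n)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Zero, normal, infinite, and NaN values are not counted. */
        if (fpclassify(data[i]) == FP_SUBNORMAL)
            count++;
    }
    return count;
}
```

A nonzero count for a program's input or intermediate arrays suggests
that denormalized operands are worth investigating as a cause of the
slowdown. It does not by itself prove that they are the bottleneck.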