HP-UX Floating-Point Guide

Chapter 7 189
Performance Tuning
Cache Aliasing
Data cache aliasing is a more common performance problem than
instruction cache aliasing, but it is also easier to deal with. If you suspect
that your application is experiencing data cache performance problems
on scalars or small vectors of data, you can usually correct the situation
by rearranging the order in which your variables are allocated in
memory. In Fortran you can do this by reordering the variables within
your common blocks.
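The effect of reordering or padding can be sketched in C, where a struct
plays roughly the role of a Fortran common block. The sketch below is
illustrative only: it assumes a direct-mapped data cache, and the cache
size (CACHE_BYTES), line size (LINE_BYTES), and all function and struct
names are hypothetical values chosen for the example, not properties of
any particular system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache geometry, chosen for illustration only. */
#define CACHE_BYTES (256u * 1024u)  /* assumed direct-mapped data cache size */
#define LINE_BYTES  32u             /* assumed cache line size */

/* Cache line index that a byte offset maps to in a direct-mapped cache. */
static size_t cache_line_index(size_t byte_offset)
{
    return (byte_offset % CACHE_BYTES) / LINE_BYTES;
}

/* Two offsets alias when they lie in different memory lines
   but map to the same cache line index. */
static int offsets_alias(size_t a, size_t b)
{
    assert(LINE_BYTES > 0);
    return a / LINE_BYTES != b / LINE_BYTES &&
           cache_line_index(a) == cache_line_index(b);
}

/* Layout 1: y starts exactly one cache size after x, so x[i] and y[i]
   always compete for the same cache line. */
struct aliased_layout {
    double x[CACHE_BYTES / sizeof(double)];
    double y[CACHE_BYTES / sizeof(double)];
};

/* Layout 2: one line of padding shifts y, so x[i] and y[i]
   map to different cache lines. */
struct padded_layout {
    double x[CACHE_BYTES / sizeof(double)];
    char   pad[LINE_BYTES];
    double y[CACHE_BYTES / sizeof(double)];
};
```

Under these assumptions, a loop that reads x[i] and y[i] together evicts
one array's line each time it touches the other in the first layout,
while the padded layout lets both lines stay resident.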
If your application uses very large arrays of data, hundreds of kilobytes
or megabytes in length, then chances are good that you will experience
data cache aliasing problems no matter how your data arrays are
allocated in memory, simply because the arrays are too big to fit in the
data cache all at once. In this case, you may be able to improve
performance by using the +Odataprefetch option, either alone or in
conjunction with +Ovectorize. (See “Optimizing Your Program” on
page 171 for information about these options.)
Another way to improve performance for very large arrays is to increase
the locality of references to your data. This technique, sometimes called
tiling, requires selecting an algorithm that processes as much data as
possible while the data is resident in the cache so as to minimize the
number of times the data must be re-cached later. The routines in the
BLAS library use tiling when they operate on large matrices of data. For
example, in multiplying two large matrices, each matrix is cut into tiles,
which are processed in pairs. The size of the tiles is determined at run
time by the size of the matrices and the size of the data cache on the
system executing the application.
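The idea of tiling can be sketched with a simple blocked matrix multiply
in C. This is a minimal illustration, not the BLAS implementation: the
function name matmul_tiled and the tile parameter are hypothetical, and
here the caller supplies the tile size rather than having it derived at
run time from the matrix and cache sizes.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Computes C += A * B for n-by-n row-major matrices, one tile at a time.
   Each tile of A and B is reused for a whole tile of C while it is
   likely to still be resident in the data cache. */
void matmul_tiled(size_t n, size_t tile,
                  const double *a, const double *b, double *c)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile) {
        size_t i_end = min_size(ii + tile, n);
        for (size_t kk = 0; kk < n; kk += tile) {
            size_t k_end = min_size(kk + tile, n);
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t j_end = min_size(jj + tile, n);
                /* Multiply the (ii,kk) tile of A by the (kk,jj) tile of B. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The caller must zero C before the call if a plain product is wanted. A
reasonable starting point for the tile size is one small enough that
three tiles of doubles fit in the data cache together, but the best
value depends on the system and should be measured.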