HP Caliper User's Guide

can process only a subset of data cache misses. The PMU randomizes which loads it
monitors.
This means that the number of data cache misses observed through sampling (the number
of sampled misses multiplied by the sampling rate) accounts for only part of the total
number of actual data cache misses. Therefore, it is best to interpret sampling data not
as an indication of how many data cache misses a particular instruction incurred but
as an indication of which instructions incur the most data cache misses.
You can get a rough estimate of the total number of data cache misses
incurred by a particular instruction by doing the following:
1. Determine a scaling factor based on the total misses and the number of misses
   accounted for by sampling:
       scale = total L1 misses / (total sampled misses * sampling rate)
2. Multiply the number of sampled misses associated with the instruction by the
   scaling factor:
       total misses for instruction = scale * sampled misses for instruction
However, depending on the density of floating-point load misses incurred by your
application, such estimates could be very misleading.
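The two-step estimate described above can be sketched in Python as follows. This is only an illustration of the arithmetic, not part of HP Caliper: the function name, its parameters, and the input numbers are hypothetical, and the result carries the same caveat about floating-point load misses.

```python
def estimate_instruction_misses(total_l1_misses, total_sampled_misses,
                                sampling_rate, sampled_misses_for_instruction):
    """Rough per-instruction miss estimate using the scaling-factor method.

    All names are illustrative; none of them are HP Caliper identifiers.
    """
    if total_sampled_misses <= 0 or sampling_rate <= 0:
        raise ValueError("sampled misses and sampling rate must be positive")
    # Step 1: scaling factor between actual misses and misses seen by sampling.
    scale = total_l1_misses / (total_sampled_misses * sampling_rate)
    # Step 2: scale the samples attributed to one instruction.
    return scale * sampled_misses_for_instruction


# Hypothetical inputs: 1,000,000 total L1 misses, 400 samples taken at a
# sampling rate of 1 in 1,000, and 50 of those samples on one instruction.
estimate = estimate_instruction_misses(
    total_l1_misses=1_000_000,
    total_sampled_misses=400,
    sampling_rate=1_000,
    sampled_misses_for_instruction=50,
)
print(estimate)
```

With these made-up inputs the scaling factor is 2.5. A scaling factor below 1 would mean that sampling accounted for more events than the total L1 miss count, which is a sign that floating-point load misses are inflating the sampled data.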
Floating-point loads are serviced directly from the L2 cache. The PMU treats both L1
data cache misses and L2 floating-point load misses as data cache miss events for
sampling purposes. Therefore, if your application makes frequent floating-point loads,
then multiplying total samples by sampling rate might yield a data cache miss count
that exceeds the total number of L1 data cache misses.
More frequent sampling increases HP Caliper's perturbation of your application. In
the extreme case of taking one sample for each cache miss event, the kernel will trap
on every event, making the resulting data of limited, if any, value.
How Latency Bucket Metrics Are Obtained
The PMU's data event address register (D-EAR) provides the number of cycles of
latency for each sampled miss. HP Caliper places a data cache miss into one of the
latency buckets based on the latency of the miss. HP Caliper uses its built-in table of
expected latencies to determine where a miss was serviced: the L2 cache, L3 cache,
cell local memory, C2C, 1-hop memory, 2-hop memory, and so forth. HP Caliper uses
different expected latencies depending on the CPU type, CPU frequency, and system
model.
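The bucketing step can be pictured with the following Python sketch. It is an assumption-laden illustration, not HP Caliper's implementation: the cycle thresholds and the reduced set of bucket names are invented for the example, whereas the real tool uses its own built-in table that varies by CPU type, CPU frequency, and system model.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical inclusive upper bounds, in cycles, for the first four buckets.
# These values are made up; the real table is internal to HP Caliper.
BUCKET_UPPER_BOUNDS = [20, 60, 300, 600]
BUCKET_NAMES = [
    "L2 cache",
    "L3 cache",
    "cell local memory",
    "1-hop memory",
    "2-hop memory",  # anything slower than the last bound
]


def latency_bucket(latency_cycles):
    """Return the name of the bucket whose latency range contains the value."""
    return BUCKET_NAMES[bisect_left(BUCKET_UPPER_BOUNDS, latency_cycles)]


# Hypothetical latencies reported for six sampled misses.
sampled_latencies = [12, 45, 18, 250, 700, 55]
bucket_counts = Counter(latency_bucket(c) for c in sampled_latencies)

for name in BUCKET_NAMES:
    print(f"{name}: {bucket_counts[name]}")
```

A sorted list of bounds with a binary search keeps the lookup simple and makes it easy to swap in a different table for a different system.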
How the Data Summary Information Is Obtained
The PMU's data event address register (D-EAR) provides the data address along with
the number of cycles of latency for each sampled data cache miss. HP Caliper creates
a histogram of samples by data address, aggregating all samples that share the
same data address. HP Caliper then maps the data addresses in the histogram to
global variables. All samples whose data addresses belong to the same global variable
are aggregated. If a data address does not belong to any global variable, it is assigned