HP Caliper User's Guide

For control-dominated code or for workloads that seldom miss the internal caches,
this value is very small. For data-flow workloads that employ extensive prefetching,
it can be quite high, up to a maximum of 16, which is the Itanium 2 bus limit.
The reported average latency value will be incorrect on Itanium 2 steppings earlier
than B2.
CPU
The CPU transaction component is the percentage of all bus transactions on a
shared front-side bus (FSB) that are generated by the CPUs.
I/O
The I/O transaction component is the percentage of all bus transactions on a
shared FSB that are initiated by any I/O agent.
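As an illustration of the two transaction components, the following minimal sketch computes both percentages from transaction counts taken over the same sampling interval. The function name and the idea of passing raw counts are assumptions for illustration only; they are not part of the HP Caliper interface.

```python
# Hypothetical sketch: CPU and I/O transaction components as percentages of all
# bus transactions on a shared FSB. Assumes all three counts cover the same
# sampling interval; transaction_components() is not a Caliper API.


def transaction_components(cpu_transactions: int, io_transactions: int,
                           total_transactions: int) -> tuple[float, float]:
    """Return (cpu_percent, io_percent) of all bus transactions."""
    if total_transactions <= 0:
        raise ValueError("total_transactions must be positive")
    cpu_pct = 100.0 * cpu_transactions / total_transactions
    io_pct = 100.0 * io_transactions / total_transactions
    return cpu_pct, io_pct


if __name__ == "__main__":
    # Hypothetical counts from one sampling interval.
    cpu, io = transaction_components(cpu_transactions=900_000,
                                     io_transactions=100_000,
                                     total_transactions=1_000_000)
    print(f"CPU {cpu:.1f}%  I/O {io:.1f}%")  # CPU 90.0%  I/O 10.0%
```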
Util Adrs
Average address bus utilization estimates the total address bus utilization
resulting from all bus transactions, including cache misses, I/O port reads and
writes, interprocessor interrupts, writebacks, cache line invalidates (FC
instruction, store hit on a shared line), and clean castouts (if enabled). The
utilization is computed as follows:
ADRS UTIL = 100.0 * (total transactions/sec * 3.0) / (bus cycles/sec)
The constant value (3.0) is the number of address cycles needed for each bus
transaction.
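The address-bus formula can be sketched in a few lines of Python. This assumes the total transaction rate and the bus clock rate (bus cycles per second) are already known; the function name and the 200 MHz bus clock in the example are hypothetical values chosen for illustration.

```python
# Hypothetical sketch of the ADRS UTIL formula. Inputs are per-second rates;
# address_bus_utilization() is an illustrative name, not a Caliper API.

ADDRESS_CYCLES_PER_TRANSACTION = 3.0  # address cycles per bus transaction


def address_bus_utilization(total_transactions_per_sec: float,
                            bus_cycles_per_sec: float) -> float:
    """Return address bus utilization as a percentage."""
    if bus_cycles_per_sec <= 0:
        raise ValueError("bus_cycles_per_sec must be positive")
    busy_cycles_per_sec = total_transactions_per_sec * ADDRESS_CYCLES_PER_TRANSACTION
    return 100.0 * busy_cycles_per_sec / bus_cycles_per_sec


if __name__ == "__main__":
    # Hypothetical: 20 million transactions/sec on a 200 MHz bus clock.
    print(f"{address_bus_utilization(20e6, 200e6):.1f}%")  # 30.0%
```

Because every transaction is charged a fixed three address cycles, utilization scales linearly with the transaction rate; in this example, a bus clock of 200 million cycles/sec would saturate at about 66.7 million transactions/sec.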
Util Data
Data bus utilization gives a lower-bound estimate of total data bus utilization
resulting from bus transactions that involve a data transfer, that is, BRL, BRIL,
BWL, implicit writeback, and nonzero-byte BRP/BWP transactions. The lower-bound
data bus utilization is computed as follows:
DATA BUS CYCLES/SEC = ((BRL + BRIL + BWL + IMPLICIT WB)/sec * 4.0) +
                      ((nonzero-byte BRPs/BWPs)/sec * 1.0)
DATA UTIL = 100.0 * (DATA BUS CYCLES/SEC) / (BUS CYCLES/SEC)
The constants (4.0 and 1.0) are the number of cycles the data bus is occupied to
perform the corresponding data transfer. All cache line transfers (BRL, BRIL, BWL)
require four cycles. Nonzero-byte BRPs/BWPs require one or two cycles, depending on
the transfer size (16, 32, or 64 bytes). Because most nonzero-byte BRPs/BWPs target
I/O ports and semaphores, the calculation assumes a single-cycle transfer. As a
result, data bus cycles may be slightly undercounted, which is why the result is a
lower bound.
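The data bus calculation can be sketched the same way. This minimal Python version assumes per-second rates for each transaction type are already available; the function and parameter names, and the rates and 200 MHz bus clock in the example, are hypothetical.

```python
# Hypothetical sketch of the lower-bound DATA UTIL formula. All rates are
# transactions per second; these names are illustrative, not a Caliper API.

LINE_TRANSFER_CYCLES = 4.0     # BRL, BRIL, BWL, implicit writeback (full cache line)
PARTIAL_TRANSFER_CYCLES = 1.0  # nonzero-byte BRP/BWP, assumed single-cycle


def data_bus_cycles_per_sec(brl: float, bril: float, bwl: float,
                            implicit_wb: float, partial_nonzero: float) -> float:
    """Data bus cycles consumed per second."""
    line_rate = brl + bril + bwl + implicit_wb
    return line_rate * LINE_TRANSFER_CYCLES + partial_nonzero * PARTIAL_TRANSFER_CYCLES


def data_bus_utilization(cycles_per_sec: float, bus_cycles_per_sec: float) -> float:
    """Lower-bound data bus utilization as a percentage."""
    if bus_cycles_per_sec <= 0:
        raise ValueError("bus_cycles_per_sec must be positive")
    return 100.0 * cycles_per_sec / bus_cycles_per_sec


if __name__ == "__main__":
    # Hypothetical rates on a hypothetical 200 MHz bus clock.
    cycles = data_bus_cycles_per_sec(brl=10e6, bril=2e6, bwl=5e6,
                                     implicit_wb=1e6, partial_nonzero=4e6)
    print(f"{cycles:.0f}")                                  # 76000000
    print(f"{data_bus_utilization(cycles, 200e6):.1f}%")    # 38.0%
```

Charging every nonzero-byte BRP/BWP a single cycle is what makes the result a lower bound: any of those transfers that actually took two cycles would add to the numerator but is not counted here.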
BRL
Bus Read Line is the transaction used to read cache lines, due either to an instruction
cache miss or to a load data miss.
342 Event Set Descriptions for CPU Metrics