Uncore Manual
Reference Number: 329468-002
Uncore Performance Monitoring
Intel® QPI Link Layer Performance Monitoring
TxL0_POWER_CYCLES
• Title: Cycles in L0
• Category: POWER_TX Events
• Event Code: 0x0c
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of QPI qfclk cycles spent in L0 power mode in the Link Layer. L0 is the default
mode, which provides the highest performance at the highest power. Use edge detect to count the
number of times the link entered L0. Link power states are per link and per direction, so,
for example, the Tx direction could be in one state while Rx is in another. The PHY layer sometimes
leaves L0 for training, which is not captured by this event.
• NOTE: Includes L0p cycles. To get just L0, subtract TxL0P_POWER_CYCLES.
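The subtraction described in the note can be sketched as follows. This is a minimal illustration, not part of the manual: the function name and the sample counter values are hypothetical, and it assumes both counts were collected over the same interval on the same link and direction.

```python
def l0_only_cycles(tx_l0_power_cycles: int, tx_l0p_power_cycles: int) -> int:
    """Return qfclk cycles spent in full-width L0, excluding L0p.

    Hypothetical helper: the arguments are raw counts read for
    TxL0_POWER_CYCLES and TxL0P_POWER_CYCLES over the same interval.
    """
    if tx_l0_power_cycles < 0 or tx_l0p_power_cycles < 0:
        raise ValueError("counter values must be non-negative")
    # TxL0_POWER_CYCLES includes L0p cycles, so subtract them out.
    # Clamp at zero in case the two counters were sampled slightly apart.
    return max(0, tx_l0_power_cycles - tx_l0p_power_cycles)


# Made-up sample values, for illustration only.
print(l0_only_cycles(1_000_000, 250_000))  # 750000
```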
TxL_BYPASSED
• Title: Tx Flit Buffer Bypassed
• Category: TXQ Events
• Event Code: 0x05
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of times that an incoming flit was able to bypass the Tx flit buffer
(TxQ) and pass directly out to the QPI Link. Generally, when data is transmitted across QPI, it bypasses
the TxQ and passes directly to the link. However, the TxQ is used in L0p and when a Link Level Retry
(LLR) occurs, which increases the latency to transfer out to the link.
TxL_CYCLES_NE
• Title: Tx Flit Buffer Cycles not Empty
• Category: TXQ Events
• Event Code: 0x06
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of cycles when the TxQ is not empty. Generally, when data is
transmitted across QPI, it bypasses the TxQ and passes directly to the link. However, the TxQ is
used in L0p and when a Link Level Retry (LLR) occurs, which increases the latency to transfer out to the link.
TxL_FLITS_G0
• Title: Flits Transferred - Group 0
• Category: FLITS_TX Events
• Event Code: 0x00
• Max. Inc/Cyc: 2, Register Restrictions: 0-3
• Definition: Counts the number of flits transmitted across the QPI Link. It includes filters for Idle,
Protocol, and Data flits. Each flit is made up of 80 bits of information (in addition to some ECC
data). In full-width (L0) mode, a flit is made up of four “phits”, each of which contains 20 bits of
data (along with some additional ECC data). In half-width (L0p) mode, the phits are only 10 bits,
so it takes twice as many phits (eight) to transmit a flit. When one talks about QPI “speed” (for
example, 8.0 GT/s), the “transfers” refer to phits. Therefore, in L0, the system transfers 1
flit at 1/4 of the QPI speed. The bandwidth of the link can be calculated as:
flits * 80b / time. Note that this is not the same as “data” bandwidth. For example, a
64B cacheline transferred across QPI is broken into 9 flits: 1 with header information
and 8 that each carry 64 bits of actual “data” plus 16 bits of other information. To calculate
“data” bandwidth, use: data flits * 8B / time (for L0), or 4B instead of 8B for
L0p.
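The formulas in this definition can be worked through in a short sketch. The function and constant names are hypothetical, and the counts and interval are made-up inputs, not measurements. The sketch assumes the flit counts come from this event with the appropriate filter applied and that the measurement interval is known.

```python
FLIT_BITS = 80               # bits of information per flit (from the definition)
TRANSFERS_PER_FLIT_L0 = 4    # four 20-bit transfers per flit in full-width L0
DATA_BYTES_PER_FLIT_L0 = 8   # 64 bits of actual data per data flit in L0
DATA_BYTES_PER_FLIT_L0P = 4  # per the definition, use 4B instead of 8B for L0p


def l0_flit_rate(transfers_per_second: float) -> float:
    """Flits per second in L0: one quarter of the QPI transfer rate."""
    return transfers_per_second / TRANSFERS_PER_FLIT_L0


def link_bandwidth_bits_per_s(flits: int, seconds: float) -> float:
    """Raw link bandwidth: flits * 80b / time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return flits * FLIT_BITS / seconds


def data_bandwidth_bytes_per_s(data_flits: int, seconds: float,
                               half_width: bool = False) -> float:
    """Data bandwidth: data flits * 8B / time in L0, or 4B in L0p."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    per_flit = DATA_BYTES_PER_FLIT_L0P if half_width else DATA_BYTES_PER_FLIT_L0
    return data_flits * per_flit / seconds


if __name__ == "__main__":
    # Upper bound at 8.0 GT/s in L0: 2.0e9 flits/s, i.e. 160 Gb/s raw.
    peak_flits_per_s = l0_flit_rate(8.0e9)
    print(peak_flits_per_s)
    print(link_bandwidth_bits_per_s(int(peak_flits_per_s), 1.0))
    # One 64B cacheline is 9 flits, of which 8 are data flits carrying 8B each.
    print(data_bandwidth_bytes_per_s(8, 1.0))
```

The 9-flit cacheline example shows why the two figures differ: at most 64 of every 90 bytes sent for a cacheline transfer are payload.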