Uncore Manual

Reference Number: 329468-002

Uncore Performance Monitoring

Intel® QPI Link Layer Performance Monitoring

TxL0_POWER_CYCLES

• Title: Cycles in L0

• Category: POWER_TX Events

• Event Code: 0x0c

• Max. Inc/Cyc: 1, Register Restrictions: 0-3

• Definition: Number of QPI qfclk cycles spent in L0 power mode in the Link Layer. L0 is the default mode, which provides the highest performance at the highest power. Use edge detect to count the number of times that the link entered L0. Link power states are per link and per direction, so, for example, the Tx direction could be in one state while the Rx direction is in another. The phy layer sometimes leaves L0 for training, which will not be captured by this event.

• NOTE: Includes L0p cycles. To get just L0, subtract TxL0P_POWER_CYCLES.
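The subtraction in the note above can be sketched as follows. This is a minimal illustration, not part of the event definition: the function name and the assumption that both counter values were read over the same sampling interval are hypothetical.

```python
def l0_only_cycles(txl0_power_cycles: int, txl0p_power_cycles: int) -> int:
    """Return qfclk cycles spent in full-width L0, excluding L0p.

    Both arguments are raw counter values (TxL0_POWER_CYCLES and
    TxL0P_POWER_CYCLES), assumed to cover the same sampling interval.
    """
    if txl0p_power_cycles > txl0_power_cycles:
        # TxL0_POWER_CYCLES includes L0p cycles, so it should never be smaller.
        raise ValueError("L0p cycles exceed L0 cycles; counters are inconsistent")
    return txl0_power_cycles - txl0p_power_cycles


# Example: 1000 cycles counted in L0 (including L0p), 250 of them in L0p.
full_width_cycles = l0_only_cycles(1000, 250)
```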

TxL_BYPASSED

• Title: Tx Flit Buffer Bypassed

• Category: TXQ Events

• Event Code: 0x05

• Max. Inc/Cyc: 1, Register Restrictions: 0-3

• Definition: Counts the number of times that an incoming flit was able to bypass the Tx flit buffer (TxQ) and pass directly out to the QPI link. Generally, when data is transmitted across QPI, it will bypass the TxQ and pass directly to the link. However, the TxQ will be used in L0p and when LLR occurs, which increases the latency of transfers out to the link.

TxL_CYCLES_NE

• Title: Tx Flit Buffer Cycles not Empty

• Category: TXQ Events

• Event Code: 0x06

• Max. Inc/Cyc: 1, Register Restrictions: 0-3

• Definition: Counts the number of cycles when the TxQ is not empty. Generally, when data is transmitted across QPI, it will bypass the TxQ and pass directly to the link. However, the TxQ will be used in L0p and when LLR occurs, which increases the latency of transfers out to the link.

TxL_FLITS_G0

• Title: Flits Transferred - Group 0

• Category: FLITS_TX Events

• Event Code: 0x00

• Max. Inc/Cyc: 2, Register Restrictions: 0-3

• Definition: Counts the number of flits transmitted across the QPI link. It includes filters for Idle, Protocol, and Data flits. Each “flit” is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four “fits”, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI “speed” (for example, 8.0 GT/s), the “transfers” here refer to “fits”. Therefore, in L0, the system will transfer one “flit” at one quarter of the QPI speed. One can calculate the bandwidth of the link by taking: flits * 80b / time. Note that this is not the same as “data” bandwidth. For example, when a 64B cacheline is transferred across QPI, it is broken into 9 flits: 1 with header information and 8 with 64 bits of actual “data” and an additional 16 bits of other information. To calculate “data” bandwidth, one should therefore use: data flits * 8B / time (for L0), or 4B instead of 8B for L0p.
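The bandwidth formulas in the definition above can be sketched as follows. This is an illustrative sketch only: the function and constant names are hypothetical, and the flit counts and elapsed time are assumed to come from counter readings taken over a known interval.

```python
FLIT_BITS = 80                 # bits of information per flit (excluding ECC)
FITS_PER_FLIT_L0 = 4           # four 20-bit fits per flit in full-width L0
DATA_BYTES_PER_FLIT_L0 = 8     # 64 bits of actual data per data flit in L0
DATA_BYTES_PER_FLIT_L0P = 4    # per the definition, use 4B instead of 8B for L0p


def link_bandwidth_bits_per_sec(flits: int, seconds: float) -> float:
    """Raw link bandwidth: flits * 80b / time."""
    return flits * FLIT_BITS / seconds


def data_bandwidth_bytes_per_sec(data_flits: int, seconds: float,
                                 half_width: bool = False) -> float:
    """Data bandwidth: data flits * 8B / time (L0), or 4B for L0p."""
    per_flit = DATA_BYTES_PER_FLIT_L0P if half_width else DATA_BYTES_PER_FLIT_L0
    return data_flits * per_flit / seconds


def flit_rate_l0(transfers_per_sec: float) -> float:
    """In L0, one flit is transferred at 1/4 of the QPI speed."""
    return transfers_per_sec / FITS_PER_FLIT_L0


# Example: an 8.0 GT/s link in L0 moves at most 2.0e9 flits per second,
# which corresponds to 160e9 bits per second of raw flit bandwidth.
peak_flits = flit_rate_l0(8.0e9)
peak_raw_bps = link_bandwidth_bits_per_sec(int(peak_flits), 1.0)
```

Because a 64B cacheline occupies 9 flits but only 8 of them carry data, the data bandwidth computed this way is always lower than the raw link bandwidth for the same traffic.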