Intel® IXP2800 Network Processor

Hardware Reference Manual

Intel XScale® Core

Statistics derived from these two events:

• The average number of cycles the processor stalled on a data-cache access that may overflow the data-cache buffers.
This is calculated by dividing PMN0 by PMN1. This statistic indicates whether the duration event cycles are due to many requests or are attributable to just a few requests. If the average is high, the Intel XScale® core may be starved of the bus external to the core.

• The percentage of total execution cycles the processor stalled because a data-cache request buffer was not available.
This is calculated by dividing PMN0 by CCNT, which was used to measure total execution time.
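The two ratios above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample counter values are hypothetical, and reading PMN0, PMN1, and CCNT from the hardware is not shown.

```python
def data_cache_buffer_stats(pmn0, pmn1, ccnt):
    """Derive the two data-cache buffer statistics from raw counter samples.

    pmn0: stall cycles counted by PMN0 (the duration event)
    pmn1: requests counted by PMN1 (the occurrence event)
    ccnt: total clock cycles counted by CCNT over the same interval
    All three arguments are assumed to be sampled over the same run.
    """
    # Average stall cycles per request: PMN0 / PMN1.
    avg_stall = pmn0 / pmn1 if pmn1 else 0.0
    # Share of total execution time spent stalled: PMN0 / CCNT, as a percentage.
    stall_pct = 100.0 * pmn0 / ccnt if ccnt else 0.0
    return avg_stall, stall_pct


# Hypothetical sample values, not measurements from real hardware.
avg, pct = data_cache_buffer_stats(pmn0=12_000, pmn1=300, ccnt=1_000_000)
print(f"{avg:.1f} stall cycles per request, {pct:.2f}% of execution time")
```

The zero checks only guard against an interval in which a counter never incremented; they are a convenience of this sketch, not behavior described by the manual.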

3.8.1.5 Stall/Writeback Statistics

When an instruction requires the result of a previous instruction and that result is not yet available, the Intel XScale® core stalls to preserve the correct data dependencies. PMN0 counts the number of stall cycles due to data dependencies. Not all data dependencies cause a stall; only the following dependencies incur a stall penalty:

• Load-use penalty: attempting to use the result of a load before the load completes. To avoid the penalty, software should delay using the result of a load until it is available. This penalty shows the latency effect of data-cache access.

• Multiply/Accumulate-use penalty: attempting to use the result of a multiply or multiply-accumulate operation before the operation completes. Again, to avoid the penalty, software should delay using the result until it is available.

• ALU-use penalty: in a few isolated cases, back-to-back ALU operations may result in a one-cycle delay in execution.

PMN1 counts the number of writeback operations emitted by the data cache. These writebacks occur when the data cache evicts a dirty line of data to make room for a newly requested line, or as the result of a clean operation (CP15, register 7).

Statistics derived from these two events:

• The percentage of total execution cycles the processor stalled because of a data dependency.
This is calculated by dividing PMN0 by CCNT, which was used to measure total execution time. Often, a compiler can reschedule code to avoid these penalties when given the right optimization switches.

• The total number of data writeback requests to external memory, which is given by PMN1 alone.
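As a rough illustration of how these two statistics might be reported together, the sketch below uses a hypothetical helper and made-up counter values. It assumes PMN0, PMN1, and CCNT were sampled over the same interval with the stall/writeback event pair selected.

```python
def stall_writeback_stats(pmn0, pmn1, ccnt):
    """Summarize the stall/writeback event pair.

    pmn0: stall cycles due to data dependencies (PMN0)
    pmn1: writeback operations emitted by the data cache (PMN1)
    ccnt: total clock cycles over the measured interval (CCNT)
    """
    # Percentage of execution cycles lost to data-dependency stalls: PMN0 / CCNT.
    stall_pct = 100.0 * pmn0 / ccnt if ccnt else 0.0
    # The writeback total needs no arithmetic; it is the PMN1 count itself.
    return {"dependency_stall_pct": stall_pct, "writebacks": pmn1}


# Hypothetical sample values, not measurements from real hardware.
stats = stall_writeback_stats(pmn0=50_000, pmn1=1_234, ccnt=2_000_000)
print(stats)
```

Comparing the stall percentage before and after changing compiler optimization switches is one way to check whether rescheduling actually removed the penalties.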