Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Developer’s Manual March, 2003 Order Number: 273411-003
Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document.
1 Introduction

1.1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture High-Level Overview

The Intel® 80200 processor based on Intel® XScale™ microarchitecture is the next generation in the Intel® StrongARM* processor family (compliant with ARM* Architecture V5TE). It is designed for high performance and low power, leading the industry in mW/MIPS.
1.1.2 Features

Figure 1-1 shows the major functional blocks of the Intel® 80200 processor. The following sections give a brief, high-level overview of these blocks.
1.1.2.2 Memory Management

The Intel® 80200 processor implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. The MMU provides access protection and virtual-to-physical address translation. The MMU Architecture also specifies the caching policies for the instruction cache and data memory.
1.1.2.6 Power Management

The Intel® 80200 processor supports two low-power modes: idle and sleep. These modes are discussed in Section 8.3, “Power Management” on page 8-5.

1.1.2.7 Interrupt Controller

The Intel® 80200 processor implements an interrupt controller that provides masking of interrupts and the ability to steer interrupts to FIQ or IRQ. It is accessed through Coprocessor 13 registers.
1.2 Terminology and Conventions

1.2.1 Number Representation

All numbers in this document can be assumed to be base 10 unless designated otherwise. In text and pseudo code descriptions, hexadecimal numbers have a prefix of 0x and binary numbers have a prefix of 0b. For example, 107 would be represented as 0x6B in hexadecimal and 0b1101011 in binary.
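As a quick check of this notation, the following sketch converts the example value between the three bases. Python is used here only for illustration; it is not part of the processor toolchain, and the variable names are hypothetical.

```python
# Illustration of the document's number notation: decimal by default,
# 0x for hexadecimal, 0b for binary.
value = 107

as_hex = f"0x{value:X}"   # hexadecimal with the 0x prefix
as_bin = f"0b{value:b}"   # binary with the 0b prefix

print(as_hex, as_bin)
```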
1.3 Other Relevant Documents

• Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Datasheet, Intel Order # 273414
• ARM Architecture Version 5TE Specification, Document Number: ARM DDI 0100E. This document describes Version 5TE of the ARM Architecture, which includes the Thumb ISA and ARM DSP-Enhanced ISA.
2 Programming Model

This chapter describes the programming model of the Intel® 80200 processor based on Intel® XScale™ microarchitecture, namely the implementation options and extensions to the ARM* Version 5 architecture. The ARM* Architecture Version 5TE Specification (ARM DDI 0100E) describes Version 5TE of the ARM Architecture, including the Thumb* ISA and ARM DSP-Enhanced ISA.
2.2.4 ARM* DSP-Enhanced Instruction Set

The Intel® 80200 processor implements the ARM DSP-enhanced instruction set, which is a set of instructions that boost the performance of signal processing applications. There are new multiply instructions that operate on 16-bit data values and new saturation instructions.
2.3 Extensions to ARM* Architecture

The Intel® 80200 processor makes a few extensions to the ARM Version 5 architecture to meet the needs of various markets and design requirements. The following extensions are discussed in the next sections.

• A DSP coprocessor (CP0) has been added that contains a 40-bit accumulator and new instructions.
2.3.1.1 Multiply With Internal Accumulate Format

A new multiply format has been created to define operations on 40-bit accumulators. Table 2-1, “Multiply with Internal Accumulate Format” on page 2-4 shows the layout of the new format. The opcode for this format lies within the coprocessor register transfer instruction type. These instructions have their own syntax.
MIA does not support unsigned multiplication; all values in Rs and Rm are interpreted as signed data values. MIA is useful for operating on signed 16-bit data that was loaded into a general purpose register by LDRSH. The instruction is only executed if the condition specified in the instruction matches the condition code status.
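The signed multiply-accumulate behavior described above can be sketched in Python as a behavioral model. This is an illustration only: the function names are hypothetical, and the assumption that the sum wraps to 40 bits (rather than saturating) is not confirmed by the text shown here.

```python
ACC_BITS = 40                      # accumulator width stated in the text
ACC_MASK = (1 << ACC_BITS) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mia(acc0, rm, rs):
    """Hypothetical model of MIA: signed Rm * Rs added to the accumulator.

    Assumption: the sum wraps to 40 bits instead of saturating.
    """
    product = to_signed(rm, 32) * to_signed(rs, 32)
    return (acc0 + product) & ACC_MASK


# 3 * (-2) accumulated onto zero, read back as a signed 40-bit value.
acc = mia(0, 3, 0xFFFFFFFE)
print(to_signed(acc, ACC_BITS))
```

Because both operands pass through `to_signed`, a register holding 0xFFFFFFFE is treated as -2, which matches the statement that MIA interprets all values as signed.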
2.3.1.2 Internal Accumulator Access Format

The Intel® 80200 processor defines a new instruction format for accessing internal accumulators in CP0. Table 2-5, “Internal Accumulator Access Format” on page 2-7 shows that the opcode falls into the coprocessor register transfer space. The RdHi and RdLo fields allow up to 64 bits of data transfer between Intel® StrongARM* registers and an internal accumulator.
2.3.2 New Page Attributes

The Intel® 80200 processor extends the page attributes defined by the C and B bits in the page descriptors with an additional X bit. This bit allows four more attributes to be encoded when X=1. These new encodings include allocating data for the mini-data cache and write-allocate caching. A full description of the encodings can be found in Section 3.2.2, “Memory Attributes” on page 3-2.
Table 2-8. First-level Descriptors

[Bit-field layout table. Recoverable field names: SBZ, TEX, Fine page table base address.]
2.3.3 Additions to CP15 Functionality

To accommodate the functionality in the Intel® 80200 processor, registers in CP15 and CP14 have been added or augmented. See Chapter 7, “Configuration” for details.

At times it is necessary to be able to guarantee exactly when a CP15 update takes effect.
2.3.4 Event Architecture

2.3.4.1 Exception Summary

Table 2-11 shows all the exceptions that the Intel® 80200 processor may generate, and the attributes of each. Subsequent sections give details on each exception.

Table 2-11. Exception Summary
2.3.4.3 Prefetch Aborts

The Intel® 80200 processor detects three types of prefetch aborts: an instruction MMU abort, an external abort on an instruction access, and an instruction cache parity error. These aborts are described in Table 2-13. When a prefetch abort occurs, hardware reports the highest-priority abort in the extended Status field of the Fault Status Register.
2.3.4.4 Data Aborts

Two types of data aborts exist in the Intel® 80200 processor: precise and imprecise. A precise data abort is defined as one where R14_ABORT always contains the PC (+8) of the instruction that caused the exception. An imprecise abort is one where R14_ABORT contains the PC (+4) of the next instruction to execute and not the address of the instruction that caused the abort.
Imprecise data aborts

• A data cache parity error is imprecise; the extended Status field of the Fault Status Register is set to 0b11000.
• All external data aborts except for those generated on a data MMU translation are imprecise.

The Fault Address Register for all imprecise data aborts is undefined and R14_ABORT is the address of the next instruction to execute + 4, which is the same for both ARM and Thumb mode.
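The two R14_ABORT conventions can be compared with a small sketch. The helper names below are hypothetical, and the code only restates the +8 and +4 offsets given in the text; it is not an abort handler.

```python
WORD_MASK = 0xFFFFFFFF  # addresses are 32 bits wide


def precise_abort_instruction(r14_abort):
    """Address of the instruction that caused a precise data abort.

    For a precise abort, R14_ABORT holds that instruction's address + 8.
    """
    return (r14_abort - 8) & WORD_MASK


def imprecise_abort_next_instruction(r14_abort):
    """Address of the next instruction to execute after an imprecise abort.

    For an imprecise abort, R14_ABORT holds that address + 4; the
    instruction that caused the abort cannot be recovered from R14_ABORT.
    """
    return (r14_abort - 4) & WORD_MASK


print(hex(precise_abort_instruction(0x8008)))
print(hex(imprecise_abort_next_instruction(0x8004)))
```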
Multiple Data Aborts

Multiple data aborts may be detected by hardware, but only the highest priority one is reported. If the reported data abort is precise, software can correct the cause of the abort and re-execute the aborted instruction. If the lower priority abort still exists, it is reported. Software can handle each abort separately until the instruction successfully executes.
3 Memory Management

This chapter describes the memory management unit implemented in the Intel® 80200 processor based on Intel® XScale™ microarchitecture, which is compliant with the ARM* Architecture V5TE.

3.1 Overview

The Intel® 80200 processor implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual.
3.2 Architecture Model

3.2.1 Version 4 vs. Version 5

ARM* MMU Version 5 Architecture introduces the support of tiny pages, which are 1 KByte in size. The reserved field in the first-level descriptor (encoding 0b11) is used as the fine page table base address. The exact bit fields and the format of the first and second-level descriptors can be found in Section 2.3.2, “New Page Attributes” on page 2-9.
3.2.2.4 Data Cache and Write Buffer

All of these descriptor bits affect the behavior of the Data Cache and the Write Buffer. If the X bit for a descriptor is zero, the C and B bits operate as mandated by the ARM architecture. This behavior is detailed in Table 3-1. If the X bit for a descriptor is one, the C and B bits’ meaning is extended, as detailed in Table 3-2.
3.2.2.5 Details on Data Cache and Write Buffer Behavior

If the MMU is disabled, all data accesses are non-cacheable and non-bufferable. This is the same behavior as when the MMU is enabled and a data access uses a descriptor with X, C, and B all set to 0. The X, C, and B bits determine when the processor should place new data into the Data Cache. The cache stores data in lines (also called blocks).
3.3 Interaction of the MMU, Instruction Cache, and Data Cache

The MMU, instruction cache, and data/mini-data cache may be enabled/disabled independently. The instruction cache can be enabled with the MMU enabled or disabled. However, the data cache can only be enabled when the MMU is enabled. Therefore only three of the four combinations of the MMU and data/mini-data cache enables are valid.
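The rule above reduces to a single condition, which the following sketch enumerates. The function and variable names are hypothetical; the code only restates the constraint that the data cache requires the MMU.

```python
from itertools import product


def is_valid_config(mmu_enabled, dcache_enabled):
    """The data/mini-data cache may be enabled only when the MMU is enabled."""
    return mmu_enabled or not dcache_enabled


# Enumerate all four (MMU, data cache) enable combinations.
valid = [
    (mmu, dcache)
    for mmu, dcache in product((False, True), repeat=2)
    if is_valid_config(mmu, dcache)
]

for mmu, dcache in valid:
    print(f"MMU={'on' if mmu else 'off'}, data cache={'on' if dcache else 'off'}")
```

The only combination filtered out is MMU off with the data cache on.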
3.4 Control

3.4.1 Invalidate (Flush) Operation

The entire instruction and data TLBs can be invalidated together with one command, or each can be invalidated separately. An individual entry in the data or instruction TLB can also be invalidated. See Table 7-13, “TLB Functions” on page 7-13 for a listing of commands supported by the Intel® 80200 processor.
3.4.3 Locking Entries

Individual entries can be locked into the instruction and data TLBs. See Table 7-14, “Cache Lockdown Functions” on page 7-14 for the exact commands. If a lock operation finds the virtual address translation already resident in the TLB, the results are unpredictable. An invalidate by entry command before the lock command ensures proper operation.
3.4.4 Round-Robin Replacement Algorithm

The line replacement algorithm for the TLBs is round-robin; there is a round-robin pointer that keeps track of the next entry to replace. The next entry to replace is the one sequentially after the last entry that was written. For example, if the last virtual-to-physical address translation was written into entry 5, the next entry to replace is entry 6.
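A minimal sketch of the round-robin pointer follows. The class name is hypothetical, the entry count of 32 used in the usage lines is an assumption (the TLB size is not stated in this passage), and wrapping from the last entry back to entry 0 is likewise assumed. Locked entries are not modeled.

```python
class RoundRobinPointer:
    """Tracks which entry a round-robin structure replaces next."""

    def __init__(self, num_entries, last_written):
        self.num_entries = num_entries
        self.last_written = last_written

    def next_victim(self):
        # The entry sequentially after the last one written.
        # Assumption: the pointer wraps to entry 0 after the last entry.
        return (self.last_written + 1) % self.num_entries

    def write(self):
        """Write a new translation into the victim entry and advance."""
        victim = self.next_victim()
        self.last_written = victim
        return victim


# The example from the text: last write went to entry 5, so entry 6 is next.
pointer = RoundRobinPointer(num_entries=32, last_written=5)
print(pointer.next_victim())
```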
4 Instruction Cache

The instruction cache of the Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) enhances performance by reducing the number of instruction fetches from external memory. The cache provides fast execution of cached code. Code can also be locked down when guaranteed or fast access time is required.

4.1 Overview

Figure 4-1 shows the cache organization and how the instruction address is used to access the cache.
4.2 Operation

4.2.1 Operation When Instruction Cache is Enabled

When the cache is enabled, it compares every instruction request address against the addresses of instructions that it is currently holding. If the cache contains the requested instruction, the access “hits” the cache, and the cache returns the requested instruction.
4.2.3 Fetch Policy

An instruction-cache “miss” occurs when the requested instruction is not found in the instruction fetch buffers or instruction cache; a fetch request is then made to external memory. The instruction cache can handle up to two “misses.” Each external fetch request uses a fetch buffer that holds 32 bytes and eight valid bits, one for each word. A miss causes the following:

1. A fetch buffer is allocated
4.2.5 Parity Protection

The instruction cache is protected by parity to ensure data integrity. Each instruction cache word has 1 parity bit. (The instruction cache tag is NOT parity protected.) When a parity error is detected on an instruction cache access, a prefetch abort exception occurs if the Intel® 80200 processor attempts to execute the instruction.
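One parity bit per 32-bit word can detect any single-bit error in that word, as the following sketch illustrates. The function names are hypothetical, and the even-parity convention used here is an assumption; the text does not state which parity sense the hardware uses.

```python
def parity_bit(word):
    """XOR of all 32 bits of the word.

    Assumption: even parity, so the stored bit is 1 when the word has an
    odd number of set bits.
    """
    return bin(word & 0xFFFFFFFF).count("1") & 1


def has_parity_error(word, stored_parity):
    """True when the word no longer matches the parity bit stored with it."""
    return parity_bit(word) != stored_parity


word = 0xE1A00000                 # an arbitrary 32-bit instruction word
stored = parity_bit(word)         # parity recorded when the line was filled
corrupted = word ^ 0x00000010     # a single bit flips while in the cache

print(has_parity_error(word, stored), has_parity_error(corrupted, stored))
```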
4.2.6 Instruction Fetch Latency

Because the Intel® 80200 processor core is clocked at a multiple of the external bus clock, and the two clocks are truly asynchronous, an exact fetch latency is difficult to derive. In general, if a fetch can be directly issued (no other memory accesses are intervening), then the delay to the first instruction is approximately (8 + W) bus clocks, where W is the number of memory wait states.
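The estimate above can be turned into a small calculator. The function names and the `core_to_bus_ratio` parameter are hypothetical, and the conversion to core clocks is only a rough approximation, since the text notes that an exact latency is difficult to derive.

```python
def first_instruction_latency_bus_clocks(wait_states):
    """Approximate delay to the first instruction, in bus clocks: 8 + W."""
    return 8 + wait_states


def first_instruction_latency_core_clocks(wait_states, core_to_bus_ratio):
    """Rough conversion to core clocks.

    Assumption: core_to_bus_ratio is the core clock multiplier relative to
    the bus clock; clock-domain crossing effects are ignored.
    """
    return first_instruction_latency_bus_clocks(wait_states) * core_to_bus_ratio


# Two wait states give about 10 bus clocks before the first instruction arrives.
print(first_instruction_latency_bus_clocks(2))
print(first_instruction_latency_core_clocks(2, core_to_bus_ratio=6))
```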
4.3 Instruction Cache Control

4.3.1 Instruction Cache State at RESET

After reset, the instruction cache is always disabled, unlocked, and invalidated (flushed).

4.3.2 Enabling/Disabling

The instruction cache is enabled by setting bit 12 in coprocessor 15, register 1 (Control Register). This process is illustrated in Example 4-2, Enabling the Instruction Cache.
4.3.3 Invalidating the Instruction Cache

The entire instruction cache, along with the fetch buffers, is invalidated by writing to coprocessor 15, register 7. (See Table 7-12, “Cache Functions” on page 7-11 for the exact command.) This command does not unlock any lines that were locked in the instruction cache nor does it invalidate those locked lines.
4.3.4 Locking Instructions in the Instruction Cache

Software has the ability to lock performance-critical routines into the instruction cache. Up to 28 lines in each set can be locked; hardware ignores the lock command if software is trying to lock all the lines in a particular set (i.e., ways 28-31 can never be locked). When this happens, the line is still allocated into the cache, but the lock is ignored.
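The per-set locking limit can be modeled as a counter. This is a behavioral sketch with hypothetical names; it tracks only how many lines in one set are locked and does not model tags, ways, or replacement.

```python
WAYS_PER_SET = 32          # ways 0-31, as implied by "ways 28-31"
MAX_LOCKED_PER_SET = 28    # ways 28-31 can never be locked


class InstructionCacheSet:
    """Counts the locked lines in one set of the instruction cache."""

    def __init__(self):
        self.locked_lines = 0

    def allocate(self, lock_requested):
        """Allocate a line into the set; return True if it ends up locked.

        The line is always allocated. A lock request is ignored once 28
        lines in the set are already locked.
        """
        if lock_requested and self.locked_lines < MAX_LOCKED_PER_SET:
            self.locked_lines += 1
            return True
        return False


cache_set = InstructionCacheSet()
results = [cache_set.allocate(lock_requested=True) for _ in range(30)]
print(results.count(True), results.count(False))
```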
Software can lock down several different routines located at different memory locations. This may cause some sets to have more locked lines than others as shown in Figure 4-2. Example 4-4 on page 4-9 shows how a routine, called “lockMe” in this example, might be locked into the instruction cache. Note that it is possible to receive an exception while locking code (see Section 2.3.4, “Event Architecture” on page 2-12).
5 Branch Target Buffer

The Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) uses dynamic branch prediction to reduce the penalties associated with changing the flow of program execution. The Intel® 80200 processor features a branch target buffer that provides the instruction cache with the target address of branch-type instructions. The branch target buffer is implemented as a 128-entry, direct-mapped cache.
Figure 5-2. Branch History

[State diagram with four states connected by Taken and Not Taken transitions. SN: Strongly Not Taken; WN: Weakly Not Taken; WT: Weakly Taken; ST: Strongly Taken.]

5.1.1 Reset

After Processor Reset, the BTB is disabled and all entries are invalidated.
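The four branch-history states named in Figure 5-2 can be sketched as a state machine. The transition structure below is an assumption: it follows the conventional two-bit saturating counter, where each Taken outcome moves one step toward ST and each Not Taken outcome moves one step toward SN, because the arrows of the original figure are not recoverable here. The function names and the prediction rule are likewise hypothetical.

```python
# States ordered from Strongly Not Taken to Strongly Taken.
STATES = ("SN", "WN", "WT", "ST")


def next_state(state, taken):
    """Advance the branch history after a branch resolves.

    Assumption: a saturating counter. Taken moves one step toward ST,
    Not Taken moves one step toward SN, and the end states saturate.
    """
    index = STATES.index(state)
    if taken:
        index = min(index + 1, len(STATES) - 1)
    else:
        index = max(index - 1, 0)
    return STATES[index]


def predict_taken(state):
    """Assumption: the two Taken states predict the branch as taken."""
    return state in ("WT", "ST")


# A branch starting at SN that is taken three times in a row reaches ST.
state = "SN"
for outcome in (True, True, True):
    state = next_state(state, outcome)
print(state, predict_taken(state))
```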
5.2 BTB Control

5.2.1 Disabling/Enabling

The BTB is always disabled out of Reset. Software can enable the BTB through a bit in a coprocessor register (see Section 7.2.2). Before enabling or disabling the BTB, software must invalidate it (described in the following section). This action ensures correct operation in case stale data is in the BTB.
6 Data Cache

The data cache of the Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) enhances performance by reducing the number of data accesses to and from external memory. There are two data cache structures in the Intel® 80200 processor: a 32-Kbyte data cache and a 2-Kbyte mini-data cache.
Figure 6-1. Data Cache Organization

[Figure labels: Set 31, way 0, way 1, 32 bytes (cache line), Set Index. This example shows Set 0 being selected by the set index.]
6.1.2 Mini-Data Cache Overview

The mini-data cache is a 2-Kbyte, 2-way set associative cache; this means there are 32 sets with each set containing 2 ways. Each way of a set contains 32 bytes (one cache line) and one valid bit. There are also 2 dirty bits for every line, one for the lower 16 bytes and the other one for the upper 16 bytes. When a store hits the cache, the dirty bit associated with it is set.
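The geometry above can be checked with a short calculation. The address split shown is an assumption: it uses the conventional layout in which the lowest bits select the byte within a line and the next bits select the set. The function names are hypothetical.

```python
CACHE_BYTES = 2 * 1024   # 2-Kbyte mini-data cache
WAYS = 2                 # 2-way set associative
LINE_BYTES = 32          # one cache line per way

# Number of sets follows from the other three figures.
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)


def split_address(address):
    """Split an address into (tag, set index, byte offset).

    Assumption: offset is the low 5 bits and the set index is the next
    5 bits, the usual arrangement for 32-byte lines and 32 sets.
    """
    offset = address % LINE_BYTES
    set_index = (address // LINE_BYTES) % SETS
    tag = address // (LINE_BYTES * SETS)
    return tag, set_index, offset


def dirty_bit_index(offset):
    """Dirty bit 0 covers the lower 16 bytes of a line, bit 1 the upper 16."""
    return 0 if offset < 16 else 1


print(SETS)
print(split_address(0x12345))
```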
6.1.3 Write Buffer and Fill Buffer Overview

The Intel® 80200 processor employs an eight-entry write buffer, each entry containing 16 bytes. Stores to external memory are first placed in the write buffer and subsequently taken out when the bus is available. The write buffer supports the coalescing of multiple store requests to external memory. An incoming store may coalesce with any of the eight entries.
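Coalescing can be illustrated with a toy model of the write buffer. The class and method names are hypothetical, and the matching rule is an assumption: a store is taken to coalesce when it falls in the same 16-byte-aligned block as an entry already in the buffer. Draining to the bus is not modeled.

```python
class WriteBuffer:
    """Toy model of an eight-entry write buffer with 16-byte entries."""

    ENTRIES = 8        # eight entries, per the text
    ENTRY_BYTES = 16   # sixteen bytes per entry, per the text

    def __init__(self):
        # Maps the 16-byte-aligned base address of an entry to its bytes.
        self.entries = {}

    def store_byte(self, address, value):
        """Buffer a one-byte store and report what happened to it."""
        base = address - (address % self.ENTRY_BYTES)
        offset = address - base
        if base in self.entries:
            # Assumption: same aligned block means the store coalesces.
            self.entries[base][offset] = value
            return "coalesced"
        if len(self.entries) == self.ENTRIES:
            return "full"
        self.entries[base] = {offset: value}
        return "allocated"


buffer = WriteBuffer()
print(buffer.store_byte(0x1000, 0xAA))   # first store takes a new entry
print(buffer.store_byte(0x100F, 0xBB))   # same 16-byte block
print(buffer.store_byte(0x1010, 0xCC))   # next block takes another entry
```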
6.2 Data Cache and Mini-Data Cache Operation

The following discussions refer to the data cache and mini-data cache as one cache (data/mini-data) since their behavior is the same when accessed.

6.2.1 Operation When Caching is Enabled

When the data/mini-data cache is enabled for an access, the data/mini-data cache compares the address of the request against the addresses of data that it is currently holding.
6.2.3.2 Read Miss Policy

The following sequence of events occurs when a cacheable (see Section 6.2.3.1, “Cacheability” on page 6-5) load operation misses the cache:

1. The fill buffer is checked to see if an outstanding fill request already exists for that line.
6.2.3.3 Write Miss Policy

A write operation that misses the cache requests a 32-byte cache line from external memory if the access is cacheable and write allocation is specified in the page. In this case, the following sequence of events occurs:

1. The fill buffer is checked to see if an outstanding fill request already exists for that line.
6.2.4 Round-Robin Replacement Algorithm

The line replacement algorithm for the data cache is round-robin. Each set in the data cache has a round-robin pointer that keeps track of the next line (in that set) to replace. The next line to replace in a set is the next sequential line after the last one that was just filled.
6.3 Data Cache and Mini-Data Cache Control

6.3.1 Data Memory State After Reset

After processor reset, both the data cache and mini-data cache are disabled, all valid bits are set to zero (invalid), and the round-robin bit points to way 31. Any lines in the data cache that were configured as data RAM before reset are changed back to cacheable lines after reset, i.e., there are 32 KBytes of data cache and zero bytes of data RAM.
6.3.3.1 Global Clean and Invalidate Operation

A simple software routine is used to globally clean the data cache. It takes advantage of the line-allocate data cache operation, which allocates a line into the data cache. This allocation evicts any dirty data in the cache back to external memory. Example 6-2 shows how the data cache can be cleaned.
The line-allocate command does not operate on the mini-data cache, so system software must clean this cache by reading 2 KBytes of contiguous unused data into it. This data must be unused and reserved for this purpose so that it is not already in the cache. It must reside in a page that is marked as mini-data cache cacheable (see Section 2.3.2).
6.4 Re-configuring the Data Cache as Data RAM

Software has the ability to lock tags associated with 32-byte lines in the data cache, thus creating the appearance of data RAM. Any subsequent access to this line always hits the cache unless it is invalidated. Once a line is locked into the data cache it is no longer available for cache allocation on a line fill.
Example 6-3. Locking Data into the Data Cache

; configured with C=1 and B=1
; R0 is the number of 32-byte lines to lock into the data cache. In this
; example 16 lines of data are locked into the cache.
; MMU and data cache are enabled prior to this code.

.macro CPWAIT
    MRC P15, 0, R0, C2, C0, 0
    MOV R0, R0
    SUB PC, PC, #4
.endm

.macro DRAIN
    MCR P15, 0, R0, C7, C10, 4 ; drain pending loads and stores
.endm
Example 6-4. Creating Data RAM

; R1 contains the virtual address of a region of memory to configure as
; data RAM, which is aligned on a 32-byte boundary.
; MMU is configured so that the memory region is cacheable.
; R0 is the number of 32-byte lines to designate as data RAM. In this
; example 16 lines of the data cache are re-configured as data RAM.
Tags can be locked into the data cache by enabling the data cache lock mode bit located in coprocessor 15, register 9. (See Table 7-14, “Cache Lockdown Functions” on page 7-14 for the exact command.) Once enabled, any new lines allocated into the data cache are locked down. Note that the PLD instruction does not affect the cache contents if it encounters an error while executing.
6.5 Write Buffer/Fill Buffer Operation and Control

See Section 1.2.2, “Terminology and Acronyms” on page 1-5 for a definition of coalescing. The write buffer is always enabled, which means stores to external memory are buffered. The K bit in the Auxiliary Control Register (CP15, register 1) is a global enable/disable for allowing coalescing in the write buffer.
7 Configuration

This chapter describes the System Control Coprocessor (CP15) and coprocessor 14 (CP14). CP15 configures the MMU, caches, buffers and other system attributes. Where possible, the definition of CP15 follows the definition in the first generation Intel® StrongARM* products. CP14 contains the performance monitor registers and the trace buffer registers.

7.1 Overview

CP15 is accessed through MRC and MCR coprocessor instructions, and access is allowed only in privileged mode.
The format of MRC and MCR is shown in Table 7-1. cp_num is defined for CP15, CP14, CP13 and CP0. CP13 contains the interrupt controller and bus controller registers and is described in Chapter 9, “Interrupts” and Chapter 11, “Bus Controller,” respectively. CP0 supports instructions specific for DSP and is described in Chapter 2, “Programming Model.
The format of LDC and STC is shown in Table 7-2. LDC and STC follow the programming notes in the ARM Architecture Reference Manual. LDC and STC transfer a single 32-bit word between a coprocessor register and memory. These instructions do not allow the programmer to specify values for opcode_1, opcode_2, or Rm; those fields implicitly contain zero.

Table 7-2.
7.2 CP15 Registers

Table 7-3 lists the CP15 registers implemented in the Intel® 80200 processor.

Table 7-3.
7.2.1 Register 0: ID and Cache Type Registers

Register 0 houses two read-only registers that are used for part identification: an ID register and a cache type register. The ID Register is selected when opcode_2=0. This register returns the code for the Intel® 80200 processor: 0x69052000 for A0 stepping/revision. The low order four bits of the register are the chip revision number and will be incremented for future steppings.
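As a rough host-side illustration (for example, in a script that post-processes register dumps), the revision field described above can be separated from the rest of the ID value as follows. This is a minimal sketch: the function name `split_id` and its return shape are hypothetical, and only the value 0x69052000 and the meaning of the low-order four bits are taken from the text.

```python
A0_ID = 0x69052000  # ID register value for the A0 stepping, as given in the text


def split_id(id_value):
    """Split a 32-bit ID register value into (id_without_revision, revision).

    Hypothetical helper: only the low-order four bits (chip revision number)
    are interpreted; the remaining bits are returned unchanged.
    """
    id_value &= 0xFFFFFFFF
    revision = id_value & 0xF            # low-order four bits: chip revision
    base = id_value & 0xFFFFFFF0         # everything except the revision field
    return base, revision


base, revision = split_id(A0_ID)
print(hex(base), revision)
```

A later stepping would be expected to return the same upper bits with a larger revision number, so comparing only `base` is one way to recognize the part regardless of stepping.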
Table 7-5.
7.2.2 Register 1: Control and Auxiliary Control Registers

Register 1 is made up of two registers, one that is compliant with ARM Version 5 and is referenced by opcode_2 = 0x0, and the other which is specific to Intel® StrongARM* and is referenced by opcode_2 = 0x1.
The mini-data cache attribute bits, in the Intel® 80200 processor Control Register, are used to control the allocation policy for the mini-data cache and whether it uses write-back caching or write-through caching. The configuration of the mini-data cache should be set up before any data access is made that may be cached in the mini-data cache.
7.2.3 Register 2: Translation Table Base Register

Table 7-8. Translation Table Base Register

[Bit-field diagram: bits 31:14 = Translation Table Base, bits 13:0 = reserved]
reset value: unpredictable

Bits     Access                                Description
31:14    Read / Write                          Translation Table Base - Physical address of the base of the first-level table
13:0     Read-unpredictable / Write-as-Zero    Reserved

7.2.
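The field layout in Table 7-8 implies that the first-level table must start on a 16 KB boundary, because bits 13:0 are written as zero. The sketch below shows one way to check a candidate address before it is written to the register. The name `ttb_value` is hypothetical; the mask is derived only from the bit ranges in the table.

```python
TTB_BASE_MASK = 0xFFFFC000  # bits 31:14 hold the table base; bits 13:0 are write-as-zero


def ttb_value(table_phys_addr):
    """Return the value to write to the Translation Table Base Register.

    Hypothetical helper: rejects addresses that do not fit in 32 bits or that
    have any of bits 13:0 set (that is, not aligned to 16 KB).
    """
    if not 0 <= table_phys_addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    if table_phys_addr & ~TTB_BASE_MASK & 0xFFFFFFFF:
        raise ValueError("first-level table must be aligned to 16 KB")
    return table_phys_addr & TTB_BASE_MASK


print(hex(ttb_value(0xA0004000)))
```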
7.2.6 Register 5: Fault Status Register

The Fault Status Register (FSR) indicates which fault has occurred, which could be either a prefetch abort or a data abort. Bit 10 extends the encoding of the status field for prefetch aborts and data aborts. The definition of the extended status field is found in Section 2.3.4, “Event Architecture” on page 2-12.
7.2.8 Register 7: Cache Functions

All the functions defined in the first generation of Intel® StrongARM* appear here. The Intel® 80200 processor adds other functions as well. This register should be accessed as write-only. Reads from this register, as with an MRC, have an undefined effect. The Drain Write Buffer function not only drains the write buffer but also drains the fill buffer.
Other items to note about the line-allocate command are:
• It forces all pending memory operations to complete.
• Bits [31:5] of Rd are used to specify the virtual address of the line to be allocated into the data cache.
• If the targeted cache line is already resident, this command has no effect.
• This command cannot be used to allocate a line in the mini Data Cache.
7.2.9 Register 8: TLB Operations

Disabling/enabling the MMU has no effect on the contents of either TLB: valid entries stay valid, locked items remain locked. All operations defined in Table 7-13 work regardless of whether the TLB is enabled or disabled. This register should be accessed as write-only. Reads from this register, as with an MRC, have an undefined effect.

Table 7-13.
7.2.10 Register 9: Cache Lock Down

Register 9 is used for locking down entries into the instruction cache and data cache. (The protocol for locking down entries can be found in Chapter 6, “Data Cache”.) Table 7-14 shows the command for locking down entries in the instruction cache, instruction TLB, and data TLB. The entry to lock is specified by the virtual address in Rd.
7.2.11 Register 10: TLB Lock Down

Register 10 is used for locking down entries into the instruction TLB and data TLB. (The protocol for locking down entries can be found in Chapter 3, “Memory Management”.) Lock/unlock operations on a TLB when the MMU is disabled have an undefined effect. This register should be accessed as write-only. Reads from this register, as with an MRC, have an undefined effect.
7.2.13 Register 13: Process ID

The Intel® 80200 processor supports the remapping of virtual addresses through a Process ID (PID) register. This remapping occurs before the instruction cache, instruction TLB, data cache and data TLB are accessed. The PID register controls when virtual addresses are remapped and to what value.
7.2.14 Register 14: Breakpoint Registers

The Intel® 80200 processor contains two instruction breakpoint address registers (IBCR0 and IBCR1), one data breakpoint address register (DBR0), one configurable data mask/address register (DBR1), and one data breakpoint control register (DBCON). The Intel® 80200 processor also supports a 256-entry trace buffer that records program execution information.
7.2.15 Register 15: Coprocessor Access Register

This register is selected when opcode_2 = 0 and CRm = 1. This register controls access rights to all the coprocessors in the system except for CP15 and CP14. Both CP15 and CP14 can only be accessed in privileged mode. This register is accessed with an MCR or MRC with the CRm field set to 1. This register controls access to CP0 and CP13 for the Intel® 80200 processor.
Table 7-20.
7.3 CP14 Registers

Table 7-21 lists the CP14 registers implemented in the Intel® 80200 processor.

Table 7-21. CP14 Registers

Register (CRn)

7.3.
7.3.3 Registers 6-7: Clock and Power Management

These registers contain functions for managing the core clock and power. Three low power modes are supported that are entered upon executing the functions listed in Table 7-24. To enter any of these modes, write the appropriate data to CP14, register 7 (PWRMODE).
7.3.4 Registers 8-15: Software Debug

Software debug is supported by address breakpoint registers (Coprocessor 15, register 14), serial communication over the JTAG interface and a trace buffer. Registers 8 and 9 are used for the serial interface and registers 10 through 13 support a 256-entry trace buffer. Registers 14 and 15 are the debug link register and debug SPSR (saved program status register).
8 System Management

This chapter describes the clocking and power management features of the Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) along with reset details. Main features include a software controlled internal clock frequency and two low power modes:
• idle
• sleep

8.1 Clocking

CLK is the input reference clock for the Intel® 80200 processor. CLK accepts an input clock frequency of 33 to 66 MHz.
Table 8-2. Software CCLK Configuration

CCLKCFG[3:0]                    Multiplier    Example: CCLK Frequency (MHz),
(Coprocessor 14, register 6)    for CLK       assuming CLK Frequency of 66 MHz
0                               (reserved)    Unpredictable
1                               3             200
2                               4             266
3                               5             333
4                               6             400
5                               7             466
6                               8             533
7                               9             600
8                               10            666
9                               11            733
10-15                           (reserved)    Unpredictable

The Intel® 80200 processor supports low voltage operation with a supply as low as 0.95 V.
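The multiplier column of Table 8-2 follows a simple pattern: for the non-reserved encodings 1 through 9, the multiplier equals CCLKCFG plus 2. The sketch below reproduces the table on a host machine. The function names are hypothetical, and the use of a nominal 66.67 MHz (200/3 MHz) input is an assumption inferred from the example frequencies, since 3 × 66 would give 198 rather than 200.

```python
def cclk_multiplier(cclkcfg):
    """Return the CLK multiplier selected by CCLKCFG[3:0] per Table 8-2.

    Encodings 0 and 10-15 are reserved (result unpredictable on hardware),
    so this sketch rejects them.
    """
    if not 1 <= cclkcfg <= 9:
        raise ValueError("reserved CCLKCFG encoding")
    return cclkcfg + 2


def cclk_mhz(cclkcfg, clk_mhz):
    """Return the core clock frequency in MHz for a given CLK input."""
    return cclk_multiplier(cclkcfg) * clk_mhz


# Assumed nominal input clock of 200/3 MHz (about 66.67 MHz).
for cfg in range(1, 10):
    print(cfg, cclk_multiplier(cfg), int(cclk_mhz(cfg, 200 / 3)))
```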
8.2 Processor Reset

The RESET# pin must be asserted when CLK and power are applied to the processor. CLK, MCLK, and power must be present and stable before RESET# can be deasserted. To ensure reset, RESET# must be asserted for at least 32 MCLK cycles once both clocks and the power are stable. Reset pulses shorter than this have an undefined effect.
8.2.2 Reset Effect on Outputs

After RESETOUT# is asserted, the processor’s output pins are driven to a well-defined state. Critical bus signals receive a ‘0’ or ‘1’ value, as shown in Figure 8-2. This figure also illustrates that HOLD is acknowledged during the reset sequence. Output pins only transition if a valid MCLK is present.
8.3 Power Management

The Intel® 80200 processor provides two low power modes, idle and sleep, listed here in order of increasing power savings. Table 8-3 describes the attributes of each low power mode.

Table 8-3.

8.3.
The JTAG clock must be stopped during sleep mode. Drive a ‘0’ into the JTAG clock when not toggling it.
9 Interrupts

9.1 Introduction

The Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) supports a variety of external and internal interrupt sources. The Interrupt Control Unit (ICU) controls how the Intel® 80200 processor reacts to these interrupts. Ultimately, all interrupt sources are combined into one of two internal interrupts: IRQ and FIQ. These interrupts correspond to the IRQ and FIQ described in the ARM Architecture Reference Manual.
9.3 Programmer Model

Software has access to three registers in the ICU. INTCTL is used to enable or disable (mask) individual interrupts. As mentioned, masking of all interrupts may still be accomplished via the CPSR register in the core. INTSRC is a read-only register that records all currently active interrupt sources. Even if an interrupt is masked, software may use INTSRC to test for its source.
9.3.1 INTCTL

INTCTL is used to specify which interrupts are disabled (masked).

Table 9-1.
9.3.2 INTSRC

The Interrupt Source register (INTSRC) indicates which interrupts are active. This register may be used by an ISR to determine quickly the source of an interrupt. Even if an interrupt is masked with INTCTL, software may still detect whether it is asserted by reading its bit from INTSRC.

Table 9-2.
9.3.3 INTSTR

Systems may have differing priorities for the various interrupt cases; the ICU allows system designers to associate each internal interrupt source with one of the two internal interrupts: FIQ and IRQ. This association is called steering. INTSTR is used to specify how internal interrupt sources should be steered.

Table 9-3.
10 External Bus

10.1 General Description

The Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) bus is a split bus, with separate request and data buses. It is designed primarily as the memory and I/O bus for the Intel® 80200 processor, not as a general purpose multi-master bus, although it is possible to have several masters on it efficiently.
An alternate configuration with a separate memory bus is also possible, shown in Figure 10-2. All signals on this bus, data and request, are sampled on the rising edge of MCLK. MCLK is created by the system and is an input to the Intel® 80200 processor. MCLK is asynchronous with respect to the Intel® 80200 processor core frequency and any other Intel® 80200 processor input clocks.
10.2 Signal Description

Table 10-1. Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Bus Signals

Signal    Width    I/O    Function
MCLK      1        I      bus clock (note: all bus activity is triggered by the rising edge of this clock).
                   O      During the first cycle of the issue phase: this signal indicates the start of a bus request.
10.2.1 Request Bus

The request bus issues read or write requests from the Intel® 80200 processor or other bus master to the chipset or memory controller. Each request takes two MCLK cycles. All signals should be sampled on the rising edge of MCLK. No data is ever transferred on the request bus. On the first cycle, ADS#/LEN[2], Lock/LEN[1], and W/R#/LEN[0] are used to carry the ADS#, Lock, and W/R# signals.
Table 10-3. Requests on a 32-bit Bus

LEN    # Data Bytes
000    1
001    2
010    4
011    8
100    12
101    16
110    32
111

1.
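A bus monitor or simulation model might decode the LEN field along the following lines. This is a sketch under stated assumptions: the byte counts are paired with the LEN encodings in the order they appear in Table 10-3, which agrees with the examples in Section 10.3 (0x2 for four bytes, 0x3 for two words, 0x5 for four words, 0x6 for eight words). No byte count is recoverable for encoding 111, so the sketch treats it as invalid. The names `LEN_TO_BYTES` and `request_bytes` are hypothetical.

```python
# Assumed in-order pairing of LEN encodings with byte counts from Table 10-3.
LEN_TO_BYTES = {
    0b000: 1,
    0b001: 2,
    0b010: 4,
    0b011: 8,
    0b100: 12,
    0b101: 16,
    0b110: 32,
}


def request_bytes(len_field):
    """Return the number of data bytes requested for a 3-bit LEN value.

    Raises ValueError for encoding 0b111 and for values outside 3 bits,
    because no byte count is listed for them in this sketch.
    """
    if len_field not in LEN_TO_BYTES:
        raise ValueError("no byte count listed for LEN=%r" % (len_field,))
    return LEN_TO_BYTES[len_field]


print(request_bytes(0x6))
```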
10.2.2 Data Bus

Some time after a request is made on the request bus, data must be transferred for that request on the data bus. Each request has a corresponding transaction (one or more cycles) on the data bus. Data bus transactions must occur in the same order as the requests were made. The delay between a request going out and the data coming back to or being driven from the bus master is arbitrary.
10.2.3 Critical Word First

The CWF signal is only used during read bursts of eight words (Len = 6). CWF needs to be driven at the same time as DValid of the first data cycle of the transaction. This bit indicates to the requesting master what order the data is returning in. The Intel® 80200 processor uses this sort of transaction to fill a cache line.
There are eight byte enables (BE#) associated with the D bus. Each byte enable corresponds to one byte of the bus. During a write cycle, the byte enable for each byte that is being written is asserted low. More detail on write transactions is given below. Eight check bits, DCB, are also provided as part of the data bus. These bits are used for ECC. Section 10.2.
10.2.5 Multimaster Support

Simple multimaster support is supplied with the Hold pin. The Hold pin causes the Intel® 80200 processor to stop issuing new requests as soon as possible (see below for timing) and to float the following pins: A, ADS#/LEN[2], W/R#/LEN[0], and Lock/LEN[1]. Before floating ADS#, the Intel® 80200 processor drives it to an inactive state (high).
A simpler but lower performance method would be to assert Hold to the Intel® 80200 processor, wait for all outstanding transactions to complete, grant the issue bus to the alternate master (using the issue bus pins with the Intel® 80200 processor bus protocols, or whatever protocol the alternate master requires) and give the bus back to the Intel® 80200 processor only once the alternate bus master is completely finished.
10.2.6 Abort

If for any reason a request made by the Intel® 80200 processor cannot be completed, it must be aborted. At the same time as the assertion of DValid for any data cycle of any transaction, Abort can be asserted. This has the effect of ending that transaction at that data cycle. The Intel® 80200 processor saves the address of the aborted transaction and takes an exception.
10.2.7 ECC

Software running on the Intel® 80200 processor may configure pages in memory as being ECC protected. For such pages, the Intel® 80200 processor checks the ECC code associated with read data, and generates an ECC code to associate with write data. The ECC code for a data element is transported over the DCB bus. For 64-bit wide memories, an additional eight bits of width are required to hold the ECC code.
10.2.8 Big Endian System Configuration

The Intel® 80200 processor supports execution in a big endian system. A system is said to be big endian if multi-byte values are accessed with the MSB at lower addresses. The endian orientation of a system is only evident when software performs sub-word sized accesses.
10.3 Examples

All examples assume a 64-bit bus, in a little endian system.

10.3.1 Simple Read Word

In Figure 10-4, a read request for one word at address 0x240 is issued at time 10 ns. ADS# is asserted low at that clock edge, 0x240 is driven on A, W/R# is driven low to indicate a read request, and 0x2 is driven onto the Len bus to indicate that the access is for four bytes.
10.3.2 Read Burst, No Critical Word First

In Figure 10-5 the request goes out the same as the last example, with the address 0x248 this time and the length 0x6, indicating an eight word cache line fill. The first data cycle begins at 50 ns with DValid being asserted with CWF low to indicate that this burst starts at the lowest word pair and returns sequentially.
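The return order for a burst with CWF low can be sketched as follows: the eight-word (32-byte) line is delivered as four word pairs on the 64-bit bus, starting at the lowest word pair of the line regardless of the address requested. The function name and constants are hypothetical; the line size and bus width come from the examples in this section.

```python
LINE_BYTES = 32  # eight-word cache line
BUS_BYTES = 8    # 64-bit data bus: one word pair per data cycle


def sequential_fill_order(request_addr):
    """Return the word-pair addresses in the order they are returned when
    CWF is low (burst starts at the lowest word pair of the line)."""
    line_base = request_addr & ~(LINE_BYTES - 1)
    return [line_base + i * BUS_BYTES for i in range(LINE_BYTES // BUS_BYTES)]


print([hex(a) for a in sequential_fill_order(0x248)])
```

For the request at 0x248 in Figure 10-5, this gives the word pairs at 0x240, 0x248, 0x250 and 0x258, in that order.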
10.3.3 Read Burst, Critical Word First Data Return

Figure 10-6 is the same as the last with one difference: CWF is asserted high on the first data cycle of the return data. This indicates that the data is returning critical word first. In this case, since the address requested was 0x248, the word pair containing that byte starting at 0x248 is returned first.
10.3.4 Word Write

Figure 10-7 shows a 32-bit write request to address 0x240. W/R# is high when ADS# is asserted low. Two cycles before the write data needs to be on the bus for the SDRAM, DValid is asserted by the chipset to the Intel® 80200 processor to tell the Intel® 80200 processor the data is needed.
10.3.5 Two Word Coalesced Write

In Figure 10-8, two store byte instructions from the instruction stream have been coalesced into a single write command in the write buffer. The bytes were stored to addresses 0x240 and 0x247. The request is the same as the basic write word case except now the length is 0x3, indicating a two word write.
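The byte-enable pattern for a coalesced write can be worked out as in the sketch below. It assumes a 64-bit little endian bus in which BE# bit n corresponds to the byte at offset n within the aligned 8-byte data cycle; this lane numbering is an assumption, not something stated in the text above. The function name is hypothetical. Byte enables are active low, so a bit value of 0 means the byte is written.

```python
def byte_enables_n(byte_addrs, bus_bytes=8):
    """Return the active-low BE# value for one data cycle.

    byte_addrs: addresses of the bytes being written, all within the same
    aligned bus-width block. Assumes BE# bit n maps to byte offset n
    (little endian lane numbering, an assumption of this sketch).
    """
    byte_addrs = list(byte_addrs)
    base = min(byte_addrs) & ~(bus_bytes - 1)
    be_n = (1 << bus_bytes) - 1              # start with all enables deasserted (high)
    for addr in byte_addrs:
        lane = addr - base
        if not 0 <= lane < bus_bytes:
            raise ValueError("bytes span more than one data cycle")
        be_n &= ~(1 << lane)                 # assert (drive low) the enable for this byte
    return be_n


# Two coalesced byte stores to 0x240 and 0x247:
print(hex(byte_enables_n([0x240, 0x247])))
```

Under that lane assumption, the two stores in this example leave only the lowest and highest byte enables asserted, while a full eight-byte write yields 0x00.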
10.3.5.1 Write Burst

Figure 10-9 shows a four word write caused by the eviction of a half cache line. In this case, the Len is 0x5 indicating four words. DValid is asserted for two consecutive cycles here, but the two cycles could be spread out. In this case the Intel® 80200 processor drives the data as requested, along with BE# of 0x00 each cycle, indicating that all the bytes are being written.

Figure 10-9.
10.3.6 Write Burst, Coalesced

Figure 10-10 shows a four word cache write caused by store requests coalesced in a write buffer. The Len is 0x5 indicating four words. DValid is asserted for two consecutive cycles. The Intel® 80200 processor drives the data as requested, but this time the byte enables are not all zeroes. The byte enables here are asserted low only for those bytes that were stored by the instruction stream.
10.3.7 Pipelined Accesses

The example in Figure 10-11 demonstrates the four-deep pipelined nature of this bus. In this example, the Intel® 80200 processor is bus limited and is issuing requests as quickly as it can. Before time 0 ns, there are no outstanding transactions. Two reads (A and B) followed by a write (C) and another read (D) are all requested before 85 ns in this timing diagram.
10.3.8 Locked Access

An example of a locked access is shown in Figure 10-12. Here the processor is doing an atomic read/write to address 0x240, denoted as A in the figure. The Lock signal, which is valid at the positive edge of MCLK when ADS# is asserted, is asserted for each request from the read of A just prior to the matching write of A.
10.3.9 Aborted Access

As discussed in Section 10.2.6, “Abort” on page 10-11, any request from the Intel® 80200 processor can be aborted by the chipset or memory. This might occur if there was a PCI error, or if a request was issued to unimplemented memory. Figure 10-13 shows an aborted read. Read A is issued at time 10 ns for 32 bytes of data. At 50 ns DValid goes high to indicate the beginning of the first data cycle.
10.3.10 Hold

Figure 10-14 shows an example of hold being asserted to stop new transactions being issued. The Intel® 80200 processor floats the issue bus pins and issues no transactions until HldA is deasserted. The Hold signal assertion does not affect the data bus, which continues to operate normally. Read data for requests A and B continue to return.
11 Bus Controller

11.1 Introduction

The Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) Bus Controller Unit (BCU) is responsible for accessing off-chip memory. It initiates bus cycles as documented in Chapter 10, “External Bus”. The BCU is capable of queuing four outstanding transactions. This improves the performance of the processor, because it does not need to wait for the result of a memory transaction before initiating another.
11.3 Error Handling

The BCU is able to detect and respond to two classes of errors: bus aborts and ECC errors. Information about errors is captured in a set of programmer-accessible registers: ELOG0, ELOG1, and ECAR0, ECAR1. The ELOGx registers log general information about an error, while the ECARx registers capture the address associated with an error.

11.3.
11.3.2 ECC Errors

An ECC error occurs when the BCU reads data and notices that the associated ECC bits do not match the data. This could also happen as a result of the RMW that the BCU performs on sub bus-width writes. A single transaction on the bus could result in multiple ECC errors, as the BCU checks each bus-width entity as it is received. Table 11-1 summarizes the BCU error response for ECC errors.
Error reporting may be enabled with the BCUCTL register, described in Section 11.4.1. If enabled, single bit errors cause the BCU to assert an interrupt to the Interrupt Controller Unit (ICU). If the interrupt is not enabled in the ICU, it is not propagated to the core. This interrupt may be cleared by software by writing to the BCU Control Register (see Section 11.4.
11.4 Programmer Model

The BCU registers reside in Coprocessor 13 (CP13). They may be accessed/manipulated with the MCR, MRC, STC, and LDC instructions. The CRn field of the instruction denotes the register number to be accessed. Field CRm must be set to 1. The opcode_1 and opcode_2 fields of the instruction should be zero. Access to CP13 may be controlled using the Coprocessor Access Register (see Section 7.2.
Table 11-2. BCUCTL (Register 0) (Sheet 2 of 2)

[Bit-field diagram: fields TP, EV, E1, E0, EE, SC, SR]
reset value: all implemented bits are 0

Bits    Access                             Description
3       Read / Write                       EE - ECC Enable
2       Read / Write                       0 = disable single bit error correction
                                           1 = enable single bit error correction
1       Read-unpredictable / Write-as-1    Reserved.
When ECC is enabled, the BCU only generates an interrupt on a single-bit error if BCUCTL.SR is set. When ECC is enabled, the BCU always generates an abort on a multi-bit error. The BCU repairs single bit errors if BCUCTL.SC is set. It is recommended that this bit always be set; running with this bit cleared could cause software to operate on corrupted data before the ECC-error detect interrupt is received. If BCUCTL.
BCUMOD.AF affects the behavior of the BCU when it is reading a 32-byte block (a cache line-fill). If this bit is ‘0’, then the BCU always emits the 32-byte aligned address of the cache line when requesting it. If this bit is ‘1’, then the BCU emits the address of the “critical word” in the cache line when requesting it. This latter setting allows external logic to implement CWF logic (as detailed in Section 10.2.
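The effect of BCUMOD.AF on the address of a line-fill request can be modeled roughly as shown here. The function name is hypothetical, and treating the “critical word” address as the 4-byte aligned address of the access that missed is an assumption of this sketch; the text above states only that the critical word address is emitted.

```python
LINE_BYTES = 32  # size of a cache line-fill
WORD_BYTES = 4   # assumed size of the "critical word"


def line_fill_request_address(miss_addr, af):
    """Return the address emitted for a 32-byte line-fill request.

    af == 0: the 32-byte aligned address of the cache line.
    af == 1: the address of the critical word (assumed word aligned here).
    """
    if af:
        return miss_addr & ~(WORD_BYTES - 1)
    return miss_addr & ~(LINE_BYTES - 1)


print(hex(line_fill_request_address(0x24A, 0)), hex(line_fill_request_address(0x24A, 1)))
```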
11.4.2 ECC Error Registers

Table 11-4. ELOG0, ELOG1 (Registers 4, 5)

[Bit-field diagram: fields RW, ET, syn]
reset value: undefined

Bits    Access    Description
31                RW - indicates the direction of the errant transfer

Table 11-5.
The BCU does not write to these ELOGx/ECARx registers unless the corresponding BCUCTL.Ex bit is cleared, either by reset or by software.

Table 11-6.
12 Performance Monitoring

This chapter describes the performance monitoring facility of the Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE). The events that are monitored can provide performance information for compiler writers, system application developers and software programmers.

12.1 Overview

The Intel® 80200 processor hardware provides two 32-bit performance counters that allow two unique events to be monitored simultaneously.
12.2 Clock Counter (CCNT; CP14 - Register 1)

The format of CCNT is shown in Table 12-1. The clock counter is reset to ‘0’ by the Performance Monitor Control Register (PMNC) or can be set to a predetermined value by directly writing to it. It counts core clock cycles. When CCNT reaches its maximum value 0xFFFF,FFFF, the next clock cycle causes it to roll over to zero and set the overflow flag (bit 6) in PMNC.
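Because CCNT is 32 bits wide, a long measurement has to account for rollovers. The sketch below models the rollover behavior described above and shows how a total cycle count could be reconstructed if software keeps its own count of overflow events. The function names, and the idea of a software-maintained overflow count, are assumptions of this example rather than features described in the text.

```python
CCNT_MAX = 0xFFFFFFFF  # maximum value of the 32-bit clock counter


def tick(ccnt):
    """Model one core clock cycle: return (new_ccnt, overflow_flag_set)."""
    if ccnt == CCNT_MAX:
        return 0, True           # rolls over to zero and sets the overflow flag
    return ccnt + 1, False


def total_cycles(ccnt, overflow_count):
    """Reconstruct elapsed cycles from the current CCNT value and a
    software-maintained count of overflows since CCNT was reset to 0."""
    return overflow_count * (CCNT_MAX + 1) + (ccnt & CCNT_MAX)


print(tick(CCNT_MAX), total_cycles(5, 2))
```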
12.3 Performance Count Registers (PMN0 - PMN1; CP14 Register 2 and 3, Respectively)

There are two 32-bit event counters; their format is shown in Table 12-2. The event counters are reset to ‘0’ by the PMNC register or can be set to a predetermined value by directly writing to them.
12.
Table 12-3.
12.5 Performance Monitoring Events

Table 12-4 lists events that may be monitored by the PMU. Each of the Performance Monitor Count Registers (PMN0 and PMN1) can count any listed event. Software selects which event is counted by each PMNx register by programming the evtCountx fields of the PMNC register.

Table 12-4.
Some typical combinations of counted events are listed in this section and summarized in Table 12-5. In this section, we call such an event combination a mode.

Table 12-5. Some Common Uses of the PMU Mode 12.5.1 PMNC.evtCount0 PMNC.
12.5.2 Data Cache Efficiency Mode

PMN0 totals the number of data cache accesses, which includes cacheable and non-cacheable accesses, mini-data cache accesses, and accesses made to locations configured as data RAM. Note that STM and LDM each count as several accesses to the data cache depending on the number of registers specified in the register list. LDRD registers two accesses.
12.5.4 Data/Bus Request Buffer Full Mode

The Data Cache has buffers available to service cache misses or uncacheable accesses. For every memory request that the Data Cache receives from the processor core, a buffer is speculatively allocated in case an external memory request is required or temporary storage is needed for an unaligned access. If no buffers are available, the Data Cache will stall the processor core.
PMN1 counts the number of writeback operations emitted by the data cache. These writebacks occur when the data cache evicts a dirty line of data to make room for a newly requested line or as the result of a clean operation (CP15, register 7). Statistics derived from these two events:
• The percentage of total execution cycles the processor stalled because of a data dependency.
12.6 Multiple Performance Monitoring Run Statistics

Even though only two events can be monitored at any given time, multiple performance monitoring runs can be done, capturing different events from different modes.
12.7 Examples

In this example, the events selected with the Instruction Cache Efficiency mode are monitored and CCNT is used to measure total execution time. Sampling time ends when PMN0 overflows, which generates an IRQ interrupt.

Example 12-1.
13 Software Debug

This chapter describes software debug and related features in the Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with ARM* Architecture V5TE), namely: • • • • • 13.
13.3 Introduction The Intel® 80200 processor debug unit, when used with a debugger application, allows software running on an Intel® 80200 processor target to be debugged. The debug unit allows the debugger to stop program execution and re-direct execution to a debug handling routine. Once program execution has stopped, the debugger can examine or modify processor state, co-processor state, or memory.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.4 Debug Control and Status Register (DCSR) The DCSR register is the main control register for the debug unit. Table 13-1 shows the format of the register. The DCSR register can be accessed in privileged modes by software running on the core or by a debugger through the JTAG interface. Refer to Section 13.11.2, SELDCSR JTAG Register for details about accessing DCSR through JTAG. Table 13-1.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug Table 13-1. Debug Control and Status Register (DCSR) (Sheet 2 of 2) 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 GE H Bits 15:6 5 TF TI 9 8 TD TA TS TU TR Access Description Reserved SW Read / Write JTAG Read-Only Sticky Abort (SA) Method Of Entry (MOE) 13.4.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.4.3 Vector Trap Bits (TF,TI,TD,TA,TS,TU,TR) The Vector Trap bits allow instruction breakpoints to be set on exception vectors without using up any of the breakpoint registers. When a bit is set, it acts as if an instruction breakpoint was set up on the corresponding exception vector. A debug exception is generated before the instruction in the exception vector executes.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.5 Debug Exceptions A debug exception causes the processor to re-direct execution to a debug event handling routine.
During Halt mode, software running on the Intel® 80200 processor cannot access DCSR, or any of the hardware breakpoint registers, unless the processor is in Special Debug State (SDS), described below. When a debug exception occurs during Halt mode, the processor takes the following actions: • • • • disables the trace buffer • • • • • • SPSR_dbg = CPSR sets DCSR.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.5.2 Monitor Mode In monitor mode, the processor handles debug exceptions like normal ARM exceptions. If debug functionality is enabled (DCSR[31] = 1) and the processor is in Monitor mode, debug exceptions cause either a data abort or a pre-fetch abort.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.6 HW Breakpoint Resources The Intel® 80200 processor debug architecture defines two instruction and two data breakpoint registers, denoted IBCR0, IBCR1, DBR0, and DBR1. The instruction and data address breakpoint registers are 32-bit registers. The instruction breakpoint causes a break before execution of the target instruction. The data breakpoint causes a break after the memory access has been issued.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.6.2 Data Breakpoints The Intel® 80200 processor debug architecture defines two data breakpoint registers (DBR0, DBR1). The format of the registers is shown in Table 13-4. Table 13-4.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug When DBR1 is programmed as a data address mask, it is used in conjunction with the address in DBR0. The bits set in DBR1 are ignored by the processor when comparing the address of a memory access with the address in DBR0. Using DBR1 as a data address mask allows a range of addresses to generate a data breakpoint. When DBR1 is selected as a data address mask, it is unaffected by the E1 field of DBCON.
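The masked comparison described above can be sketched in a few lines. This is only an illustrative host-side model, not processor code: the function name and arguments are hypothetical, and it covers the address comparison alone, ignoring the access-type configuration held in DBCON.

```python
def data_breakpoint_hit(access_addr: int, dbr0: int, dbr1_mask: int) -> bool:
    """Model of the DBR0/DBR1 address-mask comparison (hypothetical helper).

    Bits set in dbr1_mask are treated as "don't care", so only the remaining
    bits of the access address are compared with DBR0.
    """
    care_bits = ~dbr1_mask & 0xFFFFFFFF  # bits that still take part in the compare
    return (access_addr & care_bits) == (dbr0 & care_bits)


# With DBR0 = 0x00001000 and a mask of 0xFF, any access in
# 0x1000..0x10FF matches, and 0x1100 does not.
in_range = data_breakpoint_hit(0x000010FF, 0x00001000, 0x000000FF)
out_of_range = data_breakpoint_hit(0x00001100, 0x00001000, 0x000000FF)
```

A mask with contiguous low-order bits set yields a naturally aligned power-of-two address range; other mask patterns match scattered addresses.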
13.8 Transmit/Receive Control Register (TXRXCTRL) Communications between the debug handler and debugger are controlled through handshaking bits that ensure the debugger and debug handler make synchronized accesses to TX and RX. The debugger side of the handshaking is accessed through the DBGTX (Section 13.11.4, DBGTX JTAG Register) and DBGRX (Section 13.11.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.8.1 RX Register Ready Bit (RR) The debugger and debug handler use the RR bit to synchronize accesses to RX. Normally, the debugger and debug handler use a handshaking scheme that requires both sides to poll the RR bit.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.8.2 Overflow Flag (OV) The Overflow flag is a sticky flag that is set when the debugger writes to the RX register while the RR bit is set. The flag is used during high-speed download to indicate that some data was lost. The assumption during high-speed download is that the time it takes for the debugger to shift in the next data word is greater than the time necessary for the debug handler to process the previous data word.
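The interaction between RR and the sticky OV flag can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not a description of the hardware: the class and method names are hypothetical, the handler's read of RX is assumed to clear RR, and the model assumes a write that overflows replaces the unread word.

```python
class RxChannel:
    """Toy model of the RX register with its RR and OV bits (hypothetical names)."""

    def __init__(self) -> None:
        self.rx = 0        # last word written by the debugger
        self.rr = False    # RX Register Ready: RX holds unread data
        self.ov = False    # sticky Overflow flag

    def debugger_write(self, word: int) -> None:
        if self.rr:
            # Previous word was never read: record the loss. OV stays set.
            self.ov = True
        self.rx = word & 0xFFFFFFFF  # assumption: the new word replaces the old one
        self.rr = True

    def handler_read(self) -> int:
        self.rr = False    # assumption: reading RX clears RR
        return self.rx


channel = RxChannel()
channel.debugger_write(0x11111111)
channel.debugger_write(0x22222222)   # handler has not read yet, so OV is set
channel.handler_read()
lost_data = channel.ov               # still True after the read: the flag is sticky
```

After a high-speed download, checking the flag once at the end is enough to tell whether any word was overwritten before the handler consumed it.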
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.8.4 TX Register Ready Bit (TR) The debugger and debug handler use the TR bit to synchronize accesses to the TX register. The debugger and debug handler must poll the TR bit before accessing the TX register. Table 13-9 shows the handshaking used to access the TX register. Table 13-9. TX Handshaking Debugger Actions Debugger is expecting data from the debug handler.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.9 Transmit Register (TX) The TX register is the debug handler transmit buffer. The debug handler sends data to the debugger through this register. Table 13-11.
13.11 Debug JTAG Access There are four JTAG instructions used by the debugger during software debug: LDIC, SELDCSR, DBGTX and DBGRX. LDIC is described in Section 13.14, Downloading Code in the ICache. The other three JTAG instructions are described in this section. SELDCSR, DBGTX and DBGRX use a common 36-bit shift register (DBG_SR). New data is shifted in and captured data is shifted out through the DBG_SR.
13.11.2 SELDCSR JTAG Register Placing the “SELDCSR” JTAG instruction in the JTAG IR selects the DCSR JTAG Data register (Figure 13-1), allowing the debugger to access the DCSR, generate an external debug break, and set the hold_rst signal, which is used when loading code into the instruction cache during reset. Figure 13-1.
Figure 13-2. SELDCSR Data Register [Figure: the DCSR, the DBG_SR shift register (between TDI and TDO, clocked by TCK), the Capture_DR and Update_DR paths, and the DBG_REG fields DBG.HLD_RST, DBG.BRK, and DBG.DCSR.] 13.11.2.1 DBG.HLD_RST The debugger uses DBG.HLD_RST when loading code into the instruction cache during a processor reset. Details about loading code into the instruction cache are in Section 13.14, Downloading Code in the ICache.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.11.2.2 DBG.BRK DBG.BRK allows the debugger to generate an external debug break and asynchronously re-direct execution to a debug handling routine. A debugger sets an external debug break by scanning data into the DBG_SR with DBG_SR[2] set and the desired value to set the DCSR JTAG writable bits in DBG_SR[34:3]. Once an external debug break is set, it remains set internally until a debug exception occurs.
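As a rough illustration of the bit positions named in this chapter, a debugger could assemble the SELDCSR scan value as follows. The function name is hypothetical and the DCSR value in the example is an arbitrary placeholder; the sketch only packs the fields (DCSR bits in DBG_SR[34:3], the break request in DBG_SR[2], hold_rst in DBG_SR[1]) and does not model the order in which bits are shifted through TDI.

```python
def pack_seldcsr(dcsr_value: int, ext_break: bool = False, hold_rst: bool = False) -> int:
    """Build the 36-bit DBG_SR value for a SELDCSR scan (hypothetical helper)."""
    word = (dcsr_value & 0xFFFFFFFF) << 3   # DCSR JTAG-writable bits -> DBG_SR[34:3]
    if ext_break:
        word |= 1 << 2                      # DBG.BRK -> DBG_SR[2]
    if hold_rst:
        word |= 1 << 1                      # DBG.HLD_RST -> DBG_SR[1]
    return word


# Request an external debug break while scanning in a placeholder DCSR value.
scan_value = pack_seldcsr(0x00000000, ext_break=True)
```

Because the DCSR value travels in the same scan as the break request, the debugger has to supply the DCSR contents it wants to keep each time it sets DBG_SR[2].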
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.11.4 DBGTX JTAG Register The DBGTX JTAG instruction selects the Debug JTAG Data register (Figure 13-3). The debugger uses the DBGTX data register to poll for breaks (internal and external) to debug mode and once in debug mode, to read data from the debug handler. Figure 13-3.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.11.6 DBGRX JTAG Register The DBGRX JTAG instruction selects the DBGRX JTAG Data register. The debugger uses the DBGRX data register to send data or commands to the debug handler. Figure 13-4.
13.11.6.1 RX Write Logic The RX write logic (Figure 13-6) serves 4 functions: 1) Enable the debugger write to RX - the logic ensures only new, valid data from the debugger is written to RX. In particular, when the debugger polls TXRXCTRL[31] to see whether the debug handler has read the previous data from RX, the JTAG state machine must go through Update_DR, which should not modify RX.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.11.6.2 DBGRX Data Register The bits in the DBGRX data register (Figure 13-6) are used by the debugger to send data to the processor. The data register also contains a bit to flush previously written data and a high-speed download flag. Figure 13-6. DBGRX Data Register RX TXRXCTRL[31] 0 0 1 2 1 Capture_DR DBG_SR TDI 35 34 3 TDO 0 DBG.RR cleared by RX Write Logic Update_DR DBG_REG 2 34 33 TCK 1 0 DBG.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.11.6.4 DBG.V The debugger sets this bit to indicate the data scanned into DBG_SR[34:3] is valid data to write to RX. DBG.V is an input to the RX Write Logic and is also cleared by the RX Write Logic. When this bit is set, the data scanned into the DBG_SR is written to RX following an Update_DR. If DBG.V is not set and the debugger does an Update_DR, RX is unchanged. This bit does not affect the actions of DBG.FLUSH or DBG.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.12 Trace Buffer The 256 entry trace buffer provides the ability to capture control flow information to be used for debugging an application. Two modes are supported: 1. The buffer fills up completely and generates a debug exception. Then SW empties the buffer. 2. The buffer fills up and wraps around until it is disabled. Then SW empties the buffer. 13.12.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug When the trace buffer is enabled, reading and writing to either checkpoint register has unpredictable results. When the trace buffer is disabled, writing to a checkpoint register sets the register to the value written. Reading the checkpoint registers returns the value of the register. In normal usage, the checkpoint registers are used to hold target addresses of specific entries in the trace buffer.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.13 Trace Buffer Entries Trace buffer entries consist of either one or five bytes. Most entries are one byte messages indicating the type of control flow change. The target address of the control flow change represented by the message byte is either encoded in the message byte (like for exceptions) or can be determined by looking at the instruction word (like for direct branches).
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.13.1.1 Exception Message Byte When any kind of exception occurs, an exception message is placed in the trace buffer. In an exception message byte, the message type bit (M) is always 0. The vector exception (VVV) field is used to specify bits[4:2] of the vector address (offset from the base of default or relocated vector table). The vector allows the host SW to identify which exception occurred.
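Host software could turn an exception message byte into a vector offset along these lines. This is a sketch that assumes M occupies bit 7 and VVV occupies bits 6:4 of the byte; the function and table names are hypothetical, and the offsets are the standard ARM exception vector offsets.

```python
# Standard ARM exception vector offsets (from the vector table base).
VECTOR_NAMES = {
    0x00: "reset",
    0x04: "undefined instruction",
    0x08: "software interrupt",
    0x0C: "prefetch abort",
    0x10: "data abort",
    0x14: "reserved",
    0x18: "IRQ",
    0x1C: "FIQ",
}


def decode_exception_byte(message: int) -> tuple[int, str]:
    """Return (vector offset, exception name) for an exception message byte."""
    if message & 0x80:
        raise ValueError("M bit is set: not an exception message byte")
    vvv = (message >> 4) & 0x7      # assumed position of the VVV field
    offset = vvv << 2               # VVV supplies bits[4:2] of the vector address
    return offset, VECTOR_NAMES[offset]


offset, name = decode_exception_byte(0b0100_0000)   # VVV = 0b100
```

The offset is relative to the base of whichever vector table is in use, so the same decoding applies to the default and the relocated table.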
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.13.1.2 Non-exception Message Byte Non-exception message bytes are used for direct branches, indirect branches, and rollovers. In a non-exception message byte, the 4-bit message type field (MMMM) specifies the type of message (refer to Table 13-17). The incremental word count (CCCC) is the instruction count since the last control flow change (excluding the current branch).
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.13.1.3 Address Bytes Only indirect branch entries contain address bytes in addition to the message byte. Indirect branch entries always have four address bytes indicating the target of that indirect branch. When reading the trace buffer the MSB of the target address is read out first; the LSB is the fourth byte read out; and the indirect branch message byte is the fifth byte read out.
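The byte order described above can be made concrete with a short decoder for one indirect branch entry. It is a minimal sketch: the function name is hypothetical, the five bytes are taken in the order they are read from the buffer, and the MMMM field is assumed to sit in the upper nibble of the message byte with CCCC in the lower nibble.

```python
def decode_indirect_entry(entry: bytes) -> tuple[int, int, int]:
    """Decode a 5-byte indirect branch entry given in read-out order.

    Returns (message type MMMM, incremental word count CCCC, target address).
    """
    if len(entry) != 5:
        raise ValueError("an indirect branch entry is four address bytes plus a message byte")
    target = int.from_bytes(entry[:4], "big")  # MSB is read first, LSB fourth
    message = entry[4]                         # message byte is read fifth
    return message >> 4, message & 0xF, target


# Example bytes as read from the buffer: address 0x00008004, then the message byte.
msg_type, count, target = decode_indirect_entry(bytes([0x00, 0x00, 0x80, 0x04, 0x92]))
```

The meaning of the returned message type still has to be looked up in Table 13-17.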
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.13.2 Trace Buffer Usage The Intel® 80200 processor trace buffer is 256 bytes in length. The first byte read from the buffer represents the oldest trace history information in the buffer. The last (256th) byte read represents the most recent entry in the buffer. The last byte read from the buffer is always a message byte. This provides the debugger with a starting point for parsing the entries out of the buffer.
As the trace buffer is read, the oldest entries are read first. Reading a series of 5 (or more) consecutive “0b0000 0000” entries in the oldest entries indicates that the trace buffer has not wrapped around and the first valid entry is the first non-zero entry read out. Reading 4 or fewer consecutive “0b0000 0000” entries requires a bit more intelligence in the host SW.
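The first of these two cases is straightforward to detect in host software. The sketch below uses a hypothetical function name and takes the buffer contents oldest byte first; it deliberately returns no answer for the ambiguous case, since resolving that needs the additional analysis the text refers to.

```python
from typing import Optional


def first_valid_index(trace: bytes) -> Optional[int]:
    """Return the index of the first valid entry when the buffer has not wrapped.

    Returns None when fewer than five leading zero bytes are found, because
    that case cannot be decided from the zero count alone.
    """
    zeros = 0
    for value in trace:
        if value != 0:
            break
        zeros += 1
    if zeros >= 5:
        return zeros   # index of the first non-zero byte (or len(trace) if none)
    return None


start = first_valid_index(bytes(6) + b"\x92\x81")   # six zero bytes, then data
```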
13.14 Downloading Code in the ICache On the Intel® 80200 processor, a 2K mini instruction cache, physically separate from the 32K main instruction cache, can be used as an on-chip instruction RAM. An external host can download code directly into either instruction cache through JTAG. In addition to downloading code, several cache functions are supported.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.14.2 LDIC JTAG Data Register The LDIC JTAG Data Register is selected when the LDIC JTAG instruction is in the JTAG IR. An external host can load and invalidate lines in the instruction cache through this data register. Figure 13-10.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.14.3 LDIC Cache Functions The Intel® 80200 processor supports four cache functions that can be executed through JTAG. Two functions allow an external host to download code into the main instruction cache or the mini instruction cache through JTAG. Two additional functions are supported to allow lines to be invalidated in the instruction cache. The following table shows the cache functions supported through JTAG.
Figure 13-11. Format of LDIC Cache Functions [Figure: packet layouts for the Invalidate IC Line, Invalidate Mini IC, Load Main IC (CMD = 0b010), and Load Mini IC (CMD = 0b011) functions, showing the VA[31:5], CMD, P, and Data Word 0 through Data Word 7 fields and marking the first and last bits shifted in.] All packets are 33 bits in length.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.14.4 Loading IC During Reset Code can be downloaded into the instruction cache through JTAG during a processor reset. This feature is used during software debug to download the debug handler prior to starting an application program.
13.14.4.1 Loading IC During Cold Reset for Debug Figure 13-12 shows the actions necessary to download code into the instruction cache during a cold reset for debug. NOTE: In Figure 13-12, hold_rst is a signal that gets set and cleared through JTAG. When the JTAG IR contains the SELDCSR instruction, the hold_rst signal is set to the value scanned into DBG_SR[1]. Figure 13-12.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug An external host should take the following steps to load code into the instruction cache following a cold reset: • Assert the RESET# and TRST# pins: This resets the JTAG IR to IDCODE and invalidates the instruction cache (main and mini). • Load the SELDCSR JTAG instruction into JTAG IR and scan in a value to set the Halt Mode bit in DCSR and to set the hold_rst signal. For details of the SELDCSR, refer to Section 13.11.2.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.14.4.2 Loading IC During a Warm Reset for Debug Loading the instruction cache during a warm reset may be a slightly different situation than during a cold reset. For a warm reset, the main issue is whether the instruction cache gets invalidated by the processor reset or not. There are several possible scenarios: • While reset is asserted, TRST# is also asserted.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug If it is necessary to download code into the instruction cache then: 2) Assert TRST#. This clears the Halt Mode bit allowing the instruction cache to be invalidated. 3) Clear the Halt Mode bit through JTAG. This allows the instruction cache to be invalidated by reset. 4) Place the LDIC JTAG instruction in the JTAG IR, then proceed with the normal code download, using the Invalidate IC Line function before loading each line.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.14.5 Dynamically Loading IC After Reset An external host can load code into the instruction cache “on the fly” or “dynamically”. This occurs when the host downloads code while the processor is not being reset. However, this requires strict synchronization between the code running on the Intel® 80200 processor and the external host.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug The following steps describe the details for downloading code: • Since the debug handler is responsible for synchronization during the code download, the handler must be executing before the host can begin the download. The debug handler execution starts when the application running on the Intel® 80200 processor generates a debug exception or when the host generates an external debug break.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.14.5.1 Dynamic Code Download Synchronization The following pieces of code are necessary in the debug handler to implement the synchronization used during dynamic code download. The pieces must be ordered in the handler as shown below. # # # # # # # # # Before the download can start, all outstanding instruction fetches must complete. The MCR invalidate IC by line function serves as a barrier instruction in the 80200.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.14.6 Mini Instruction Cache Overview The mini instruction cache is a smaller version of the main instruction cache (Refer to Chapter 4 for more details on the main instruction cache). It is a 2KB, 2-way set associative cache. There are 32 sets, each containing two ways; each way contains 8 words. The cache uses the round-robin replacement policy.
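The geometry quoted above can be checked with a little arithmetic, which also shows why handler placement interacts with the two ways. This is a sketch only: the constant and function names are hypothetical, and selecting the set with address bits [9:5] is an assumption based on the 32-byte line and 32 sets, not something stated in this section.

```python
WORD_BYTES = 4
LINE_BYTES = 8 * WORD_BYTES     # each way holds one 8-word (32-byte) line
SETS = 32
WAYS = 2

# 32 sets x 2 ways x 32 bytes = 2048 bytes, matching the 2KB size.
TOTAL_BYTES = SETS * WAYS * LINE_BYTES


def mini_ic_set(virtual_address: int) -> int:
    """Set index for an address, assuming bits [9:5] select the set."""
    return (virtual_address // LINE_BYTES) % SETS


# Addresses 1KB apart fall in the same set, and only two such lines
# can be resident at once because there are two ways.
same_set = mini_ic_set(0x0000) == mini_ic_set(0x0400) == mini_ic_set(0x0800)
```

Under that assumption, code spread over more than two lines that share a set cannot all stay resident in the mini instruction cache.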
13.15 Halt Mode Software Protocol This section describes the overall debug process in Halt Mode. It describes how to start and end a debug session and details for implementing a debug handler. Intel provides a standard Debug Handler that implements some of the techniques in this chapter. The Intel Debug Handler itself is a document describing additional handler implementation techniques and requirements. 13.15.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.15.1.2 Placing the Handler in Memory The debug handler is not required to be placed at a specific pre-defined address. However, there are some limitations on where the handler can be placed due to the override vector tables and the 2-way set associative mini instruction cache.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.15.2 Implementing a Debug Handler The debugger uses the debug handler to examine or modify processor state by sending commands and reading data through JTAG. The API between the debugger and debug handler is specific to a debugger implementation. Intel provides a standard debug handler and API which can be used by third-party vendors.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.15.2.3 Dynamic Debug Handler On the Intel® 80200 processor, the debug handler and override vector tables reside in the 2 KB mini instruction cache, separate from the main instruction cache. A “static” Debug Handler is downloaded during reset. This is the base handler code, necessary to do common operations such as handler entry/exit, parse commands from the debugger, read/write ARM registers, read/write memory, etc.
2. Using the Main IC The steps for downloading dynamic functions into the main instruction cache are similar to those for downloading into the mini instruction cache. However, using the main instruction cache has its advantages. Using the main instruction cache eliminates the problem of inadvertently overwriting static Debug Handler code by writing to the wrong way of a set, since the main and mini instruction caches are separate.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.15.2.4 High-Speed Download Special debug hardware has been added to support a high-speed download mode to increase the performance of downloads to system memory (vs. writing a block of memory using the standard handshaking). The basic assumption is that the debug handler can read any data sent by the debugger and write it to memory, before the debugger can send the next data.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Software Debug 13.15.
13.16 Software Debug Notes/Errata 1. Trace buffer message count value on data aborts: An LDR to a non-PC register that aborts is counted in the exception message, but an LDR to the PC that aborts is not counted in the exception message. 2. SW Note on data abort generation in Special Debug State. 1) Avoid code that could generate precise data aborts.
14 Performance Considerations This chapter describes relevant performance considerations that compiler writers, application programmers and system designers need to be aware of to efficiently use the Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE). Performance numbers discussed here include interrupt latency, branch prediction, and instruction latencies. 14.
14.2 Branch Prediction The Intel® 80200 processor implements dynamic branch prediction for the ARM* instructions B and BL and for the Thumb* instruction B. Any instruction that specifies the PC as the destination is predicted as not taken. For example, an LDR or a MOV that loads or moves directly to the PC is predicted not taken and incurs a branch latency penalty.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Performance Considerations 14.4 Instruction Latencies The latencies for all the instructions are shown in the following sections with respect to their functional groups: branch, data processing, multiply, status register access, load/store, semaphore, and coprocessor. The following section explains how to read these tables. 14.4.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Performance Considerations • Minimum Resource Latency The minimum cycle distance from the issue clock of the current multiply instruction to the issue clock of the next multiply instruction assuming the second multiply does not incur a data dependency and is immediately available from the instruction cache or memory interface. For the following code fragment, here is an example of computing latencies: Example 14-1.
Table 14-5. Branch Instruction Timings (Those not predicted by the BTB)
Mnemonic | Minimum Issue Latency when Branch Not Taken | Minimum Issue Latency when Branch Taken
BLX(1) | N/A | 5
BLX(2) | 1 | 5
BX | 1 | 5
Data Processing Instruction with PC as the destination | Same as Table 14-6 | 4 + numbers in Table 14-6
LDR PC, <> | 2 | 8
LDM with PC in register list | 3 + numreg (note 1) | 10 + max (0, numreg-3)
1.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Performance Considerations 14.4.4 Multiply Instruction Timings Table 14-7.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Performance Considerations Table 14-7. Multiply Instruction Timings (Sheet 2 of 2) Mnemonic Rs Value (Early Termination) Rs[31:15] = 0x00000 UMULL Rs[31:27] = 0x00 all others 1. Table 14-8.
14.4.5 Saturated Arithmetic Instructions
Table 14-10. Saturated Data Processing Instruction Timings
Mnemonic | Minimum Issue Latency | Minimum Result Latency
QADD | 1 | 2
QSUB | 1 | 2
QDADD | 1 | 2
QDSUB | 1 | 2
14.4.6 Status Register Access Instructions
Table 14-11. Status Register Access Instruction Timings
Mnemonic 14.4.
14.4.8 Semaphore Instructions
Table 14-14. Semaphore Instruction Timings
Mnemonic | Minimum Issue Latency | Minimum Result Latency
SWP | 5 | 5
SWPB | 5 | 5
14.4.9 Coprocessor Instructions
Table 14-15. CP15 Register Access Instruction Timings
Mnemonic | Minimum Issue Latency | Minimum Result Latency
MRC | 4 | 4
MCR | 2 | N/A
Table 14-16. CP14 Register Access Instruction Timings 14.4.
A Compatibility: Intel® 80200 Processor vs. SA-110 This appendix highlights the differences between the first generation Intel® StrongARM* technology (SA-110) and the Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE). A.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Compatibility: Intel® 80200 Processor vs. SA-110 Feature / Parameter Main Execution Pipeline • RISC Superpipeline Brief Description or Note Scalar, in-order execution, single issue Pipeline with more than usual number of pipe stages. Allows greater operating frequency. • Out-of order completion Instructions may finish out of program order.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Compatibility: Intel® 80200 Processor vs. SA-110 A.3 Architecture Deviations A.3.1 Read Buffer A Read Buffer is not supported on the Intel® 80200 processor and the definition of CP15 register 9 has changed from controlling the read buffer (on SA-110) to one that controls cache/TLB lock down (on the Intel® 80200 processor).
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Compatibility: Intel® 80200 Processor vs. SA-110 A.3.4 Write Buffer Behavior Definition of Coalescing: Coalescing means bringing together a new store operation with an existing store operation already resident in the write buffer. The new store is placed in the same write buffer entry as an existing store when the address of the new store falls in the 4-word aligned address of the existing entry.
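The address condition in this definition amounts to comparing the 16-byte (4-word) aligned block of the two stores. The sketch below uses a hypothetical function name and models the address test only; whether coalescing is actually performed also depends on conditions that are outside this definition.

```python
BLOCK_BYTES = 4 * 4   # a 4-word aligned region is 16 bytes


def same_coalescing_block(new_store_addr: int, entry_addr: int) -> bool:
    """True when both addresses fall in the same 4-word aligned block."""
    return new_store_addr // BLOCK_BYTES == entry_addr // BLOCK_BYTES


# 0x1004 and 0x100C share the block 0x1000..0x100F; 0x1010 starts the next block.
shares_block = same_coalescing_block(0x1004, 0x100C)
next_block = same_coalescing_block(0x1010, 0x100C)
```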
A.3.6 Performance Differences There exist significant performance differences in program execution between SA-110 and the Intel® 80200 processor. If an SA-110 application had operations that had specific timing relationships, these relationships would not hold for the Intel® 80200 processor. In all typical applications, the Intel® 80200 processor performance greatly exceeds that of SA-110.
B Optimization Guide B.1 Introduction This appendix contains optimization techniques for achieving the highest performance from the Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE). It is written for developers who are optimizing compilers or performance analysis tools for the Intel® 80200 processor. It can also be used by application developers to obtain the best performance from their assembly language code.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Optimization Guide B.2 Intel® 80200 Processor Pipeline One of the biggest differences between the Intel® 80200 processor and first-generation Intel® StrongARM* processors is the pipeline. Many of the differences are summarized in Figure B-1. This section provides a brief description of the structure and behavior of the Intel® 80200 processor pipeline. B.2.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Optimization Guide B.2.1.2. Intel® 80200 Processor Pipeline Organization The Intel® 80200 processor single-issue superpipeline consists of a main execution pipeline, MAC pipeline, and a memory access pipeline. These are shown in Figure B-1, with the main execution pipeline shaded. Figure B-1.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Optimization Guide B.2.1.3. Out Of Order Completion Sequential consistency of instruction execution relates to two aspects: first, to the order in which the instructions are completed; and second, to the order in which memory is accessed due to load and store instructions. The Intel® 80200 processor preserves a weak processor consistency because instructions may complete out of order, provided that no data dependencies exist.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Optimization Guide B.2.2 Instruction Flow Through the Pipeline The Intel® 80200 processor pipeline issues a single instruction per clock cycle. Instruction execution begins at the F1 pipestage and completes at the WB pipestage. Although a single instruction may be issued per clock cycle, all three pipelines (MAC, memory, and main execution) may be processing instructions simultaneously.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Optimization Guide B.2.3 Main Execution Pipeline B.2.3.1. F1 / F2 (Instruction Fetch) Pipestages The job of the instruction fetch stages F1 and F2 is to present the next instruction to be executed to the ID stage.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Optimization Guide B.2.3.3. RF (Register File / Shifter) Pipestage The main function of the RF pipestage is to read and write to the register file unit (RFU). It provides source data to: • • • • EX for ALU operations MAC for multiply operations Data Cache for memory writes Coprocessor interface The ID unit decodes the instruction and specifies which registers are accessed in the RFU.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Optimization Guide B.2.4 Memory Pipeline The memory pipeline consists of two stages, D1 and D2. The data cache unit, or DCU, consists of the data-cache array, mini-data cache, fill buffers, and writebuffers. The memory pipeline handles load / store instructions. B.2.4.1. D1 and D2 Pipestage Operation begins in D1 after the X1 pipestage has calculated the effective address for load/stores.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Optimization Guide B.3 Basic Optimizations This chapter outlines optimizations specific to ARM architecture. These optimizations have been modified to suit the Intel® 80200 processor architecture where needed. B.3.1 Conditional Instructions The Intel® 80200 processor architecture provides the ability to execute instructions conditionally.
B.3.1.2. Optimizing Branches

Branches decrease application performance by indirectly causing pipeline stalls. Branch prediction improves the performance by lessening the delay inherent in fetching a new instruction stream. The number of branches that can accurately be predicted is limited by the size of the branch target buffer.
P2    Percentage of times we are likely to incur a branch misprediction penalty
N1C   Number of cycles to execute the if-else portion using conditional instructions assuming the if-condition to be true
N2C   Number of cycles to execute the if-else portion using conditional instructions assuming the if-condition to be false

Once we have the above data, use conditional instructions when:

\[
\left(N1_C \times \frac{P1}{100}\right) + \left(N2_C \times \frac{100 - P1}{100}\right) \le \left(N1_B \times \frac{P1}{100}\right) + \left(N2_B \times \frac{100 - P1}{100}\right) + \left(\frac{P2}{100} \times 4\right)
\]
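The comparison above can be evaluated mechanically once the cycle counts and percentages have been measured. The following is a minimal sketch, not part of the original guide: the function name `prefer_conditional` is hypothetical, N1B and N2B are taken to be the cycle counts of the branch-based version, and the misprediction penalty is passed in as a parameter rather than hard-coded. Both sides are scaled by 100 so that the test stays in integer arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when the conditional-instruction version
 * is expected to cost no more cycles than the branch version.
 * p1 = percentage of times the if-condition is true (0..100)
 * p2 = percentage of times a branch misprediction penalty is incurred (0..100)
 * mispredict_penalty = penalty in cycles (an input, assumed by the caller). */
bool prefer_conditional(int n1b, int n2b, int n1c, int n2c,
                        int p1, int p2, int mispredict_penalty)
{
    assert(p1 >= 0 && p1 <= 100);
    assert(p2 >= 0 && p2 <= 100);

    /* Expected cost of each version, multiplied by 100. */
    int cond_cost   = n1c * p1 + n2c * (100 - p1);
    int branch_cost = n1b * p1 + n2b * (100 - p1) + p2 * mispredict_penalty;

    return cond_cost <= branch_cost;
}
```

For instance, `prefer_conditional(4, 5, 3, 3, 50, 20, 4)` compares a scaled cost of 300 against 530 and therefore favors conditional instructions.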
B.3.1.3. Optimizing Complex Expressions

Conditional instructions should also be used to improve the code generated for complex expressions, such as those that rely on the C short-circuit evaluation feature.
B.3.2 Bit Field Manipulation

The Intel® 80200 processor shift and logical operations provide a useful way of manipulating bit fields.
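As an illustration of the shift-and-mask pattern, the sketch below extracts and replaces a bit field in a 32-bit word. The function names `field_extract` and `field_insert` are hypothetical and chosen only for this example. A compiler targeting the processor would typically map each function onto a few shift and logical instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a 'width'-bit field that starts at bit 'lsb' (0 < width < 32). */
uint32_t field_extract(uint32_t word, unsigned lsb, unsigned width)
{
    assert(width > 0 && width < 32 && lsb + width <= 32);
    return (word >> lsb) & ((1u << width) - 1u);
}

/* Replace the same field with 'value', leaving all other bits unchanged. */
uint32_t field_insert(uint32_t word, unsigned lsb, unsigned width, uint32_t value)
{
    assert(width > 0 && width < 32 && lsb + width <= 32);
    uint32_t mask = ((1u << width) - 1u) << lsb;
    return (word & ~mask) | ((value << lsb) & mask);
}
```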
B.3.3 Optimizing the Use of Immediate Values

The Intel® 80200 processor MOV or MVN instruction should be used when loading an immediate (constant) value into a register. Please refer to the ARM Architecture Reference Manual for the set of immediate values that can be used in a MOV or MVN instruction.
B.3.4 Optimizing Integer Multiply and Divide

Multiplication by an integer constant should be optimized to make use of the shift operation whenever possible.
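The following sketch shows the idea in C for constants of the form 2^k + 1 and 2^k - 1. The helper names are hypothetical. Unsigned arithmetic is used so that the shifted form wraps in the same way as an ordinary multiply.

```c
#include <assert.h>
#include <stdint.h>

/* x * (2^k + 1), for example k = 2 gives x * 5. */
uint32_t mul_pow2_plus_1(uint32_t x, unsigned k)
{
    assert(k < 32);
    return (x << k) + x;
}

/* x * (2^k - 1), for example k = 3 gives x * 7. */
uint32_t mul_pow2_minus_1(uint32_t x, unsigned k)
{
    assert(k < 32);
    return (x << k) - x;
}

/* x * 10, built as (x * 5) * 2. */
uint32_t mul10(uint32_t x)
{
    return mul_pow2_plus_1(x, 2) << 1;
}
```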
B.3.5 Effective Use of Addressing Modes

The Intel® 80200 processor provides a variety of addressing modes that make indexing an array of objects highly efficient. For a detailed description of these addressing modes please refer to the ARM Architecture Reference Manual.
B.4 Cache and Prefetch Optimizations

This chapter considers how to use the various cache memories in all their modes and then examines when and how to use prefetch to improve execution efficiency.

B.4.1 Instruction Cache

The Intel® 80200 processor has separate instruction and data caches.
B.4.1.4. Locking Code into the Instruction Cache

One very important instruction cache feature is the ability to lock code into the instruction cache. Once locked into the instruction cache, the code is always available for fast execution. Another reason for locking critical code into the cache is that, with the round-robin replacement policy, unlocked code is eventually evicted, even if it is a very frequently executed function.
B.4.2 Data and Mini Cache

The Intel® 80200 processor allows the user to define memory regions whose cache policies can be set by the user (see Section 6.2.3, “Cache Policies”). Supported policies and configurations are:

• Non Cacheable with no coalescing of memory writes.
• Non Cacheable with coalescing of memory writes.
• Mini-Data cache with write coalescing, read allocate, and write-back caching.
B.4.2.3. Read Allocate and Read-write Allocate Memory Regions

Most of the regular data and the stack for your application should be allocated to a read-write allocate region, because they are expected to be written and read often. Data that is write only (or data that is written to and subsequently not used for a long time) should be placed in a read allocate region.
B.4.2.5. Mini-data Cache

The mini-data cache is best used for data structures that have short temporal lives or cover vast amounts of data space. Addressing these types of data spaces from the data cache would pollute much, if not all, of the data cache by evicting valuable data. Eviction of valuable data reduces performance.
B.4.2.6. Data Alignment

Cache lines begin on 32-byte address boundaries. To maximize cache line use and minimize cache pollution, data structures should be aligned on 32-byte boundaries and sized to a multiple of the cache line size. Aligning data structures on cache line boundaries simplifies later addition of prefetch instructions to optimize performance.
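One way to express this layout rule in C is sketched below. It assumes a C11 compiler; `struct sample_record`, `sample_table`, and `is_line_aligned` are hypothetical names used only for illustration. The `alignas` specifier forces 32-byte alignment, and the compiler then pads the structure size up to a multiple of that alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u

/* Hypothetical record: 24 bytes of payload, padded to one 32-byte line. */
struct sample_record {
    alignas(CACHE_LINE_BYTES) uint32_t id;
    uint32_t values[5];
};

/* Compile-time check that every record occupies whole cache lines. */
static_assert(sizeof(struct sample_record) % CACHE_LINE_BYTES == 0,
              "record size must be a multiple of the cache line size");

/* Each array element starts on its own cache line. */
struct sample_record sample_table[4];

/* Returns true when the address lies on a 32-byte boundary. */
bool is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE_BYTES) == 0;
}
```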
B.4.2.7. Literal Pools

The Intel® 80200 processor does not have a single instruction that can move all literals (a constant or address) to a register. One technique to load registers with literals in the Intel® 80200 processor is by loading the literal from a memory location that has been initialized with the constant or address. These blocks of constants are referred to as literal pools. See Section B.
B.4.3 Cache Considerations

B.4.3.1. Cache Conflicts, Pollution and Pressure

Cache pollution occurs when unused data is loaded in the cache and cache pressure occurs when data that is not temporal to the current process is loaded into the cache. For an example, see Section B.4.4.2., “Prefetch Loop Scheduling” below.

B.4.3.2. Memory Page Thrashing

Memory page thrashing occurs because of the nature of SDRAM.
B.4.4 Prefetch Considerations

The Intel® 80200 processor has a true prefetch load instruction (PLD). The purpose of this instruction is to preload data into the data and mini-data caches. Data prefetching allows hiding of memory transfer latency while the processor continues to execute instructions.
The Intel® 80200 processor needs seven bus clocks to process a memory request to the SDRAM (Nprocessor). Typical SDRAM needs 2 to 3 bus clocks to select the memory locations provided that the current SDRAM memory page is selected (Nmemwait). If the current SDRAM memory page is not selected, then an additional 3 to 4 bus cycles are required to look up the memory data locations (Nmempagewait).
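To make the arithmetic concrete, the sketch below adds up the three components for a single request. It assumes the components simply sum, which is a simplification for illustration only; the function name `sdram_request_bus_clocks` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { N_PROCESSOR = 7 };  /* bus clocks for the processor to process the request */

/* Hypothetical helper: bus clocks for one SDRAM request, assuming the
 * components add linearly.
 * n_memwait     = 2..3 bus clocks when the SDRAM page is already selected
 * n_mempagewait = 3..4 extra bus clocks, counted only on a page miss */
int sdram_request_bus_clocks(int n_memwait, bool page_miss, int n_mempagewait)
{
    assert(n_memwait >= 2 && n_memwait <= 3);
    int total = N_PROCESSOR + n_memwait;
    if (page_miss) {
        assert(n_mempagewait >= 3 && n_mempagewait <= 4);
        total += n_mempagewait;
    }
    return total;
}
```

Under these assumptions a request costs between 9 bus clocks (page hit, fast SDRAM) and 14 bus clocks (page miss, slow SDRAM).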
B.4.4.2. Prefetch Loop Scheduling

When adding prefetch to a loop that operates on arrays, it may be advantageous to prefetch ahead one, two, or more iterations. The data for future iterations is located in memory at a fixed offset from the data for the current iteration. This makes it easy to predict where to fetch the data.
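A minimal sketch of this pattern in C follows. It is not taken from the guide: it assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch` builtin is expected to emit a prefetch instruction where the target supports one, and it falls back to a no-op elsewhere. The look-ahead distance `PREFETCH_AHEAD` is an assumed tuning value, here one 32-byte line of 4-byte elements.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch(addr)
#else
#define PREFETCH(addr) ((void)0)   /* no prefetch support: do nothing */
#endif

enum { PREFETCH_AHEAD = 8 };  /* elements to look ahead; an assumed value */

/* Sums an array while prefetching the line that a later iteration will use.
 * The prefetch is issued once per PREFETCH_AHEAD elements, not on every pass. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if ((i % PREFETCH_AHEAD) == 0 && i + PREFETCH_AHEAD < n)
            PREFETCH(&data[i + PREFETCH_AHEAD]);
        sum += data[i];
    }
    return sum;
}
```

The right look-ahead distance depends on the loop body length and the memory latency, so it should be tuned by measurement.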
B.4.4.6. Bandwidth Limitations

Overuse of prefetches can usurp resources and degrade performance. This happens because once the bus traffic requests exceed the system resource capacity, the processor stalls.
B.4.4.7. Cache Memory Considerations

Stride, the way data structures are walked through, can affect the temporal quality of the data and reduce or increase cache conflicts. The Intel® 80200 processor data cache and mini-data cache each have 32 sets of 32 bytes. This means that each cache line in a set is on a modular 1K-address boundary.
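The mapping from an address to a set can be sketched as follows, assuming the set is selected by the address bits directly above the 5-bit line offset. The function names are hypothetical. Two addresses that differ by a multiple of 1 KB land in the same set and therefore compete for its ways.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { LINE_BYTES = 32, NUM_SETS = 32 };

/* 32 sets of 32-byte lines repeat every 1 KB of address space. */
static_assert(LINE_BYTES * NUM_SETS == 1024, "set pattern repeats every 1 KB");

/* Set index, assuming it is taken from address bits [9:5]. */
unsigned cache_set_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* True when both addresses map to the same cache set. */
bool same_set(uint32_t a, uint32_t b)
{
    return cache_set_index(a) == cache_set_index(b);
}
```

A stride of exactly 1 KB is therefore a poor choice for walking a data structure, because every access maps to the same set.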
on a 32-byte boundary, modifications to the Year2Date fields are likely to use two write buffers when the data is written out to memory.
B.4.4.8. Cache Blocking

Cache blocking techniques, such as strip-mining, are used to improve temporal locality of the data. Given a large data set that can be reused across multiple passes of a loop, data blocking divides the data into smaller chunks that can be loaded into the cache during the first loop and then be available for processing on subsequent loops, thus minimizing cache misses and reducing bus traffic.
B.4.4.10. Pointer Prefetch

Not all looping constructs contain induction variables. However, prefetching techniques can still be applied. Consider the following linked list traversal example:

    while(p) {
        do_something(p->data);
        p = p->next;
    }

The pointer variable p becomes a pseudo induction variable and the data pointed to by p->next can be prefetched to reduce data transfer latency for the next iteration of the loop.
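A sketch of the traversal with the prefetch added is shown below. It is illustrative only: `struct node` and `sum_list` are hypothetical, the addition stands in for `do_something(p->data)`, and the prefetch relies on the GCC/Clang `__builtin_prefetch` builtin, with a no-op fallback for other compilers.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch(addr)
#else
#define PREFETCH(addr) ((void)0)   /* no prefetch support: do nothing */
#endif

struct node {
    int data;
    struct node *next;
};

/* A single prefetch covers a whole node only if the node fits in one line. */
static_assert(sizeof(struct node) <= 32, "node should fit in one 32-byte line");

/* Walks the list, requesting the next node before working on the current one. */
long sum_list(const struct node *p)
{
    long total = 0;
    while (p) {
        if (p->next)
            PREFETCH(p->next);   /* start fetching the next iteration's data */
        total += p->data;        /* stands in for do_something(p->data) */
        p = p->next;
    }
    return total;
}
```

The benefit grows with the amount of work done per node, since that work is what overlaps with the memory transfer.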
B.4.4.11. Loop Interchange

As mentioned earlier, the sequence in which data is accessed affects cache thrashing. Usually, it is best to access data in a spatially contiguous address range. However, arrays of data may have been laid out such that indexed elements are not physically next to each other. Consider the following C code which places array elements in row major order.
B.4.4.13. Prefetch to Reduce Register Pressure

Prefetch can be used to reduce register pressure. When data is needed for an operation, then the load is scheduled far enough in advance to hide the load latency. However, the load ties up the receiving register until the data can be used.
B.5 Instruction Scheduling

This chapter discusses instruction scheduling optimizations. Instruction scheduling refers to the rearrangement of a sequence of instructions for the purpose of minimizing pipeline stalls. Reducing the number of pipeline stalls improves application performance.
    ; all other registers are in use
    sub r1, r6, r7
    mul r3, r6, r2
    mov r2, r2, LSL #2
    orr r9, r9, #0xf
    add r0, r4, r5
    ldr r6, [r0]
    add r8, r6, r8
    add r8, r8, #4
    orr r8, r8, #0xf
    ; The value in register r6 is not used after this

In the code sample above, the ADD and LDR instructions can be moved before the MOV instruction. Note that this would prevent pipeline stalls if the load hits the data cache.
B.5.1.1. Scheduling Load and Store Double (LDRD/STRD)

The Intel® 80200 processor introduces two new double word instructions: LDRD and STRD. LDRD loads 64 bits of data from an effective address into two consecutive registers; conversely, STRD stores 64 bits from two consecutive registers to an effective address.
B.5.1.2. Scheduling Load and Store Multiple (LDM/STM)

LDM and STM instructions have an issue latency of 2-20 cycles depending on the number of registers being loaded or stored. The issue latency is typically 2 cycles plus an additional cycle for each of the registers being loaded or stored assuming a data cache hit.
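The typical case described above can be written down as a small estimate. The sketch below is only a model of the "2 cycles plus one per register" rule under a data cache hit; the function name is hypothetical, and the register list is passed as the 16-bit mask used in the instruction encoding.

```c
#include <assert.h>

/* Hypothetical estimate of the typical LDM/STM issue latency on a cache hit:
 * 2 cycles plus 1 cycle per register in the 16-bit register list. */
int ldm_stm_issue_latency(unsigned register_list)
{
    assert(register_list != 0 && register_list <= 0xFFFFu);

    int count = 0;
    for (unsigned bits = register_list; bits != 0; bits &= bits - 1)
        count++;                 /* clears the lowest set bit each pass */

    return 2 + count;
}
```

For example, a list of r0-r3 (mask 0x000F) yields an estimate of 6 cycles.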
B.5.2 Scheduling Data Processing Instructions

Most Intel® 80200 processor data processing instructions have a result latency of 1 cycle. This means that the current instruction is able to use the result from the previous data processing instruction. However, the result latency is 2 cycles if the current instruction needs to use the result of the previous data processing instruction for a shift by immediate.
B.5.3 Scheduling Multiply Instructions

Multiply instructions can cause pipeline stalls due to either resource conflicts or result latencies. The following code segment would incur a stall of 0-3 cycles depending on the values in registers r1, r2, r4 and r5 due to resource conflicts.
B.5.4 Scheduling SWP and SWPB Instructions

The SWP and SWPB instructions have a 5-cycle issue latency. As a result of this latency, the instruction following the SWP/SWPB instruction would stall for 4 cycles. SWP and SWPB instructions should, therefore, be used only where absolutely needed.
B.5.5 Scheduling the MRA and MAR Instructions (MRRC/MCRR)

The MRA (MRRC) instruction has an issue latency of 1 cycle, a result latency of 2 or 3 cycles depending on the destination register value being accessed and a resource latency of 2 cycles.
B.5.6 Scheduling the MIA and MIAPH Instructions

The MIA instruction has an issue latency of 1 cycle. The result and resource latency can vary from 1 to 3 cycles depending on the values in the source register.
B.5.7 Scheduling MRS and MSR Instructions

The MRS instruction has an issue latency of 1 cycle and a result latency of 2 cycles. The MSR instruction has an issue latency of 2 cycles (6 if updating the mode bits) and a result latency of 1 cycle. Consider the code sample:

    mrs r0, cpsr
    orr r0, r0, #1
    add r1, r2, r3

The ORR instruction above would incur a 1 cycle stall due to the 2-cycle result latency of the MRS instruction.
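The stall in this example follows from a simple relation between the producer's result latency and how far away the consumer is. The sketch below is a deliberately simplified model that ignores issue latency and resource conflicts; the function name and the `distance` parameter (1 means the consumer immediately follows the producer) are assumptions made for illustration.

```c
#include <assert.h>

/* Simplified model: cycles a consumer stalls waiting for a producer's result.
 * result_latency = result latency of the producing instruction, in cycles
 * distance       = position of the consumer after the producer (1 = next) */
int stall_cycles(int result_latency, int distance)
{
    assert(result_latency >= 1 && distance >= 1);
    int stall = result_latency - distance;
    return stall > 0 ? stall : 0;
}
```

With the MRS result latency of 2 and the ORR placed directly after it, `stall_cycles(2, 1)` gives 1. Under the same simplified model, placing an independent instruction such as the ADD between them gives `stall_cycles(2, 2)`, which is 0.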
B.6 Optimizing C Libraries

Many of the standard C library routines can benefit greatly from being optimized for the Intel® 80200 processor architecture.
Test Features C

The Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) implements Design For Test (DFT) techniques to ensure quality and reliability. This appendix describes those techniques.

C.1 Introduction

Testing VLSI circuits is critical for achieving high outgoing quality levels.
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Test Features

C.2.1 Boundary Scan Architecture

Boundary scan test logic consists of a Boundary-Scan register and support logic. These are accessed through a Test Access Port (TAP). The TAP provides a simple serial interface that allows all processor signal pins to be driven and/or sampled, thereby providing the direct control and monitoring of processor pins at the system level.
C.2.2 TAP Pins

The Intel® 80200 processor TAP is composed of four input connections (TMS, TCK, TRST# and TDI) and one output connection (TDO). These pins are described in Table C-1. The TAP pins provide access to the instruction register and the test data registers.

Table C-1.
C.2.3 Instruction Register (IR)

The instruction register holds instruction codes shifted through the Test Data Input (TDI) pin. The instruction codes are used to select the specific test operation to be performed and the test data register to be accessed. The instruction register is a parallel-loadable, master/slave-configured, 5-bit-wide serial-shift register with latched outputs.
Table C-3. IEEE Instructions

Instruction / Requisite: extest (IEEE 1149.1 Required)
Opcode: 00000 (binary)
Description: extest initiates testing of external circuitry, typically board-level interconnects and off-chip circuitry. extest connects the Boundary-Scan register between TDI and TDO in the Shift_DR state only.
C.2.4 TAP Test Data Registers

The Intel® 80200 processor contains a device identification register and two test data registers (Bypass and RUNBIST). Each test data register selected by the TAP controller is connected serially between TDI and TDO. TDI is connected to the test data register’s most significant bit. TDO is connected to the least significant bit.
C.2.5 TAP Controller

The TAP controller is a 16-state synchronous finite state machine that controls the sequence of test logic operations. The TAP can be controlled via a bus master. The bus master can be either automatic test equipment or a component (for example, a PLD) that interfaces to the Test Access Port (TAP). The TAP controller changes state only in response to a rising edge of TCK or power-up.
C.2.5.1. Test Logic Reset State

In this state, test logic is disabled to allow normal operation of the Intel® 80200 processor. Test logic is disabled by loading the idcode register. No matter what the state of the controller, it enters Test-Logic-Reset state when the TMS input is held high (1) for at least five rising edges of TCK. The controller remains in this state while TMS is high.
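The five-clock reset property can be checked against a small software model of the controller. The sketch below encodes the standard 16-state transition table defined by IEEE 1149.1; it is a host-side illustration, not code for the processor, and the type and function names (`tap_state`, `tap_step`, `tap_run`) are hypothetical. Each table row gives the next state for TMS = 0 and TMS = 1 on a rising edge of TCK.

```c
#include <assert.h>

typedef enum {
    TEST_LOGIC_RESET, RUN_TEST_IDLE,
    SELECT_DR, CAPTURE_DR, SHIFT_DR, EXIT1_DR, PAUSE_DR, EXIT2_DR, UPDATE_DR,
    SELECT_IR, CAPTURE_IR, SHIFT_IR, EXIT1_IR, PAUSE_IR, EXIT2_IR, UPDATE_IR,
    TAP_STATE_COUNT
} tap_state;

static_assert(TAP_STATE_COUNT == 16, "the TAP controller has 16 states");

/* Next state indexed by [current state][TMS value]. */
static const tap_state tap_next[TAP_STATE_COUNT][2] = {
    [TEST_LOGIC_RESET] = { RUN_TEST_IDLE, TEST_LOGIC_RESET },
    [RUN_TEST_IDLE]    = { RUN_TEST_IDLE, SELECT_DR },
    [SELECT_DR]        = { CAPTURE_DR,    SELECT_IR },
    [CAPTURE_DR]       = { SHIFT_DR,      EXIT1_DR },
    [SHIFT_DR]         = { SHIFT_DR,      EXIT1_DR },
    [EXIT1_DR]         = { PAUSE_DR,      UPDATE_DR },
    [PAUSE_DR]         = { PAUSE_DR,      EXIT2_DR },
    [EXIT2_DR]         = { SHIFT_DR,      UPDATE_DR },
    [UPDATE_DR]        = { RUN_TEST_IDLE, SELECT_DR },
    [SELECT_IR]        = { CAPTURE_IR,    TEST_LOGIC_RESET },
    [CAPTURE_IR]       = { SHIFT_IR,      EXIT1_IR },
    [SHIFT_IR]         = { SHIFT_IR,      EXIT1_IR },
    [EXIT1_IR]         = { PAUSE_IR,      UPDATE_IR },
    [PAUSE_IR]         = { PAUSE_IR,      EXIT2_IR },
    [EXIT2_IR]         = { SHIFT_IR,      UPDATE_IR },
    [UPDATE_IR]        = { RUN_TEST_IDLE, SELECT_DR },
};

/* Advance the model by one rising edge of TCK with the given TMS level. */
tap_state tap_step(tap_state s, int tms)
{
    assert(s < TAP_STATE_COUNT);
    return tap_next[s][tms ? 1 : 0];
}

/* Apply a sequence of TMS levels, one per TCK rising edge. */
tap_state tap_run(tap_state s, const int *tms_bits, int count)
{
    for (int i = 0; i < count; i++)
        s = tap_step(s, tms_bits[i]);
    return s;
}
```

In this model, five steps with TMS high reach Test-Logic-Reset from every state, and the TMS sequence 0, 1, 1, 0, 0 applied from Test-Logic-Reset arrives at Shift-IR.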
C.2.5.5. Shift-DR State

In this controller state, the test data register, which is connected between TDI and TDO as a result of the current instruction, shifts data one bit position nearer to its serial output on each rising edge of TCK. Test data registers that the current instruction selects but does not place in the serial path, retain their previous value during this state.
C.2.5.9. Update-DR State

The Boundary-Scan register is provided with a latched parallel output. This output prevents changes at the parallel output while data is shifted in response to the extest and sample/preload instructions.
C.2.5.13. Exit1-IR State

This is a temporary state. If TMS is held high on the rising edge of TCK, the controller enters the Update-IR state, which terminates the scanning process. If TMS is held low on the rising edge of TCK, the controller enters the Pause-IR state. The test data register selected by the current instruction retains its previous value during this state.
C.2.5.17. Boundary-Scan Example

In the example that follows, two command actions are described. The example starts in the reset state; a new instruction is then loaded and executed. See Figure C-3 for a JTAG example. The steps are:

1. Load the sample/preload instruction into the Instruction Register:
   a. Select the Instruction register scan.
   b.
Figure C-3.
Figure C-4.
Figure C-5.