ARM926EJ-S (r0p4/r0p5) Technical Reference Manual Copyright © 2001-2003 ARM Limited. All rights reserved.
Release Information

Change history
Date               Issue  Change
26 September 2001  A      First release
29 January 2002    B      Second release
5 December 2003    C      Third release. Includes r0p5 changes. Defects corrected.
26 January 2004    D      Fourth release. Includes r0p4. Technically identical to previous release.
Contents

Preface
  About this manual ........ xvi
  Feedback ........ xxi
Chapter 1 Introduction
  1.1 About the ARM926EJ-S processor ........ 1-2
Chapter 2 Programmer’s Model
  2.1 About the programmer’s model ........ 2-2
  2.2 Summary of ARM926EJ-S system control coprocessor (CP15) registers ........ 2-3
  2.3 Register descriptions ........ 2-7
Chapter 3 Memory Management Unit
  3.1 About the MMU ........ 3-2
  3.2 Address translation ........ 3-5
  3.3 MMU faults and CPU aborts ........ 3-21
  3.4 Domain access control ........ 3-24
  3.5 Fault checking sequence ........ 3-26
  3.6 External aborts ........ 3-29
  3.7 TLB structure ........ 3-31
Chapter 4 Caches and Write Buffer
Chapter 6 Bus Interface Unit
Chapter 7 Noncachable Instruction Fetches
Chapter 8 Coprocessor Interface
  About the ARM926EJ-S external coprocessor interface
Chapter 9 Instruction Memory Barrier
  About the instruction memory barrier operation ........ 9-2
  IMB operation ........ 9-3
  Example IMB sequences ........ 9-5
Chapter 10 Embedded Trace Macrocell Support
Chapter 11 Debug Support
  About debug support ........ 11-2
Chapter 12 Power Management
  About power management ........ 12-2
Appendix A Signal Descriptions
  Signal properties and requirements ........ A-2
  AHB related signals
Appendix B CP15 Test and Debug Registers
List of Tables

Change history ........ ii
Table 2-1   CP15 register summary
Table 2-23  TCM Region Register c9
Table B-10  MMU Debug Control Register bit assignments ........ B-14
Table B-11  Memory Region Remap Register instructions ........ B-15
Table B-12  Encoding of the Memory Region Remap Register ........ B-16
Table B-13  Encoding of the remap fields
List of Figures

Key to timing diagram conventions ........ xix
Figure 1-1  ARM926EJ-S block diagram
Figure 12-2 Logic for stopping ARM926EJ-S clock during wait for interrupt ........ 12-3
Figure B-1  CP15 MRC and MCR bit pattern ........ B-2
Figure B-2  Rd format for selecting main TLB entry
Preface

This preface introduces the ARM926EJ-S Revision r0p4/r0p5 Technical Reference Manual (TRM). It contains the following sections:
• About this manual on page xvi
• Feedback on page xxi.

ARM DDI0198D Copyright © 2001-2003 ARM Limited. All rights reserved.
About this manual

This is the Technical Reference Manual for the ARM926EJ-S processor.

Product revision status

The rnpn identifier indicates the revision status of the product described in this manual, where:
rn  Identifies the major revision of the product.
pn  Identifies the minor revision or modification status of the product.
Chapter 6 Bus Interface Unit
Read this chapter for a description of the Bus Interface Unit (BIU) interface to AMBA.

Chapter 7 Noncachable Instruction Fetches
Read this chapter for a description of how speculative noncachable instruction fetches are used in the ARM926EJ-S processor to improve performance.

Chapter 8 Coprocessor Interface
Read this chapter for a description of the coprocessor interface. The chapter includes timing diagrams for coprocessor operations.
Conventions

This section describes the conventions that this manual uses:
• Typographical
• Timing diagrams
• Signal naming on page xix
• Numbering on page xx.

Typographical

This manual uses the following typographical conventions:
italic  Highlights important notes, introduces special terminology, and denotes internal cross-references and citations.
bold    Highlights interface elements, such as menu names. Also denotes ARM processor signal names.
[Figure: Key to timing diagram conventions. Legend entries: clock, HIGH to LOW, transient, HIGH/LOW to HIGH, bus stable, bus to high impedance, bus change, high impedance to stable bus.]

Signal naming

The level of an asserted signal depends on whether the signal is active-HIGH or active-LOW. Asserted means HIGH for active-HIGH signals and LOW for active-LOW signals.

Prefix H  Denotes Advanced High-performance Bus (AHB) signals.
Numbering

<size in bits>'<base><number>
    This is a Verilog method of abbreviating constant numbers. For example:
    • 'h7B4 is an unsized hexadecimal value.
    • 'o7654 is an unsized octal value.
    • 8'd9 is an eight-bit wide decimal value of 9.
    • 8'h3F is an eight-bit wide hexadecimal value of 0x3F. This is equivalent to b00111111.
    • 8'b1111 is an eight-bit wide binary value of b00001111.

Further reading

This section lists publications by ARM Limited, and by third parties.
Feedback

ARM Limited welcomes feedback on the ARM926EJ-S processor and its documentation.

Feedback on the product

If you have any comments or suggestions about this product, contact your supplier giving:
• the product name
• a concise explanation of your comments.

Feedback on this manual

If you have any comments on this manual, send email to errata@arm.com giving:
• the title
• the number
• the relevant page number(s) to which your comments apply
• a concise explanation of your comments.
Chapter 1 Introduction

This chapter introduces the ARM926EJ-S processor and its features. It contains the following section:
• About the ARM926EJ-S processor on page 1-2.
1.1 About the ARM926EJ-S processor

The ARM926EJ-S processor is a member of the ARM9 family of general-purpose microprocessors. The ARM926EJ-S processor is targeted at multi-tasking applications where full memory management, high performance, low die size, and low power are all important.

The ARM926EJ-S processor supports the 32-bit ARM and 16-bit Thumb instruction sets, enabling the user to trade off between high performance and high code density.
[Figure 1-1 ARM926EJ-S block diagram. Labeled blocks: ARM9EJ-S core; MMU with TLB and FCSE; instruction and data caches (ICACHE, DCACHE) with tag RAM; TCM interface (ITCM, DTCM); write buffer and write-back write buffer; bus interface unit with instruction and data AHB interfaces; external coprocessor interface; ETM interface.]

Figure 1-2 on page 1-4 and Figure 1-3 on page 1-5 show the ARM926EJ-S interfaces.
[Figure 1-2 ARM926EJ-S interfaces. Signal groups: clock (CLK); interrupts (nFIQ, nIRQ); miscellaneous configuration (including BIGENDINIT and VINITHI); JTAG debug; debug; data memory interface (DR* signals, for example DRADDR[17:0], DRRD[31:0], and DRWR[31:0]).]
[Figure 1-3 ARM926EJ-S interfaces (continued). Signal groups: ETM interface (ETMEN, FIFOFULL, and the ETM* signals); coprocessor interface (CPCLKEN, CPINSTR[31:0], CPDOUT[31:0], CPDIN[31:0], and related CP* signals).]
Chapter 2 Programmer’s Model

This chapter describes the ARM926EJ-S registers in CP15, the system control coprocessor, and provides information for programming the microprocessor. It contains the following sections:
• About the programmer’s model on page 2-2
• Summary of ARM926EJ-S system control coprocessor (CP15) registers on page 2-3
• Register descriptions on page 2-7.
2.1 About the programmer’s model

The system control coprocessor (CP15) is used to configure and control the ARM926EJ-S processor. The caches, Tightly-Coupled Memories (TCMs), Memory Management Unit (MMU), and most other system options are controlled using CP15 registers.

You can only access CP15 registers with MRC and MCR instructions in a privileged mode.
2.2 Summary of ARM926EJ-S system control coprocessor (CP15) registers

CP15 defines 16 registers. Table 2-1 shows the read and write functions of the registers.
All CP15 register bits that are defined and contain state are set to 0 by reset, except:
• The V bit is set to 0 at reset if the VINITHI signal is LOW, or 1 if the VINITHI signal is HIGH.
• The B bit is set to 0 at reset if the BIGENDINIT signal is LOW, or 1 if the BIGENDINIT signal is HIGH.
• The instruction TCM is enabled at reset if the INITRAM pin is HIGH. This enables booting from the instruction TCM and sets the ITCM bit in the ITCM region register to 1.
[Figure 2-1 CP15 MRC and MCR bit pattern: Cond [31:28], b1110 [27:24], Opcode_1 [23:21], L [20], CRn [19:16], Rd [15:12], b1111 [11:8], Opcode_2 [7:5], 1 [4], CRm [3:0].]

The mnemonics for these instructions are:

    MCR{cond} p15, <Opcode_1>, <Rd>, <CRn>, <CRm>, <Opcode_2>
    MRC{cond} p15, <Opcode_1>, <Rd>, <CRn>, <CRm>, <Opcode_2>

Attempting to read from a write-only register, or to write to a read-only register, causes Unpredictable results.
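To make the bit pattern in Figure 2-1 concrete, the following C sketch packs the fields into a 32-bit instruction word. The function name cp15_insn is hypothetical, and the sketch assumes the usual ARM meaning of the L bit (1 = MRC, transfer to an ARM register; 0 = MCR). It is a checking aid for hand-assembled words, not an assembler.

```c
#include <stdint.h>

/* Hypothetical helper: build the MCR/MRC word for CP15 using the
   field layout of Figure 2-1. */
static uint32_t cp15_insn(unsigned cond, int is_mrc, unsigned opcode_1,
                          unsigned rd, unsigned crn, unsigned crm,
                          unsigned opcode_2)
{
    return ((uint32_t)(cond & 0xFu) << 28)      /* [31:28] Cond             */
         | ((uint32_t)0xEu << 24)               /* [27:24] b1110            */
         | ((uint32_t)(opcode_1 & 0x7u) << 21)  /* [23:21] Opcode_1         */
         | ((uint32_t)(is_mrc ? 1u : 0u) << 20) /* [20] L: 1 = MRC, 0 = MCR */
         | ((uint32_t)(crn & 0xFu) << 16)       /* [19:16] CRn              */
         | ((uint32_t)(rd & 0xFu) << 12)        /* [15:12] Rd               */
         | ((uint32_t)0xFu << 8)                /* [11:8] b1111 (p15)       */
         | ((uint32_t)(opcode_2 & 0x7u) << 5)   /* [7:5] Opcode_2           */
         | ((uint32_t)1u << 4)                  /* [4] always 1             */
         | (uint32_t)(crm & 0xFu);              /* [3:0] CRm                */
}
```

For example, cp15_insn(0xE, 1, 0, 0, 1, 0, 0) gives 0xEE110F10, the word for MRC p15, 0, r0, c1, c0, 0 with the AL (always) condition.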
Table 2-3 CP15 abbreviations (continued)
Term                         Abbreviation  Description
Should Be One                SBO           When writing to this location, all bits in this field Should Be One.
Should Be Zero or Preserved  SBZP          When writing to this location, all bits of this field Should Be Zero or preserved by writing the same value that has been previously read from the same field.
2.3 Register descriptions
ID Code Register c0

This is a read-only register that returns the 32-bit device ID code. You can access the ID Code Register by reading CP15 register c0 with the Opcode_2 field set to any value other than 1 or 2. For example:

    MRC p15, 0, <Rd>, c0, c0, {0, 3-7}    ; returns ID

The contents of the ID Code Register are shown in Table 2-5.
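Table 2-5 is not reproduced here, so the following C sketch assumes the usual ARM ID code layout: implementer [31:24], variant [23:20], architecture [19:16], part number [15:4], and revision [3:0]. Under that assumption the variant and revision fields supply the two n values of the rnpn identifier described in the Preface. The value 0x41069265 in the usage note is illustrative only; check it against Table 2-5 and your silicon.

```c
#include <stdint.h>

/* ID code fields, assuming the usual ARM layout (see Table 2-5). */
typedef struct {
    unsigned implementer;  /* [31:24], 0x41 = ARM Limited      */
    unsigned variant;      /* [23:20], the n in rn (major rev) */
    unsigned architecture; /* [19:16]                          */
    unsigned part_number;  /* [15:4]                           */
    unsigned revision;     /* [3:0],  the n in pn (minor rev)  */
} id_code;

static id_code decode_id_code(uint32_t id)
{
    id_code f;
    f.implementer  = (id >> 24) & 0xFFu;
    f.variant      = (id >> 20) & 0xFu;
    f.architecture = (id >> 16) & 0xFu;
    f.part_number  = (id >> 4) & 0xFFFu;
    f.revision     = id & 0xFu;
    return f;
}
```

For the illustrative value 0x41069265, decode_id_code reports implementer 0x41, part number 0x926, variant 0, and revision 5, that is, r0p5.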
[Figure 2-2 Cache Type Register format: bits [31:29] = 000, Ctype [28:25], S [24], Dsize [23:12], Isize [11:0].]

Ctype  The Ctype field determines the cache type. See Table 2-6.
S bit  Specifies whether the cache is a unified cache (S=0) or separate ICache and DCache (S=1). If S=0, the Isize and Dsize fields both describe the unified cache and must be identical. In the ARM926EJ-S processor, this bit is set to 1 to denote separate caches.
Assoc  The Assoc field determines the cache associativity in conjunction with the M bit.
M bit  The multiplier bit determines the cache size and cache associativity values in conjunction with the Size and Assoc fields. If the cache is present, M must be set to 0. If the cache is absent, M must be set to 1. For the ARM926EJ-S processor, M is always set to 0.
Len    The Len field determines the line length of the cache.

The size of the cache is determined by the Size field and the M bit.
The line length of the cache is determined by the Len field. The Len field is bits [13:12] for the DCache and bits [1:0] for the ICache. Table 2-9 shows the line length encoding.
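The Size, Assoc, M, and Len descriptions combine into a cache geometry. The C sketch below decodes one 12-bit Isize or Dsize field. It assumes M = 0 (always true for the ARM926EJ-S processor) and the power-of-two encodings implied by the example in Table 2-10 (Size b0101 = 16KB, Assoc b010 = 4-way, Len b10 = 32-byte lines); Tables 2-7 to 2-9 remain the authoritative encodings. All function and type names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t size_bytes;
    unsigned ways;
    unsigned line_bytes;
    unsigned sets;
} cache_geometry;

/* Decode one 12-bit Isize/Dsize field, assuming M == 0. */
static cache_geometry decode_size_field(unsigned field)
{
    unsigned size  = (field >> 6) & 0xFu;  /* [9:6] Size  */
    unsigned assoc = (field >> 3) & 0x7u;  /* [5:3] Assoc */
    unsigned len   = field & 0x3u;         /* [1:0] Len   */

    cache_geometry g;
    g.size_bytes = 512u << size;   /* b0101 -> 16384 bytes */
    g.ways       = 1u << assoc;    /* b010  -> 4 ways      */
    g.line_bytes = 8u << len;      /* b10   -> 32 bytes    */
    g.sets       = g.size_bytes / (g.ways * g.line_bytes);
    return g;
}

/* Split the Cache Type Register (Figure 2-2) into its fields. */
static unsigned ctr_separate(uint32_t ctr) { return (ctr >> 24) & 1u; }
static unsigned ctr_dsize(uint32_t ctr)    { return (ctr >> 12) & 0xFFFu; }
static unsigned ctr_isize(uint32_t ctr)    { return ctr & 0xFFFu; }
```

With the Table 2-10 Isize value (0x152), the decoded geometry is 16KB, 4 ways, 32-byte lines, and therefore 128 sets, which is the set count needed later for Set/Way cache operations.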
Table 2-10 Example Cache Type Register format (continued)
Function  Register bits  Value
Isize:
Reserved  [11:10]        b00
Size      [9:6]          b0101 = 16KB
Assoc     [5:3]          b010 = 4-way
M         [2]            b0
Len       [1:0]          b10 = 8 words per line (32 bytes)

TCM Status Register c0

This is a read-only register that enables operating systems to establish whether TCMs are present. See also TCM Region Register c9 on page 2-29.
    MCR p15, 0, <Rd>, c1, c0, 0    ; write control register

All defined control bits are set to zero on reset except the V bit and the B bit. The V bit is set to zero at reset if the VINITHI signal is LOW, or one if the VINITHI signal is HIGH. The B bit is set to zero at reset if the BIGENDINIT signal is LOW, or one if the BIGENDINIT signal is HIGH. Figure 2-5 shows the format of the Control Register.
Table 2-11 Control bit functions register c1 (continued)
Bit      Name   Function
[13]     V bit  Location of exception vectors:
                0 = Normal exception vectors selected, address range = 0x0000 0000 to 0x0000 001C
                1 = High exception vectors selected, address range = 0xFFFF 0000 to 0xFFFF 001C.
                Set to the value of VINITHI on reset.
[12]     I bit  ICache enable/disable:
                0 = ICache disabled
                1 = ICache enabled.
[11:10]  -      SBZ.
[9]      R bit  ROM protection. This bit modifies the ROM protection system.
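The V and I bit rows above translate directly into bit manipulation on a Control Register value. The C sketch below, with hypothetical names, computes the address of an exception vector from the V bit and sets the I bit with a read-modify-write so that every other bit, including SBZ fields, keeps the value that was read. Issuing the actual MRC/MCR is left out.

```c
#include <stdint.h>

#define CTRL_I_BIT (1u << 12)  /* ICache enable       */
#define CTRL_V_BIT (1u << 13)  /* high vectors select */

/* Address of the exception vector at 'offset' (0x00 to 0x1C). */
static uint32_t vector_address(uint32_t control, uint32_t offset)
{
    uint32_t base = (control & CTRL_V_BIT) ? 0xFFFF0000u : 0x00000000u;
    return base + offset;
}

/* Value to write back to c1 to enable the ICache, leaving all other
   bits as they were read. */
static uint32_t icache_enable(uint32_t control)
{
    return control | CTRL_I_BIT;
}
```

With V = 1, the vector at offset 0x1C is at 0xFFFF001C, the top of the high vector range in the table.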
• the RR bit.

Assuming that TCM regions are disabled, the caches behave as shown in Table 2-12.

Table 2-12 Effects of Control Register on caches
Cache            MMU                  Behavior
ICache disabled  Enabled or disabled  All instruction fetches are from external memory (AHB).
ICache enabled   Disabled             All instruction fetches are cachable, with no protection checks. All addresses are flat mapped, that is, VA = MVA = PA.
Effects of the Control Register on TCM interface

The M bit of the Control Register, when combined with the En bit in the respective TCM region register c9, directly affects the TCM interface behavior, as shown in Table 2-13.

Table 2-13 Effects of Control Register on TCM interface
TCM                       MMU       Cache            Behavior
Instruction TCM disabled  Disabled  ICache disabled  All instruction fetches are from the external memory (AHB).
Note
Read accesses on the TCM interface are not prevented when an ARM9EJ-S core memory access is aborted. All reads on the TCM interface must be treated as speculative. ARM926EJ-S processor write accesses that are aborted do not take place on the TCM interface.

2.3.3 Translation Table Base Register c2

Register c2 is the Translation Table Base Register (TTBR), which holds the base address of the first-level translation table.
[Figure 2-7 Register c3 format: sixteen two-bit fields, D15 in bits [31:30] down to D0 in bits [1:0].]

Each two-bit field defines the access permissions for one of the 16 domains (D15-D0) (see Table 2-14). Reading from c3 returns the value of the Domain Access Control Register. Writing to c3 writes the value of the Domain Access Control Register.
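Because domain Dn occupies bits [2n+1:2n], updating one domain is a masked two-bit write. The C sketch below assumes the ARM architecture encodings b00 = no access, b01 = client, and b11 = manager (b10 is reserved); Table 2-14 is the authoritative list. Function names are hypothetical, and the resulting value would be written to c3 with an MCR.

```c
#include <stdint.h>

/* Two-bit domain access values (see Table 2-14). */
enum domain_access {
    DOMAIN_NO_ACCESS = 0u,  /* b00: any access generates a domain fault */
    DOMAIN_CLIENT    = 1u,  /* b01: access permissions are checked      */
    DOMAIN_MANAGER   = 3u   /* b11: access permissions are not checked  */
};

/* Return 'dacr' with domain 'd' (0..15) set to 'access'. */
static uint32_t dacr_set(uint32_t dacr, unsigned d, enum domain_access access)
{
    unsigned shift = 2u * (d & 0xFu);
    return (dacr & ~(3u << shift)) | ((uint32_t)access << shift);
}

/* Read back the two-bit field for domain 'd'. */
static unsigned dacr_get(uint32_t dacr, unsigned d)
{
    return (dacr >> (2u * (d & 0xFu))) & 3u;
}
```

Making domain 0 a client and domain 15 a manager, starting from 0, gives 0xC0000001.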
The FSR accessed is determined by the value of the Opcode_2 field:
Opcode_2 = 0  Data Fault Status Register (DFSR).
Opcode_2 = 1  Instruction Fault Status Register (IFSR).

The fault type encoding is listed in Table 3-9 on page 3-22.
Table 2-16 shows the encodings used for the status field in the FSR, and whether the Domain field contains valid information. See Fault address and fault status registers on page 3-21 for details of MMU aborts.
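A fault handler typically splits the FSR into its status and domain fields before deciding what to do. The C sketch below assumes the ARMv5 layout (Domain in bits [7:4], Status in bits [3:0]) and classifies only the alignment, translation, domain, and permission encodings from the ARM architecture; every other encoding, including the external aborts, falls into FAULT_OTHER. Confirm each value against Table 2-16 before relying on it. All names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    FAULT_ALIGNMENT, FAULT_TRANSLATION, FAULT_DOMAIN,
    FAULT_PERMISSION, FAULT_OTHER
} fault_kind;

/* Classify the Status field [3:0] (assumed ARMv5 encodings). */
static fault_kind fsr_kind(uint32_t fsr)
{
    switch (fsr & 0xFu) {
    case 0x1: case 0x3: return FAULT_ALIGNMENT;
    case 0x5: case 0x7: return FAULT_TRANSLATION; /* section, page */
    case 0x9: case 0xB: return FAULT_DOMAIN;      /* section, page */
    case 0xD: case 0xF: return FAULT_PERMISSION;  /* section, page */
    default:            return FAULT_OTHER;       /* e.g. external aborts */
    }
}

/* Domain field [7:4]; meaningful only where Table 2-16 says it is valid. */
static unsigned fsr_domain(uint32_t fsr)
{
    return (fsr >> 4) & 0xFu;
}
```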
Reading from CP15 c7 is Unpredictable, with the exception of the two test and clean operations (see Table 2-18 on page 2-22 and Test and clean operations on page 2-24). You can use the following instruction to write to c7:

    MCR p15, <Opcode_1>, <Rd>, <CRn>, <CRm>, <Opcode_2>

Table 2-17 lists the cache functions provided by this register, with a description of each.
Table 2-17 Function descriptions register c7 (continued)
Function              Description
Prefetch ICache line  Performs an ICache lookup of the specified modified virtual address. If the cache misses, and the region is cachable, a linefill is performed.
Drain write buffer    This instruction acts as an explicit memory barrier. It does not complete until all memory stores that occur before it in program order have been drained from the write buffers.
Table 2-18 Cache operations c7 (continued)
Function/operation                           Data format  Instruction
Invalidate DCache single entry (Set/Way)     Set/Way      MCR p15, 0, <Rd>, c7, c6, 2
Clean DCache single entry (MVA)              MVA          MCR p15, 0, <Rd>, c7, c10, 1
Clean DCache single entry (Set/Way)          Set/Way      MCR p15, 0, <Rd>, c7, c10, 2
Test and clean DCache                        -            MRC p15, 0, <Rd>, c7, c10, 3
Clean and invalidate DCache entry (MVA)      MVA          MCR p15, 0, <Rd>, c7, c14, 1
Clean and invalidate DCache entry (Set/Way)  Set/Way
[Figure 2-10 Register c7 Set/Way format: Way [31:32-A], SBZ [31-A:S+5], Set (= index) [S+4:5], Word [4:2], SBZ [1:0].]

Test and clean operations

The test and clean DCache instruction provides an efficient way to clean the entire DCache using a simple loop. The test and clean DCache instruction tests a number of lines in the DCache to determine whether any of them are dirty. If any dirty lines are found, one of those lines is cleaned.
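The Set/Way operand in Figure 2-10 can be built from the cache geometry. This C sketch assumes A = log2(associativity) and S = log2(number of sets), and leaves the Word and SBZ bits as 0. For a 16KB, 4-way cache with 32-byte lines (128 sets), A = 2 and S = 7, so Way lands in [31:30] and Set in [11:5]. The function names are hypothetical.

```c
#include <stdint.h>

/* log2 for exact powers of two. */
static unsigned log2u(uint32_t x)
{
    unsigned n = 0;
    while (x > 1u) { x >>= 1; n++; }
    return n;
}

/* Rd operand for a Set/Way cache operation (Figure 2-10). */
static uint32_t set_way_operand(unsigned way, unsigned set,
                                unsigned ways, unsigned sets)
{
    unsigned a = log2u(ways);                                /* A          */
    uint32_t operand = ((uint32_t)set & (sets - 1u)) << 5;   /* [S+4:5]    */
    if (a > 0u)                                              /* no Way bits when direct-mapped */
        operand |= (uint32_t)way << (32u - a);               /* [31:32-A]  */
    return operand;                                          /* Word, SBZ = 0 */
}
```

Cleaning the whole DCache by Set/Way would loop over every way and set, issuing MCR p15, 0, <Rd>, c7, c10, 2 with each operand; the test and clean loop described above is the shorter alternative.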
The fully-associative part (also referred to as the lockdown part of the TLB) is used to store entries to be locked down. Entries held in the lockdown part of the TLB are preserved during an invalidate TLB operation. Entries can be removed from the lockdown TLB using an invalidate TLB single entry operation.

Six TLB operations are defined, and the function to be performed is selected by the Opcode_2 and CRm fields in the MCR instruction used to write CP15 c8.
[Figure 2-11 Register c8 MVA format: modified virtual address [31:10], SBZ [9:0].]

Note
If either small or large pages are used, and these pages contain subpage access permissions that are different, then you must use four invalidate TLB single entry operations, with the MVA set to each subpage, to invalidate all information related to that page held in a TLB.

2.3.10 Cache Lockdown and TCM Region Registers c9

Register c9 accesses the Cache Lockdown and TCM Region Registers.
Bits [3:0] of this register are the L bits, one for each cache way. The Opcode_2 field of the MRC or MCR instruction determines whether the instruction or data lockdown register is accessed:
Opcode_2 = 0  Selects the DCache lockdown register.
Opcode_2 = 1  Selects the ICache lockdown register.

You can use the instructions shown in Table 2-20 to access the Cache Lockdown Register.
The format of the Cache Lockdown Register L bits is shown in Table 2-21. All cache ways are available for allocation from reset.
7. For each of the cache lines to be locked down in cache way i:
   • If a DCache is being locked down, use an LDR instruction to load a word from the memory cache line to ensure that the memory cache line is loaded into the cache.
   • If an ICache is being locked down, use the register c7 MCR prefetch ICache line operation (CRm == c13, Opcode_2 == 1) to fetch the memory cache line into the cache.
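The L-bit values used around this step can be computed as in the C sketch below. It assumes that an L bit of 1 prevents allocation into its way (consistent with all ways being available after reset, when the L bits read 0); check Table 2-21 before use. Other bits of the Cache Lockdown Register are not modelled, and the helper names are hypothetical.

```c
#include <stdint.h>

/* L bits that leave only way 'i' open for allocation, used while the
   lines to be locked down are loaded (assumes L = 1 blocks allocation). */
static uint32_t lbits_only_way(unsigned i)
{
    return 0xFu & ~(1u << (i & 3u));
}

/* L bits after loading: way 'i' joins the already-locked set 'locked',
   and all other ways become available for allocation again. */
static uint32_t lbits_lock_way(uint32_t locked, unsigned i)
{
    return (locked | (1u << (i & 3u))) & 0xFu;
}
```

Locking way 0 would therefore write b1110 to the L bits while loading and b0001 afterwards.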
The TCM Region Register format is shown in Figure 2-13.

[Figure 2-13 TCM Region Register c9 format: base address (physical address) [31:12], SBZ/UNP [11:6], Size [5:2], 0 [1], Enable [0].]

Table 2-23 shows the bit assignments for the TCM Region Register.

Table 2-23 TCM Region Register c9
Bits     Function
[31:12]  Base address (physical address).
[11:6]   SBZ/UNP.
[5:2]    Size. The Size field reflects the value of the IRSIZE/DRSIZE macrocell inputs. The Size field encoding is shown in Table 2-24.
Table 2-24 TCM Size field encoding (continued)
Memory size  Value
64KB         b0111
128KB        b1000
256KB        b1001
512KB        b1010
1MB          b1011
Reserved     b1100, b1101, b1110, b1111

If either the data or instruction TCM is disabled, then the contents of the respective TCM are not accessed. If the TCM is subsequently re-enabled, the contents will not have been changed by the ARM926EJ-S processor.
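A driver that discovers TCM configuration would read c9 and split it into base, size, and enable. In the encodings shown above, each step of the Size value doubles the memory size (b0111 = 64KB, so size = 512 << value). The C sketch below applies that formula to b0011 through b1011; values below b0111 are extrapolated from the pattern and should be checked against the full Table 2-24. Anything else decodes as size 0. Names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t base;       /* [31:12] physical base address           */
    uint32_t size_bytes; /* from Size [5:2]; 0 if not in 3..11      */
    bool     enabled;    /* [0] Enable                               */
} tcm_region;

static tcm_region decode_tcm_region(uint32_t reg)
{
    tcm_region r;
    unsigned size = (reg >> 2) & 0xFu;
    r.base = reg & 0xFFFFF000u;
    /* b0111 -> 64KB ... b1011 -> 1MB; lower values assumed to follow */
    r.size_bytes = (size >= 3u && size <= 11u) ? (512u << size) : 0u;
    r.enabled = (reg & 1u) != 0u;
    return r;
}
```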
2.3.11 TLB Lockdown Register c10

The TLB Lockdown Register controls where hardware page table walks place TLB entries: in the set-associative region or in the lockdown region of the TLB and, if in the lockdown region, which entry is written. The lockdown region of the TLB contains eight entries. See TLB structure on page 3-31 for a description of the structure of the TLB.
Note
It is not possible for a lockdown entry to entirely map either small or large pages, unless all the subpage access permissions are identical. Entries can still be written into the lockdown region, but the address range that is mapped only covers the subpage corresponding to the address that was used to perform the page table walk.

Example 2-1 is a code sequence that locks down an entry to the current victim.
FCSE PID Register

Addresses issued by the ARM9EJ-S core in the range 0 to 32MB are translated in accordance with the value contained in this register. Address A becomes A + (FCSE PID x 32MB). It is this modified address that is seen by the caches, MMU, and TCM interface. Addresses above 32MB are not modified. The FCSE PID is a seven-bit field, enabling 128 x 32MB processes to be mapped.
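The relocation rule A + (FCSE PID x 32MB) can be written as a small function. The C sketch assumes the seven-bit PID sits in bits [31:25] of the register value (which is why writing 1 << 25 selects PID 1). Because an address below 32MB has bits [31:25] clear, adding PID x 32MB is the same as ORing in the PID bits. The function name is hypothetical.

```c
#include <stdint.h>

/* VA -> MVA as performed by the FCSE; 'fcse_pid_reg' is the c13 value
   with the PID in bits [31:25]. */
static uint32_t fcse_mva(uint32_t va, uint32_t fcse_pid_reg)
{
    if ((va >> 25) == 0u)                           /* VA below 32MB          */
        return va | (fcse_pid_reg & 0xFE000000u);   /* A + (PID x 32MB)       */
    return va;                                      /* higher VAs unmodified  */
}
```

With PID 1, address 0x00001000 becomes 0x02001000, while 0x02000000 (already at 32MB) passes through unchanged.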
    {FCSE PID = 0}
    MOV r0, #1:SHL:25        ; Fetched with FCSE PID = 0
    MCR p15,0,r0,c13,c0,0    ; Fetched with FCSE PID = 0
    A1                       ; Fetched with FCSE PID = 0
    A2                       ; Fetched with FCSE PID = 0
    A3                       ; Fetched with FCSE PID = 1

A1, A2, and A3 are the three instructions following the fast context switch.

Context ID Register

The Context ID Register provides a mechanism to allow real-time trace tools to identify the currently executing process in multi-tasking environments.
2.3.15 Test and Debug Register c15

You can use register c15 to provide device-specific test and debug operations in ARM926EJ-S processors. Appendix B CP15 Test and Debug Registers describes the registers and functions available using CP15 c15. The ARM Architecture Reference Manual reserves this register for implementation-defined purposes.
Chapter 3 Memory Management Unit

This chapter describes the Memory Management Unit (MMU). It contains the following sections:
• About the MMU on page 3-2
• Address translation on page 3-5
• MMU faults and CPU aborts on page 3-21
• Domain access control on page 3-24
• Fault checking sequence on page 3-26
• External aborts on page 3-29
• TLB structure on page 3-31.
3.1 About the MMU

The ARM926EJ-S MMU is an ARM architecture v5 MMU. It provides virtual memory features required by systems operating on platforms such as Symbian OS, WindowsCE, and Linux. A single set of two-level page tables stored in main memory is used to control the address translation, permission checks, and memory region attributes for both data and instruction accesses.
3.1.1 Access permissions and domains

For large and small pages, access permissions are defined for each subpage (1KB for small pages, 16KB for large pages). Sections and tiny pages have a single set of access permissions.

All regions of memory have an associated domain. A domain is the primary access control mechanism for a region of memory. It defines the conditions necessary for an access to proceed.
3.1.3 MMU program accessible registers

Table 3-1 shows the CP15 registers that are used in conjunction with page table descriptors stored in memory to determine the operation of the MMU.

Table 3-1 MMU program-accessible CP15 registers
Register             Bits        Register description
Control register c1  M, A, S, R  Contains bits to enable the MMU (M bit), enable data address alignment checks (A bit), and control the access protection scheme (S bit and R bit).
3.2 Address translation

The VA generated by the CPU core is converted to a Modified Virtual Address (MVA) by the FCSE using the value held in CP15 c13. The MMU translates MVAs into physical addresses to access external memory, and also performs access permission checking. The MMU table-walking hardware is used to add entries to the TLB.
3.2.1 Translation table base

The hardware translation process is initiated when the TLB does not contain a translation for the requested MVA. The Translation Table Base Register (TTBR), CP15 register c2, points to the base address of a table in physical memory that contains section or page descriptors, or both. The 14 low-order bits [13:0] of the TTBR are Unpredictable on a read, and the table must reside on a 16KB boundary. Figure 3-1 shows the format of the TTBR.
[Figure 3-2 Translation table hierarchy: the 4096-entry translation table, located by the TTB base, is indexed by MVA bits [31:20]. Its entries point to 1MB sections (indexed by MVA bits [19:0]), to coarse page tables (256 entries, indexed by MVA bits [19:12]), or to fine page tables. Coarse page tables point to 64KB large pages (indexed by MVA bits [15:0]) and 4KB small pages (indexed by MVA bits [11:0]); fine page tables can also point to 1KB tiny pages.]
3.2.2 First-level fetch

Bits [31:14] of the TTBR are concatenated with bits [31:20] of the MVA to produce a 30-bit address, as shown in Figure 3-3.

[Figure 3-3 Accessing translation table first-level descriptors: translation base (TTBR bits [31:14]) in bits [31:14], table index (MVA bits [31:20]) in bits [13:2], and b00 in bits [1:0].]

This address selects a 4-byte translation table entry.
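The concatenation in Figure 3-3 is a mask and a shift. This C sketch (hypothetical function name) ignores TTBR bits [13:0], which are Unpredictable on a read, and places the 12-bit table index in bits [13:2] so that each of the 4096 entries is a 4-byte word.

```c
#include <stdint.h>

/* Physical address of the first-level descriptor for 'mva'. */
static uint32_t l1_descriptor_address(uint32_t ttbr, uint32_t mva)
{
    return (ttbr & 0xFFFFC000u)    /* translation base, TTBR[31:14] */
         | ((mva >> 20) << 2);     /* table index MVA[31:20] in [13:2] */
}
```

For TTBR = 0x80004000 and MVA = 0x12345678, the table index is 0x123 and the descriptor is fetched from 0x8000448C.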
[Figure 3-4 First-level descriptor, by bits [1:0]:
  b00  Fault.
  b01  Coarse page table: coarse page table base address [31:10], Domain [8:5], 1 [4].
  b10  Section: section base address [31:20], AP [11:10], Domain [8:5], 1 [4], C [3], B [2].
  b11  Fine page table: fine page table base address [31:12], Domain [8:5], 1 [4].]

A section descriptor provides the base address of a 1MB block of memory. The page table descriptors provide the base address of a page table that contains second-level descriptors.
Table 3-2 First-level descriptor bits (continued)
Section  Coarse  Fine   Description
[3:2]    -       -      Bits C and B indicate whether the area of memory mapped by this section is treated as write-back cachable, write-through cachable, noncached buffered, or noncached nonbuffered.
-        [3:2]   [3:2]  Should Be Zero.
[1:0]    [1:0]   [1:0]  These bits indicate the page size and validity and are interpreted as shown in Table 3-3.
Section descriptor bit assignments are described in Table 3-4.

Table 3-4 Section descriptor bits
Coarse page table descriptor bit assignments are described in Table 3-5.

Table 3-5 Coarse page table descriptor bits
Table 3-6 shows the fine page table descriptor bit assignments.

Table 3-6 Fine page table descriptor bits
[Figure 3-8 Section translation: TTBR bits [31:14] and the table index (MVA bits [31:20]) select the section first-level descriptor (section base address [31:20], AP [11:10], Domain [8:5], C, B, type b10). The physical address is the section base address [31:20] concatenated with the section index (MVA bits [19:0]).]
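Once the first-level descriptor has been fetched, Figures 3-4 and 3-8 reduce section translation to a type check and a concatenation. The C sketch below shows only that address arithmetic; domain and access permission checks are deliberately not modelled, and the names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* First-level descriptor types, bits [1:0] (Figure 3-4). */
enum l1_type { L1_FAULT = 0, L1_COARSE = 1, L1_SECTION = 2, L1_FINE = 3 };

static enum l1_type l1_type_of(uint32_t desc)
{
    return (enum l1_type)(desc & 3u);
}

/* PA = section base [31:20] : MVA[19:0]; false if not a section. */
static bool translate_section(uint32_t desc, uint32_t mva, uint32_t *pa)
{
    if (l1_type_of(desc) != L1_SECTION)
        return false;
    *pa = (desc & 0xFFF00000u) | (mva & 0x000FFFFFu);
    return true;
}
```

A descriptor of 0x80000C0E (section base 0x800, AP = b11, C = B = 1) maps MVA 0x12345678 to physical address 0x80045678.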
[Figure 3-9 Second-level descriptor, by bits [1:0]:
  b00  Fault.
  b01  Large page: large page base address [31:16], AP3 to AP0 [11:4], C [3], B [2].
  b10  Small page: small page base address [31:12], AP3 to AP0 [11:4], C [3], B [2].
  b11  Tiny page: tiny page base address [31:10], AP [5:4], C [3], B [2].]

A second-level descriptor defines a tiny, a small, or a large page descriptor, or is invalid:
• a large page descriptor provides the base address of a 64KB block of memory
• a small page descriptor provides the base address of a 4KB block of memory
Table 3-7 Second-level descriptor bits (continued)
Large   Small   Tiny   Description
[11:4]  [11:4]  [5:4]  Access permission bits. Domain access control on page 3-24 and Fault checking sequence on page 3-26 show how to interpret the access permission bits.
[3:2]   [3:2]   [3:2]  These bits, C and B, indicate whether the area of memory mapped by this page is treated as write-back cachable, write-through cachable, noncached buffered, or noncached nonbuffered.
[Figure 3-10 Large page translation: MVA bits [31:20] (table index) select the coarse page table first-level descriptor. Its coarse page table base address [31:10] is combined with MVA bits [19:12] (L2 table index, in bits [9:2]) to fetch the second-level large page descriptor. The physical address is the page base address [31:16] concatenated with MVA bits [15:0] (page index).]
3.2.10 Translating small page references

Figure 3-11 shows the complete translation sequence for a 4KB small page.
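The large page (Figure 3-10) and small page walks through a coarse page table differ only in the final concatenation. The C sketch below computes the second-level descriptor address and then the physical address for large (b01) and small (b10) descriptors. It rejects b11 entries, treating tiny pages as a fine page table case, and does not model permission checks. All names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Second-level descriptor address: coarse table base [31:10] from the
   first-level descriptor, L2 index MVA[19:12] in bits [9:2]. */
static uint32_t coarse_l2_address(uint32_t l1_desc, uint32_t mva)
{
    return (l1_desc & 0xFFFFFC00u) | (((mva >> 12) & 0xFFu) << 2);
}

/* Physical address for large and small page descriptors. */
static bool translate_coarse_page(uint32_t l2_desc, uint32_t mva, uint32_t *pa)
{
    switch (l2_desc & 3u) {
    case 1u: /* large page: base [31:16] : MVA[15:0] */
        *pa = (l2_desc & 0xFFFF0000u) | (mva & 0xFFFFu);
        return true;
    case 2u: /* small page: base [31:12] : MVA[11:0] */
        *pa = (l2_desc & 0xFFFFF000u) | (mva & 0xFFFu);
        return true;
    default: /* fault, or tiny page (fine tables only) */
        return false;
    }
}
```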
3.2.11 Translating tiny page references

Figure 3-12 shows the complete translation sequence for a 1KB tiny page.
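Tiny pages are reached through a fine page table. The C sketch below assumes the standard ARMv5 fine table layout: fine table base [31:12] from the first-level descriptor, and a 1024-entry table indexed by MVA[19:10]. The index is not spelled out in this section, so treat it as an assumption. The tiny page base [31:10] and the AP [5:4] position follow Figure 3-9. Names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Second-level descriptor address in a fine page table
   (assumed index MVA[19:10], placed in bits [11:2]). */
static uint32_t fine_l2_address(uint32_t l1_desc, uint32_t mva)
{
    return (l1_desc & 0xFFFFF000u) | (((mva >> 10) & 0x3FFu) << 2);
}

/* Tiny page: PA = tiny page base [31:10] : MVA[9:0]. */
static bool translate_tiny_page(uint32_t l2_desc, uint32_t mva, uint32_t *pa)
{
    if ((l2_desc & 3u) != 3u)
        return false;
    *pa = (l2_desc & 0xFFFFFC00u) | (mva & 0x3FFu);
    return true;
}
```

For MVA 0x12345678 the fine table index is 0x115, so with a fine table at 0x80200000 the descriptor is read from 0x80200454.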
Note
The domain specified in the first-level descriptor and the access permissions specified in the second-level descriptor together determine whether the access has permission to proceed. See Domain access control on page 3-24 for details.

Subpages

You can define access permissions for subpages of small and large pages. If, during a page table walk, a small or large page has a different subpage permission, only the subpage being accessed is written into the TLB.
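Because only the accessed subpage reaches the TLB, invalidating such a page means addressing each quarter separately, as the note on invalidate TLB single entry operations in the register c8 description requires. The C sketch computes the four subpage MVAs (1KB apart for small pages, 16KB apart for large pages) and the subpage index for an address. It assumes AP0 covers the lowest-addressed subpage. The helper names are hypothetical.

```c
#include <stdint.h>

/* The four subpage base MVAs of a small (4KB) or large (64KB) page. */
static void subpage_mvas(uint32_t mva, uint32_t page_bytes, uint32_t out[4])
{
    uint32_t base = mva & ~(page_bytes - 1u);  /* page-aligned base */
    uint32_t step = page_bytes / 4u;           /* subpage size      */
    for (unsigned i = 0; i < 4u; i++)
        out[i] = base + i * step;
}

/* Subpage (0..3) containing 'mva', i.e. which of AP0..AP3 applies. */
static unsigned subpage_index(uint32_t mva, uint32_t page_bytes)
{
    return (mva & (page_bytes - 1u)) / (page_bytes / 4u);
}
```

For a small page containing 0x12345678, the four MVAs are 0x12345000, 0x12345400, 0x12345800, and 0x12345C00, and the address itself lies in subpage 1.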
3.3 MMU faults and CPU aborts

The MMU generates an abort on the following types of faults:
• alignment faults (data accesses only)
• translation faults
• domain faults
• permission faults.

In addition, an external abort can be raised by the external system. This can happen only for access types that have the core synchronized to the external system:
• page walks
• noncached reads
• nonbuffered writes
• noncached read-lock-write sequence (SWP).
Fault status register (FSR)
Table 3-9 shows the various access permissions and controls supported by the data MMU, and how these are interpreted to generate faults.
Fault address register (FAR)
For load and store instructions that can transfer more than one word (LDM/STM, LDRD, STRD, and STC/LDC), the value written into the FAR depends on the type of access and, for external aborts, on whether or not the access crosses a 1KB boundary. Table 3-10 shows the FAR values for multi-word transfers.

Table 3-10 FAR values for multi-word transfers

Source      FAR
Alignment   MVA of first aborted address in transfer.
3.4 Domain access control
MMU accesses are primarily controlled through the use of domains. There are 16 domains, and each has a two-bit field to define access to it. Two types of user are supported:
• clients
• managers.
The domains are defined in the domain access control register, CP15 c3. Figure 2-7 on page 2-18 shows how the 32 bits of the register are allocated to define the 16 two-bit domains.
Table 3-12 Interpreting access permission (AP) bits (continued)

AP   S   R   Privileged permissions   User permissions
01   x   x   Read/write               No access
10   x   x   Read/write               Read-only
11   x   x   Read/write               Read/write

ARM DDI0198D Copyright © 2001-2003 ARM Limited. All rights reserved.
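Domain and AP checking can be combined into one decision, as in the sketch below. The domain field is read from the CP15 c3 value, and only client domains (01) go on to an AP check. The AP = 00 rows are not part of this excerpt: the sketch assumes the usual ARMv5 meanings (no access when S = R = 0, privileged read-only when S = 1, read-only for all when R = 1, with S = R = 1 treated here as a fault because it is UNPREDICTABLE). It also assumes the reserved domain value 10 behaves like no access. The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum check_result { ACCESS_OK, DOMAIN_FAULT, PERMISSION_FAULT };

/* AP/S/R interpretation (Table 3-12). S and R are CP15 c1 bits 8 and 9. */
static bool ap_allows(unsigned ap, bool s, bool r, bool privileged, bool write)
{
    assert(ap < 4u);
    switch (ap) {
    case 0:                                /* 00: meaning depends on S and R */
        if (write || (s && r)) return false;
        if (s) return privileged;          /* S=1, R=0: privileged read-only */
        return r;                          /* S=0, R=1: read-only; S=R=0: no access */
    case 1: return privileged;             /* 01: user no access */
    case 2: return privileged || !write;   /* 10: user read-only */
    default: return true;                  /* 11: read/write for all */
    }
}

/* Domain check first; permissions are checked only for client domains. */
static enum check_result check_access(uint32_t dacr, unsigned domain,
                                      unsigned ap, bool s, bool r,
                                      bool privileged, bool write)
{
    assert(domain < 16u);
    switch ((dacr >> (2u * domain)) & 0x3u) {
    case 1:   /* client: AP bits decide */
        return ap_allows(ap, s, r, privileged, write) ? ACCESS_OK : PERMISSION_FAULT;
    case 3:   /* manager: no permission check */
        return ACCESS_OK;
    default:  /* 00 no access; 10 reserved, treated here as no access */
        return DOMAIN_FAULT;
    }
}
```

For example, with domain 0 set to client and domain 1 to manager (c3 = 0xD), a user-mode write to an AP = 10 region in domain 0 produces a permission fault, while the same write in domain 1 is allowed.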
3.5 Fault checking sequence
The sequence the MMU uses to check for access faults is different for sections and pages. The sequence for both types of access is shown in Figure 3-13.
• Translation faults
• Domain faults
• Permission faults on page 3-28.

3.5.1 Alignment faults
If alignment fault checking is enabled (the A bit in CP15 c1 is set), the MMU generates an alignment fault on any data word access whose address is not word-aligned, or on any halfword access whose address is not halfword-aligned, irrespective of whether the MMU is enabled. An alignment fault is not generated on any instruction fetch or any byte access.
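The alignment rule can be expressed as a simple predicate. This sketch assumes the access size is known in bytes, and that a_bit reflects the A bit of CP15 c1 (bit 1 in the ARM926EJ-S control register). The function and enum names are hypothetical, and doubleword accesses are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum access_kind { ACCESS_BYTE = 1, ACCESS_HALFWORD = 2, ACCESS_WORD = 4 };

/* True if the access would raise an alignment fault. The MMU enable bit is
 * deliberately not an input: the check applies whether or not the MMU is on. */
static bool alignment_fault(uint32_t addr, enum access_kind kind,
                            bool a_bit, bool is_fetch)
{
    assert(kind == ACCESS_BYTE || kind == ACCESS_HALFWORD || kind == ACCESS_WORD);
    if (!a_bit || is_fetch || kind == ACCESS_BYTE)
        return false;                        /* never for fetches or bytes */
    return (addr & (uint32_t)(kind - 1)) != 0;
}
```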
3.5.4 Permission faults
If the two-bit domain field returns 01 (client), access permissions are checked as follows:

Section
If the first-level descriptor defines a section-mapped access, the AP bits of the descriptor define whether or not the access is allowed, according to Table 3-12 on page 3-24. Their interpretation depends on the setting of the S and R bits (CP15 c1 bits 8 and 9). If the access is not allowed, a section permission fault is generated.
3.6 External aborts
In addition to the MMU-generated aborts, external aborts can be generated for certain types of access that involve transfers over the AHB bus. These can be used to flag errors on external memory accesses. However, not all accesses can be aborted in this way. The following accesses can be externally aborted:
• page walks
• noncached reads
• nonbuffered writes
• noncached read-lock-write (SWP) sequence.
Note
Because the same register, CP15 c1, controls the enabling of the ICache, DCache, and MMU, all three can be enabled using a single MCR instruction.

3.6.2 Disabling the MMU
To disable the MMU, clear bit 0 in CP15 c1.

Note
If the MMU is enabled, then disabled, and subsequently re-enabled, the contents of the TLB are preserved. If these entries are now invalid, the TLB must be invalidated before the MMU is re-enabled. See TLB Operations Register c8 on page 2-24.
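Enabling the ICache, DCache, and MMU together is a read-modify-write of CP15 c1, done on the target with MRC p15, 0, Rd, c1, c0, 0 followed by MCR p15, 0, Rd, c1, c0, 0. The sketch below shows only the bit manipulation, so it can run on a host. Bit 0 (M) comes from the text above. The positions of A (bit 1), C (bit 2), and I (bit 12) are the usual ARM926EJ-S assignments and are stated here as assumptions. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CP15_C1_M (1u << 0)   /* MMU enable (bit 0, as in the text) */
#define CP15_C1_A (1u << 1)   /* alignment fault checking (assumed bit 1) */
#define CP15_C1_C (1u << 2)   /* DCache enable (assumed bit 2) */
#define CP15_C1_I (1u << 12)  /* ICache enable (assumed bit 12) */

/* Value to write back with a single MCR so ICache, DCache and MMU turn on together. */
static uint32_t c1_enable_mmu_and_caches(uint32_t c1)
{
    return c1 | CP15_C1_M | CP15_C1_C | CP15_C1_I;
}

/* Value that clears only bit 0, disabling the MMU. The TLB contents are kept,
 * so invalidate the TLB before re-enabling if the page tables have changed. */
static uint32_t c1_disable_mmu(uint32_t c1)
{
    return c1 & ~CP15_C1_M;
}
```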
3.7 TLB structure
The MMU contains a single unified TLB used for both data accesses and instruction fetches. The TLB is divided into two parts:
• an eight-entry fully-associative part used exclusively for holding locked-down TLB entries
• a two-way set-associative part for all other entries, with 32 entries in each way (2 way x 32 entry).
Chapter 4 Caches and Write Buffer

This chapter describes the Instruction Cache (ICache), the Data Cache (DCache), and the write buffer. It contains the following sections:
• About the caches and write buffer on page 4-2
• Write buffer on page 4-4
• Enabling the caches on page 4-5
• TCM and cache access priorities on page 4-8
• Cache MVA and Set/Way formats on page 4-9.
4.1 About the caches and write buffer
The ARM926EJ-S processor includes:
• an Instruction Cache (ICache)
• a Data Cache (DCache)
• a write buffer.
Each cache can be from 4KB to 128KB in size, in power-of-two increments.
The caches have the following features:
• The caches are virtually indexed and virtually tagged, and are addressed using the Modified Virtual Address (MVA). This avoids the need to clean or invalidate the caches on a context switch.
The latter allows DCache coherency to be efficiently maintained when small code changes occur, for example for self-modifying code and changes to exception vectors.
4.2 Write buffer
The write buffer is used for all writes to noncachable bufferable regions and write-through regions, and for write misses to write-back regions. A separate buffer is incorporated in the DCache for holding write-back data for cache line evictions or cleaning of dirty cache lines. The main write buffer has a 16-word data buffer and a four-address buffer. The DCache write-back buffer has eight data word entries and a single address entry.
4.3 Enabling the caches
On reset, the ICache and DCache entries are all invalidated and the caches are disabled. While disabled, the caches are not accessed for reads or writes. The caches are enabled using the I, C, and M bits of CP15 c1, and can be enabled independently of one another. Table 4-1 gives the I and M bit settings for the ICache, and the associated behavior. The priority of the TCM and cache behavior is described in TCM and cache access priorities on page 4-8.
Table 4-3 gives the CP15 c1 C and M bit settings for the DCache, and the associated behavior.

Table 4-3 CP15 c1 C and M bit settings for the DCache

C bit  M bit  ARM926EJ-S behavior
0      0      DCache disabled. All data accesses are to the external memory.
1      0      DCache enabled, MMU disabled. The C bit is overridden by the M bit setting, which means that the DCache is effectively disabled. All data accesses are noncachable, nonbufferable, with no protection checks.
Table 4-4 Page table C and B bit settings for the DCache (continued)

C bit  B bit  Description  ARM926EJ-S behavior
1      1      Write-back   DCache enabled:
                           Read hit: read from DCache
                           Read miss: linefill
                           Write hit: write to the DCache only
                           Write miss: buffered store to external memory.
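The write handling described in Write buffer and in Table 4-4 can be summarized as a small decision function. This sketch assumes that the MMU and DCache are both enabled. The write-back and write-miss cases follow the text above. The noncached and write-through hit cases (write to the DCache and a buffered store to external memory) are the usual ARM926EJ-S behavior and are stated here as assumptions, since those table rows are not in this excerpt. Names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum write_path {
    WRITE_DIRECT,           /* noncached, nonbuffered: unbuffered store */
    WRITE_BUFFER,           /* buffered store through the main write buffer */
    WRITE_CACHE_ONLY,       /* write-back hit: DCache only */
    WRITE_CACHE_AND_BUFFER  /* write-through hit: DCache plus buffered store */
};

/* c, b: page table C and B bits; hit: the address is present in the DCache. */
static enum write_path dcache_write_path(bool c, bool b, bool hit)
{
    assert(c || !hit);                 /* only cachable regions can hit */
    if (!c)
        return b ? WRITE_BUFFER : WRITE_DIRECT;
    if (!hit)
        return WRITE_BUFFER;           /* write miss: no linefill, buffered store */
    return b ? WRITE_CACHE_ONLY : WRITE_CACHE_AND_BUFFER;
}
```

Note that a write miss never allocates a line in this model, which is why write misses to write-back regions appear in the list of write buffer users.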
4.4 TCM and cache access priorities
The priorities that apply to the ARM926EJ-S processor for instruction accesses are shown in Table 4-5. The ARM926EJ-S processor gives highest priority to an address that is in the instruction TCM region.
4.5 Cache MVA and Set/Way formats
This section shows how the MVA and Set/Way formats of ARM926EJ-S caches map to a generic virtually indexed, virtually addressed cache. Figure 4-1 shows a generic, virtually indexed, virtually addressed cache.
[Figure 4-2 ARM926EJ-S cache associativity: the MVA is divided into Tag [31:S+5], Index [S+4:5], Word [4:2] and Byte [1:0].]

Figure 4-2 shows the ARM926EJ-S cache associativity. Table 4-7 shows values of S and NSETS for an ARM926EJ-S cache.

Table 4-7 Values of S and NSETS

ARM926EJ-S cache size   S    NSETS
4KB                     5    32
8KB                     6    64
16KB                    7    128
32KB                    8    256
64KB                    9    512
128KB                   10   1024
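Table 4-7 follows directly from the cache geometry: with four ways and 32-byte lines (Word [4:2] and Byte [1:0] in Figure 4-2), NSETS = cache size / (4 x 32) and S = log2 NSETS. The sketch below reproduces the table; the function name is hypothetical.

```c
#include <assert.h>

/* Derive NSETS and S = log2(NSETS) for a four-way cache with 32-byte lines. */
static void cache_geometry(unsigned size_kb, unsigned *nsets, unsigned *s)
{
    /* Sizes from 4KB to 128KB in power-of-two steps. */
    assert(size_kb >= 4u && size_kb <= 128u && (size_kb & (size_kb - 1u)) == 0u);
    *nsets = (size_kb * 1024u) / (4u * 32u);   /* bytes / (ways * line bytes) */
    unsigned bits = 0;
    while ((1u << bits) < *nsets)
        bits++;
    *s = bits;
}
```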
• the ARM926EJ-S caches are four-way associative
• the range of tags addressed by the Index defines a Way
• the number of tags in a Way is the number of Sets, NSETS.
The Set/Way/Word format for ARM926EJ-S caches is shown in Figure 4-3.

[Figure 4-3 ARM926EJ-S cache Set/Way/Word format: Way [31:32-A], SBZ [31-A:S+5], Set select (= Index) [S+4:5], Word [4:2], SBZ [1:0].]

In Figure 4-3:
A = log2 Associativity. For example, for a four-way cache A = 2.
S = log2 NSETS.
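A Set/Way/Word value is built by placing each field at the position shown in Figure 4-3. Such values are typically used as operands for CP15 c7 cache operations that take a Set/Way operand. The sketch below assumes the four-way case (A = 2, so Way occupies bits [31:30]) and takes S from Table 4-7. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build a Set/Way/Word value for a four-way ARM926EJ-S cache.
 * Way -> [31:30], Set -> [S+4:5], Word -> [4:2]; all other bits zero (SBZ). */
static uint32_t set_way_word(unsigned way, unsigned set, unsigned word, unsigned s)
{
    assert(way < 4u && s >= 5u && s <= 10u);
    assert(set < (1u << s) && word < 8u);
    return ((uint32_t)way << 30) | ((uint32_t)set << 5) | ((uint32_t)word << 2);
}
```

Iterating over way = 0 to 3 and set = 0 to NSETS - 1 visits every line of the cache once.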
Chapter 5 Tightly-Coupled Memory Interface

This chapter describes the ARM926EJ-S Tightly-Coupled Memory (TCM) interface.
5.1 About the tightly-coupled memory interface
The ARM926EJ-S processor enables low-latency access to external memories using the Tightly-Coupled Memory (TCM) interface. The term tightly-coupled memory refers to the relationship between the ARM9EJ-S CPU core and the operation of the memories, where there is a strong correlation between the instruction and data access activity of the ARM9EJ-S and the accesses made to external memory.
memory. The TCM interface contains a two-entry write buffer, which avoids the stall cycles that would otherwise result from the mismatch between the ARM9EJ-S native memory interface and the requirements of standard SRAM. TCM accesses can be extended by using the IRWAIT/DRWAIT inputs to generate wait states. However, the timing of these and other interface signals limits the types of memory subsystem that can be implemented.
5.2 TCM interface signals
The TCM interface is designed to be compatible with the timings of standard ASIC SRAM components, allowing connection to single-cycle SRAM with minimal interfacing logic. For standard SRAM, the chip-select, address, and write data and control signals are set up in one cycle, and the read or write operation takes place in the next cycle.

5.2.1 Data interface signals
The signals in the DTCM interface can be grouped by function into four categories.
DRWAIT
DRWAIT is used to extend a TCM transfer by inserting wait states. The DRWAIT signal is timed one cycle ahead of the cycle in which the data transfer takes place. This means that if an access is to be waited, DRWAIT must be asserted in the same cycle as DRCS and deasserted one cycle before the data transfer takes place.

DRIDLE
The DRIDLE signal provides an early indication that no TCM access will take place in the current cycle.
DRWD[31:0]
DRWD is the write data written into the TCM. It is valid in the same cycle as DRCS and held stable until the penultimate cycle of the access.

DMA signals
The DMA interface allows the values of DRADDR and DRCS to be generated from a source external to the ARM926EJ-S processor.

DRDMAEN
DRDMAEN is the DMA enable signal.
5.2.2 Instruction TCM signals
The instruction side TCM signals are almost identical to the DTCM signals. All the signals on the DTCM have an equivalent on the instruction side.
• Control signals
  — IRCS
  — IRWAIT
  — IRIDLE
• Address and attribute signals
  — IRSEQ
  — IRADDR[17:0]
  — IRWBL[3:0]
  — IRnRW
• Data signals
  — IRRD[31:0]
  — IRWD[31:0]
• DMA signals
  — IRDMAEN
  — IRDMACS
  — IRDMAADDR[17:0].
5.2.
5.3 TCM interface bus cycle types and timing
The TCM bus interface is pipelined to enable back-to-back accesses to TCM memory with zero wait states. For each TCM access there is one request cycle and one or more data cycles. Figure 5-1 shows a multi-cycle data side TCM access.
5.3.1 Zero wait state timing
For zero wait state accesses, the timing of the TCM interface corresponds to the timing of a standard SRAM component, with minimal interfacing logic required. Figure 5-2 shows examples of zero wait state accesses on the ITCM interface corresponding to instruction fetches. All accesses are reads.
[Figure 5-3 Data side zero wait state accesses: CLK, DRCS, DRSEQ, DRnRW, DRADDR, DRRD, DRWD and DRWBL over cycles T1 to T7.]

In cycle T1, a nonsequential read request is made to address A.
In cycle T2, a nonsequential word write request is made to address B, and data is returned for the access to A.
In cycle T3, no request is made.
In cycle T4, a nonsequential read request is made to address C.
[Figure 5-4 Relationship between DRDMAEN, DRDMACS, DRDMAADDR, DRADDR and DRCS: DRDMAEN selects between DRDMAADDR and the internal early and late addresses to drive DRADDR, and between DRDMACS and the internal early and late chip selects to drive DRCS.]

Internal to the ARM926EJ-S processor there are multiple sources for both the address and chip-select outputs. The address and chip-select outputs of the TCM interface are timing critical; however, not all of the internal sources are timing critical.
[Figure 5-5 DMA access interaction with normal DTCM accesses: CLK, DRDMAEN, DRCS, DRDMACS, DRADDR, DRDMAADDR, DRSEQ and DRIDLE over cycles T1 to T6.]

In cycle T1, the ARM926EJ-S internal TCM controller is idle and DRIDLE is asserted. DRDMAEN is asserted, so the value of DRDMAADDR is propagated onto DRADDR, and DRCS is asserted (DRDMACS = 1). DRSEQ is forced LOW.
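The output selection in Figures 5-4 and 5-5 can be modeled as a multiplexer controlled by DRDMAEN. This behavioral sketch collapses the internal early and late sources into a single internal address, chip select, and sequential flag (an assumption made for brevity). It follows cycle T1 above in forcing DRSEQ LOW whenever the DMA path is selected. The struct, function, and internal_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tcm_outputs {
    uint32_t draddr;   /* DRADDR[17:0] */
    bool     drcs;     /* DRCS */
    bool     drseq;    /* DRSEQ */
};

/* When DRDMAEN is HIGH the DMA inputs drive the TCM outputs;
 * otherwise the core's own TCM controller does. */
static struct tcm_outputs tcm_output_mux(bool drdmaen, uint32_t drdmaaddr, bool drdmacs,
                                         uint32_t internal_addr, bool internal_cs,
                                         bool internal_seq)
{
    struct tcm_outputs o;
    if (drdmaen) {
        o.draddr = drdmaaddr & 0x3FFFFu;   /* 18-bit address bus */
        o.drcs   = drdmacs;
        o.drseq  = false;                  /* DRSEQ forced LOW for DMA */
    } else {
        o.draddr = internal_addr & 0x3FFFFu;
        o.drcs   = internal_cs;
        o.drseq  = internal_seq;
    }
    return o;
}
```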
5.3.3 Multi-cycle access timing
If memory with non-zero wait states is used for TCM, the DRWAIT/IRWAIT signals are used to insert wait states into ARM926EJ-S accesses. The wait information is pipelined: the value of DRWAIT/IRWAIT applies to the following data cycle, so for the first data cycle of an access it is driven in the request cycle. If there is no active TCM access, the value on DRWAIT/IRWAIT is ignored. This allows the wait signals to be generated speculatively.
[Figure 5-7 State machine for generating a single wait state: states WAIT and COMPLETE, with transitions conditioned on IRCS = 0 and IRCS = 1.]

In the WAIT state, IRWAIT is asserted. In the COMPLETE state, IRWAIT is deasserted.

Certain types of memory can have different access penalties depending on whether an access is sequential or nonsequential. The IRSEQ/DRSEQ signals indicate, in the request cycle, whether an access is sequential, and are held HIGH during waited cycles.
[Figure 5-9 Cycle timing of loopback circuit: CLK, IRCS, IRSEQ, IRWAIT, IRADDR and IRRD over cycles T1 to T7.]

In cycle T1, a nonsequential request is made to address A and IRWAIT is asserted.
In cycle T2, IRSEQ is asserted because of the wait state. IRWAIT is deasserted. IRCS is unknown.
In cycle T3, the access to A completes and a sequential request is made to A+1. IRSEQ is HIGH and IRWAIT is LOW.
In cycle T4, the access to A+1 completes.
[Figure 5-10 DMA with single wait state for nonsequential accesses: logic using FORCE_NSEQ, DMAWAIT, SEQ, CS and REQCLK connects the ARM926EJ-S DTCM signals (DRWAIT, DRSEQ, DRCS, DRADDR[17:0], DRWBL[3:0], DRnRW, DRWD[31:0], DRRD[31:0]) and a DMA port (A, WE, nRW, WD, RD) to the TCM.]

The logic that generates DRWAIT combines the loopback scheme, which uses DRSEQ to insert a wait state for each nonsequential request, with an additional signal, DMAWAIT, that stalls the core during DMA accesses.
[Figure 5-11 Cycle timing of circuit with DMA and single wait state for nonsequential accesses: CLK, DRCS, DRSEQ, DRADDR, DRWAIT, DMAWAIT, FORCE_NSEQ, REQCLK, CS, SEQ, RD and DRRD over cycles T1 to T11.]

In cycle T1, the ARM926EJ-S initiates a sequential request to address A and the DMA gains ownership of the TCM. DRWAIT is asserted because of DMAWAIT.
In cycle T5, the access to A completes. A sequential request is made to A+1. There is no DMA activity.
In cycle T6, the access to A+1 completes. A sequential request is made to A+2. There is no DMA activity.
In cycle T7, the access to A+2 completes. No request is made and DRCS is deasserted. A DMA access to address C starts and DRWAIT is asserted using DMAWAIT.
In cycle T8, DRWAIT remains HIGH because of the DMA access. No request is made, and DRCS remains LOW.
5.4 TCM programmer's model
After reset, the behavior of the TCMs is controlled by the state of the TCM Region Register, CP15 c9.

5.4.1 Enabling the ITCM
The ITCM can be enabled automatically at reset using the INITRAM pin. If INITRAM is held HIGH during system reset, and the VINITHI pin is deasserted, the ITCM is enabled with the ITCM region base set to 0x0. This allows boot code to be run from the ITCM. Boot code must be preloaded into the TCM for this to be useful.
5.5 TCM interface examples
This section contains the following examples:
• Zero-wait-state RAM example
• Producing byte writable memory using word writable RAM
• Multiple banks of RAM example on page 5-21.

Note
Most of the examples in this section are for the DTCM interface. These are also applicable to the ITCM interface.

The additional logic required for implementing the examples in this section is the responsibility of the implementer.
5.5.
The rules for connecting four RAM blocks are:
• Each byte-wide RAM has the same address and chip-select control as the word-wide RAM.
• The following connections must be made:
  — DRWBL[0], DRWD[7:0], and DRRD[7:0] connect to RAM byte 0
  — DRWBL[1], DRWD[15:8], and DRRD[15:8] connect to RAM byte 1
  — DRWBL[2], DRWD[23:16], and DRRD[23:16] connect to RAM byte 2
  — DRWBL[3], DRWD[31:24], and DRRD[31:24] connect to RAM byte 3.
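The combined effect of these connections is that DRWBL acts as a per-byte write enable on the word. This behavioral model shows which bytes a write updates. It is a simulation aid only, not a hardware description, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model four byte-wide RAMs as one word: byte lane i (DRWD[8i+7:8i]) is
 * written only when DRWBL[i] is set; the other lanes keep their old value. */
static uint32_t tcm_byte_write(uint32_t old_word, uint32_t drwd, unsigned drwbl)
{
    assert(drwbl <= 0xFu);
    uint32_t mask = 0;
    for (unsigned lane = 0; lane < 4u; lane++)
        if (drwbl & (1u << lane))
            mask |= 0xFFu << (8u * lane);
    return (old_word & ~mask) | (drwd & mask);
}
```

For example, a halfword write to the middle two bytes (DRWBL = b0110) of 0x11223344 with DRWD = 0xAABBCCDD leaves 0x11BBCC44.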
• If a fast design is more important than minimizing power consumption, you must follow the example in Optimizing for speed on page 5-23.
The rules for producing memory out of smaller RAM blocks are:
• There must be an even number of RAM blocks b, in practice a power of two (b = 2, 4, 8, for example), so that log2 b is an integer.
• Each RAM block must be the same size.
• If the address width of the required memory size is n bits, the address port of the smaller RAM blocks is m = n - log2 b bits wide.
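The address-width rule determines how an address is split between bank selection and the RAM address ports. In the two-bank examples of Figures 5-14 and 5-15, DRADDR[14] selects the bank and DRADDR[13:0] drives each RAM, so n = 15 and m = 14. The sketch below generalizes this under the assumption that the upper log2 b address bits select the bank. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split an n-bit (word) address across b equal RAM blocks: the upper
 * log2(b) bits choose the block, and the lower m = n - log2(b) bits go to
 * every block's address port. */
static void split_address(uint32_t addr, unsigned n, unsigned b,
                          unsigned *bank, uint32_t *local)
{
    assert(b >= 2u && (b & (b - 1u)) == 0u);   /* b must be a power of two */
    assert(n < 32u && addr < (1u << n));
    unsigned log2b = 0;
    while ((1u << log2b) < b)
        log2b++;
    unsigned m = n - log2b;
    *bank  = addr >> m;
    *local = addr & ((1u << m) - 1u);
}
```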
[Figure 5-14 Optimizing for power: two 64KB RAM blocks (Bank 0 and Bank 1), each with DIN[31:0], BW[3:0], A[13:0], WE, CS and DOUT[31:0], connected to the ARM926EJ-S DTCM interface. DRADDR[14] selects the bank, DRADDR[13:0] drives the RAM address ports, and DRSIZE[3:0] is b1000.]

Optimizing for speed
Figure 5-15 on page 5-24 shows how to produce a large memory from two smaller RAM blocks if you are optimizing for speed.
[Figure 5-15 Optimizing for speed: two 64KB RAM blocks (Bank 0 and Bank 1) share DRWD[31:0], DRADDR[13:0] and DRWBL[3:0], with DRADDR[14] and DRnRW used to generate the per-bank WE and CS, and the bank outputs multiplexed onto DRRD[31:0]. DRSIZE[3:0] is b1000.]
5.5.
[Figure 5-16 TCM subsystem that uses wait states for nonsequential accesses: the ITCM interface (IRWAIT, IRSEQ, IRCS, IRADDR[17:0], IRRD[31:0]) drives a ROM through a pipelined chip select, an address multiplexer and a +1 address incrementer.]

The address and chip-select inputs to the ROM are pipelined with respect to the ARM926EJ-S TCM interface outputs. An address incrementer is used to generate sequential addresses. The output of the incrementer is captured at the end of every cycle in which the ROM chip select, CS, is active.
[Figure 5-17 Cycle timing of circuit that uses wait states for nonsequential accesses: CLK, IRCS, IRSEQ, IRWAIT, IRADDR, CS, RD and IRRD over cycles T1 to T7, for accesses to A through A+4.]

5.5.5 DMA interface example
Figure 5-18 on page 5-27 shows an example TCM subsystem using the DMA interface. The signal driving DRDMAEN is connected to both the DRDMAEN and DRDMACS inputs.
[Figure 5-18 TCM subsystem that uses the DMA interface: a DMA controller drives DRDMAADDR[17:0], DRDMAEN and DRDMACS, and multiplexers select between the DMA signals (DMAWD[31:0], DMAnRW, DMAWBL[3:0]) and the core signals (DRWD[31:0], DRnRW, DRWBL[3:0]) for the SRAM inputs. SRAM read data RD[31:0] is returned on DRRD[31:0] and DMARD[31:0].]

5.5.6 Integrating RAM test logic
The memory used to implement TCM might require some form of test access, typically by a BIST controller.
[Figure 5-19 TCM test access using BIST: a BIST controller connected in the same way as the DMA controller in Figure 5-18, using BISTADDR[17:0], BISTEN, BISTCS, BISTWD[31:0], BISTnRW, BISTWBL[3:0] and BISTRD[31:0]; BISTRSTn and HRESETn are also shown.]

This is similar to the previous DMA example.
5.6 TCM access penalties
The data side of the ARM926EJ-S core can access the ITCM. To maximize the performance of the ITCM, data read accesses to the ITCM are pipelined. The ARM926EJ-S core is stalled for two cycles to enable the pipelined read to complete. This is the only ARM926EJ-S TCM interface stall scenario. The write buffer in the TCM controller eliminates all other sources of potential stalling for zero wait state TCM.
5.7 TCM write buffer
Each TCM interface has a two-entry write buffer. This is required to de-pipeline the address and data values produced by the ARM9EJ-S core, so that nonspeculative writes can be performed to memory with SRAM characteristics without introducing stall cycles. ARM9EJ-S core read requests take priority over writes, and consequently TCM transactions can be out of order with respect to instruction execution.
5.8 Using synchronous SRAM as TCM memory
If you use SRAM to implement TCM memory, your library RAM must meet the following requirements:
• It must be synchronous. All timings must be relative to the rising clock edge.
• It must have a chip select (RAM enable).
• The RAM outputs must always be valid. They must not be tristated.
• Byte write control is required.
5.9 TCM clock gating
If the ARM926EJ-S processor is not currently running code from a TCM region, the idle signal for that TCM (DRIDLE for DTCM, IRIDLE for ITCM) is asserted. This indicates that a TCM access will not be performed in that cycle, enabling you to stop the TCM clock. If no clock stopping is required, you can ignore the idle signals. You can also use the idle signal to disable power to the RAMs if you require more stringent power control.
Chapter 6 Bus Interface Unit

This chapter describes the ARM926EJ-S Bus Interface Unit (BIU). It contains the following sections:
• About the bus interface unit on page 6-2
• Supported AHB transfers on page 6-3.
6.1 About the bus interface unit
The ARM926EJ-S Bus Interface Unit (BIU) arbitrates and schedules AHB requests. The BIU contains separate masters for both instruction and data access, enabling complete AHB system flexibility. Separate masters enable multi-layer AHB (see the Multi-layer AHB Overview) and multi-AHB systems to be implemented, giving the benefit of increased overall bus bandwidth and a more flexible system architecture.
6.2 Supported AHB transfers
The ARM926EJ-S processor supports a subset of AHB transfers. The permitted AHB transfers are described in:
• Memory map
• Transfer size
• Mapping of level one and level two (AHB) attributes on page 6-5
• Byte and halfword accesses on page 6-6
• AHB system considerations on page 6-6
• AHB clocking on page 6-10.

6.2.1 Memory map
The ARM926EJ-S processor is a cached processor with two AHB interfaces.
Table 6-1 shows the HBURST encodings that the ARM926EJ-S processor uses, and the operations that perform each burst size.
6.2.3 Mapping of level one and level two (AHB) attributes
Table 6-2 shows the IHPROT[3:0] and DHPROT[3:0] mappings for memory operations.
6.2.4 Byte and halfword accesses
This section describes byte and halfword accesses for:
• Address alignment
• Thumb instruction fetches
• Endianness and byte lane indication.

Address alignment
The ARM926EJ-S BIU performs address alignment checking and aligns AHB addresses to the necessary boundary. 16-bit accesses are aligned to halfword boundaries, and 32-bit accesses to word boundaries.
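The alignment the BIU applies is a mask of the low address bits by access size. The sketch below also maps the size to the standard AMBA AHB HSIZE encoding (000 byte, 001 halfword, 010 word), which comes from the AHB specification rather than from this chapter. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Align an address down to the natural boundary of a 1-, 2- or 4-byte access. */
static uint32_t ahb_align(uint32_t addr, unsigned size_bytes)
{
    assert(size_bytes == 1u || size_bytes == 2u || size_bytes == 4u);
    return addr & ~(uint32_t)(size_bytes - 1u);
}

/* AHB HSIZE encoding for byte, halfword and word transfers. */
static unsigned ahb_hsize(unsigned size_bytes)
{
    assert(size_bytes == 1u || size_bytes == 2u || size_bytes == 4u);
    return size_bytes == 4u ? 2u : (size_bytes == 2u ? 1u : 0u);
}
```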
• Memory coherency on page 6-9.

Single-layer AHB systems
If the ARM926EJ-S processor is to be used in a single-layer AHB system, each of the two BIU masters must be treated as a separate master. The simplest way of integrating the two ARM926EJ-S bus masters into a single-layer AHB system is for each master to be a separate requestor into the AHB arbiter, as in any multi-master system. The data master normally has higher arbitration priority than the instruction master.
[Figure 6-1 Multi-layer AHB system example: a DMA master and the ARM926EJ-S I-side and D-side masters, each with an input stage and decoder, connect through an interconnect matrix whose multiplexers feed slaves #1 to #4.]

Multi-layer AHB is described in more detail in the Multi-layer AHB Overview.
[Figure 6-2 Multi-AHB system example: the ARM926EJ-S D-AHB master (clock enable DHCLKEN) connects to the D-AHB subsystem, and the I-AHB master (clock enable IHCLKEN) connects to the I-AHB subsystem, with a D-AHB to I-AHB bridge between the two.]

If both AHB systems operate at the same frequency, DHCLKEN and IHCLKEN must be tied together. See AHB clocking on page 6-10 for more details. The AHB clock for each system, HCLK1 and HCLK2, must be synchronized to the ARM926EJ-S clock signal CLK.
6.2.6 AHB clocking
The ARM926EJ-S design uses a single clock, CLK. To run the ARM926EJ-S processor at a higher frequency than the AHB system bus, a separate AHB clock enable is required for each of the two bus masters (in a multi-AHB system, each AHB system can run at a different frequency):

DHCLKEN   Signifies the rising edge of HCLK for the data BIU bus master.
IHCLKEN   Signifies the rising edge of HCLK for the instruction BIU bus master.
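For an integer CLK:HCLK ratio, a clock enable such as DHCLKEN or IHCLKEN is HIGH on one CLK cycle in every ratio cycles, marking the CLK edge that coincides with a rising edge of HCLK. The model below assumes that the enable falls in the last CLK cycle of each HCLK period. This phase choice is an assumption for illustration; the real alignment is set by the system clock generator. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* HCLKEN level in a given CLK cycle for an integer CLK:HCLK ratio.
 * With ratio == 1 (AHB at core frequency) the enable is permanently HIGH. */
static bool hclken_for_cycle(unsigned long cycle, unsigned ratio)
{
    assert(ratio >= 1u);
    return (cycle % ratio) == ratio - 1u;
}
```

When both AHB systems share one frequency, the same generator output can drive DHCLKEN and IHCLKEN, matching the requirement that they be tied together.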
For all other types of access (cache linefills, write-back evictions, buffered writes), an Error response is ignored. If the ARM926EJ-S processor is to be used in a system that must be tolerant of soft errors in external memory, both soft error detection and correction must be done in hardware at the time the AHB transfer is made. The DHREADY and IHREADY signals can be used to extend the transfer until corrected data is available.
Chapter 7 Noncachable Instruction Fetches

This chapter describes noncachable instruction fetches in the ARM926EJ-S processor. It contains the following section:
• About noncachable instruction fetches on page 7-2.
7.1 About noncachable instruction fetches
The ARM926EJ-S processor performs speculative noncachable instruction fetches to increase performance. Speculative instruction fetching is enabled at reset. It can be disabled using bit 16 in the debug state register, CP15 c15 (see Test and Debug Register c15 on page 2-36). If prefetching is disabled, only instruction fetches issued directly by the ARM9EJ-S core result in instruction fetches on the AHB interface.
This IMB implementation only applies to the ARM926EJ-S processor running code from a noncachable region of memory. If code is run from a cachable region of memory, or a different device is used, a different IMB implementation is required. IMBs are described in Chapter 9 Instruction Memory Barrier.

7.1.3 AHB behavior
If instruction prefetching is disabled, all instruction fetches appear on the AHB interface as single, nonsequential fetches.
Chapter 8 Coprocessor Interface

This chapter describes the ARM926EJ-S coprocessor interface. It contains the following sections:
• About the ARM926EJ-S external coprocessor interface on page 8-2
• LDC/STC on page 8-4
• MCR/MRC on page 8-6
• CDP on page 8-8
• Privileged instructions on page 8-9
• Busy-waiting and interrupts on page 8-10
• CPBURST on page 8-11
• CPABORT on page 8-12
• nCPINSTRVALID on page 8-13.
8.1 About the ARM926EJ-S external coprocessor interface
The ARM926EJ-S supports the connection of on-chip coprocessors to the ARM9EJ-S core through an external coprocessor interface. All types of coprocessor instructions are supported.

8.1.1 Overview
Coprocessors determine the instructions that they have to execute by using a pipeline follower in the coprocessor. As each instruction arrives from memory, it enters both the ARM9EJ-S pipeline and the coprocessor pipeline.
This is one technique for generating a clock that reflects the advance of the ARM9EJ-S core pipeline. If CPCLKEN is LOW on the rising edge of CPCLK, the ARM9EJ-S core pipeline is stalled and the coprocessor pipeline must not advance.

Coprocessor instructions
There are three classes of coprocessor instructions:
LDC or STC   Load coprocessor register from memory, or store coprocessor register to memory.
8.2 LDC/STC
The cycle timing for this operation is shown in Figure 8-3.

[Figure 8-3 LDC/STC cycle timing: coprocessor pipeline stages Fetch, Decode, Execute (GO), Execute (LAST), Memory and Write, against CLK, CPINSTR[31:0], nCPMREQ, CPPASS, CPLATECANCEL, CHSDE[1:0], CHSEX[1:0], CPDOUT[31:0] (LDC) and CPDIN[31:0] (STC).]

In Figure 8-3, four words of data are transferred.
If a coprocessor instruction busy-waits, CPPASS is asserted on every cycle until the coprocessor instruction is executed. If an interrupt occurs during busy-waiting, CPPASS is driven LOW and the coprocessor must stop executing the coprocessor instruction. Another output, CPLATECANCEL, is used to cancel a coprocessor instruction when the instruction preceding it caused a Data Abort.
8.3 MCR/MRC
These cycles look very similar to STC/LDC. An example with a busy-wait state is shown in Figure 8-4.
8.3.1 Interlocked MCR
If the data for an MCR operation is not available inside the ARM9EJ-S core pipeline during its first Decode cycle, the ARM9EJ-S core pipeline interlocks for one or more cycles until the data is available. An example of this is where the register being transferred is the destination of a preceding LDR instruction.
8.4 CDP
CDP instructions usually execute in a single cycle. As in the previous examples, nCPMREQ is driven LOW to signal when an instruction is entering the Decode and then the Execute stage of the pipeline. If the instruction is to be executed, the CPPASS signal is driven HIGH during Execute. If the coprocessor can execute the instruction immediately, it drives CHSDE[1:0] with LAST.
8.5 Privileged instructions
The coprocessor might restrict certain instructions for use in privileged modes only. To do this, the coprocessor has to track the nCPTRANS output. Figure 8-7 shows how nCPTRANS changes after a mode change.
8.6 Busy-waiting and interrupts
The coprocessor is permitted to stall (busy-wait) the processor during the execution of a coprocessor instruction if, for example, it is still busy with an earlier coprocessor instruction. To do so, the coprocessor associated with the Decode stage instruction drives WAIT on CHSDE[1:0].
8.7 CPBURST
The CPBURST signal is used by the external coprocessor to indicate the number of words to be transferred in an LDC or STC operation. CPBURST is used by the ARM926EJ-S memory system to optimize LDC/STC instructions that access either noncachable or nonbufferable regions of memory. The encoding of CPBURST is shown in Table 8-2.
8.8 CPABORT
When asserted HIGH, the CPABORT signal indicates that an LDC/STC instruction has aborted. CPABORT is asserted in the cycle after the Memory stage of the aborting LDC/STC instruction, as Figure 8-9 shows.
8.9 nCPINSTRVALID
The nCPINSTRVALID signal indicates whether the instruction currently on the CPINSTR bus is valid and should be decoded by the coprocessor. If nCPINSTRVALID is 1, the coprocessor should not decode the instruction and should respond ABSENT for all Decode cycles of that instruction. nCPINSTRVALID is the equivalent of the CPTBIT signal in the ARM946E-S and ARM966E-S processors.

ARM DDI0198D Copyright © 2001-2003 ARM Limited.
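The rule for nCPINSTRVALID can be expressed as the first check in a coprocessor's Decode-cycle response. The sketch below assumes a hypothetical owns_instr flag (CPINSTR addresses this coprocessor) and a busy flag (earlier work still in progress); decode_response is likewise a hypothetical name, and the handshake encodings are those listed in Table A-2.

```c
#include <stdbool.h>

/* Handshake encodings (Table A-2): ABSENT=10, WAIT=00, GO=01, LAST=11. */
enum chs { CHS_WAIT = 0x0, CHS_GO = 0x1, CHS_ABSENT = 0x2, CHS_LAST = 0x3 };

/*
 * Hypothetical Decode-cycle response.
 * ncpinstrvalid: sampled nCPINSTRVALID (1 = CPINSTR is not valid).
 * owns_instr:    ASSUMED flag, CPINSTR addresses this coprocessor.
 * busy:          still working on an earlier coprocessor instruction.
 */
enum chs decode_response(bool ncpinstrvalid, bool owns_instr, bool busy)
{
    if (ncpinstrvalid)
        return CHS_ABSENT;   /* invalid instruction: do not decode */
    if (!owns_instr)
        return CHS_ABSENT;   /* belongs to another coprocessor */
    return busy ? CHS_WAIT : CHS_LAST;
}
```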
8.10 Connecting multiple external coprocessors
If multiple coprocessors are connected to the ARM926EJ-S processor, the outputs of the coprocessors must be combined to form a single set of coprocessor inputs. The coprocessor handshake signals are combined by ANDing the top bit and ORing the bottom bit. This enables an inactive coprocessor to produce a fixed response of b10 (ABSENT) that does not affect the combined response.
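The combining rule can be checked with a small C model. Because ANDing with 1 and ORing with 0 leave a value unchanged, every inactive coprocessor driving b10 is transparent, and the response of the single active coprocessor passes through. combine_handshake is a hypothetical helper operating on 2-bit values in the Table A-2 encoding.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Combine the 2-bit handshake responses of n coprocessors:
 * bit 1 of the result is the AND of all bit 1s, bit 0 is the OR of all bit 0s.
 */
uint8_t combine_handshake(const uint8_t *resp, size_t n)
{
    uint8_t top = 1u;      /* identity for AND */
    uint8_t bottom = 0u;   /* identity for OR */
    for (size_t i = 0; i < n; i++) {
        top &= (uint8_t)((resp[i] >> 1) & 1u);
        bottom |= (uint8_t)(resp[i] & 1u);
    }
    return (uint8_t)((top << 1) | bottom);
}
```

With no coprocessors, or only inactive ones, the result is b10, which matches the ABSENT tie-off required when no external coprocessor is attached.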
Chapter 9 Instruction Memory Barrier

This chapter describes the ARM926EJ-S Instruction Memory Barrier (IMB) operation. It contains the following sections:
• About the instruction memory barrier operation on page 9-2
• IMB operation on page 9-3
• Example IMB sequences on page 9-5.
9.1 About the instruction memory barrier operation
Whenever code is treated as data, for example in self-modifying code or when loading code into memory, you must use a sequence of instructions called an Instruction Memory Barrier (IMB) operation to ensure consistency between the data and instruction streams processed by the ARM926EJ-S processor.
9.2 IMB operation
To ensure consistency between the data and instruction sides, you must take the following steps:
1. Clean the DCache
2. Drain the write buffer
3. Synchronize data and instruction streams in level two AHB subsystems
4. Invalidate the ICache on page 9-4
5. Flush the prefetch buffer on page 9-4.

9.2.1 Clean the DCache
If the DCache contains cache lines corresponding to write-back regions of memory, it might contain dirty entries.
9.2.4 Invalidate the ICache
The ICache must be invalidated to remove any stale copies of instructions that are no longer valid. This might not be required if the ICache is not being used, or if the modified regions are not in cachable areas of memory.

9.2.5 Flush the prefetch buffer
To ensure consistency, the prefetch buffer should be flushed before self-modifying code is executed. See Self modifying code on page 7-2.
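To make steps 1, 2, and 4 concrete, the following C sketch encodes the corresponding CP15 operations as 32-bit ARM instruction words. It assumes the standard ARMv5 MCR/MRC encoding and the ARM926EJ-S CP15 c7 operations test and clean DCache (MRC p15, 0, r15, c7, c10, 3, repeated with BNE until the DCache is clean), drain write buffer (MCR p15, 0, r0, c7, c10, 4), and invalidate ICache (MCR p15, 0, r0, c7, c5, 0). The helper names cp_insn and imb_sequence are hypothetical. Step 3 depends on the level two AHB system and step 5 is covered in Self modifying code on page 7-2, so neither is encoded here.

```c
#include <stdint.h>

/*
 * Encode an ARM MCR or MRC instruction with condition AL.
 * Layout: cond[31:28]=1110, [27:24]=1110, opc1[23:21], L[20] (1 = MRC),
 *         CRn[19:16], Rd[15:12], cp_num[11:8], opc2[7:5], [4]=1, CRm[3:0].
 */
uint32_t cp_insn(int is_mrc, unsigned cp, unsigned opc1, unsigned rd,
                 unsigned crn, unsigned crm, unsigned opc2)
{
    return 0xEE000010u
         | ((uint32_t)(opc1 & 7u) << 21)
         | ((uint32_t)(is_mrc ? 1u : 0u) << 20)
         | ((uint32_t)(crn & 15u) << 16)
         | ((uint32_t)(rd & 15u) << 12)
         | ((uint32_t)(cp & 15u) << 8)
         | ((uint32_t)(opc2 & 7u) << 5)
         | (uint32_t)(crm & 15u);
}

enum { IMB_STEPS = 3 };

/* CP15 operations for IMB steps 1, 2, and 4, in execution order. */
void imb_sequence(uint32_t out[IMB_STEPS])
{
    /* Step 1: test and clean DCache (software loops on it with BNE) */
    out[0] = cp_insn(1, 15, 0, 15, 7, 10, 3);
    /* Step 2: drain write buffer */
    out[1] = cp_insn(0, 15, 0, 0, 7, 10, 4);
    /* Step 4: invalidate ICache */
    out[2] = cp_insn(0, 15, 0, 0, 7, 5, 0);
}
```

imb_sequence fills the array with 0xEE17FF7A, 0xEE070F9A, and 0xEE070F15. The ordering matters: the write buffer is drained only after the clean has pushed dirty lines out, and the ICache is invalidated only once the new code is visible in memory.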
Chapter 10 Embedded Trace Macrocell Support

This chapter describes the Embedded Trace Macrocell (ETM) support for the ARM926EJ-S processor. It contains the following section:
• About Embedded Trace Macrocell support on page 10-2.
10.1 About Embedded Trace Macrocell support
To support real-time trace, the ARM926EJ-S processor provides an interface for connecting an Embedded Trace Macrocell (ETM). For more information on the ETM, see the ETM9 Technical Reference Manual.

The ETM consists of two parts:
Trace port    A trace protocol has been developed to provide a real-time trace capability for processor cores that are deeply embedded in larger ASIC designs.
Note
Stalling the core with FIFOFULL affects real-time operating performance. If an ETM is connected, it must be disabled during normal ARM926EJ-S processor operation to prevent FIFOFULL from degrading the ARM926EJ-S processor performance.
Chapter 11 Debug Support

This chapter describes the debug support for the ARM926EJ-S processor. It contains the following section:
• About debug support on page 11-2.
11.1 About debug support
Debug support is implemented by the ARM9EJ-S core embedded within the ARM926EJ-S processor. The ARM9EJ-S Technical Reference Manual describes the debug support provided by the ARM9EJ-S core in full. Debug support for the ARM926EJ-S memory system is implemented by extending the debug facilities to provide access to CP15 through an ARM9EJ-S external scan chain (scan chain 15).
To perform an access using scan chain 15, you must:
1. During the SHIFT-DR state of the TAP state machine, shift in the read/write bit, register address, and register data value for writing, with bit 32 set to 1. For read operations, the data value field does not have to be written.
2. Move through UPDATE-DR. The operation specified by the register address and write not read bits does not start.
3.
The mapping of the register address field to the CP15 registers is shown in Table 11-2.
Chapter 12 Power Management

This chapter describes the power management facilities provided by the ARM926EJ-S processor. It contains the following section:
• About power management on page 12-2.
12.1 About power management
The power management facilities provided by the ARM926EJ-S processor are:
• Dynamic power management (wait for interrupt mode)
• Static power management (leakage control) on page 12-3.
When the ARM926EJ-S processor has entered a low-power state, all of the main internal clocks are stopped, including the clock for the ARM9EJ-S core. However, the ARM9EJ-S core remains active if DBGTCKEN is asserted. This enables values to be written to the ARM9EJ-S debug control register so that a debugger can force an exit from wait for interrupt mode. You can therefore safely stop the ARM926EJ-S CLK if STANDBYWFI is HIGH and DBGTCKEN is LOW.
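A system clock controller can implement this rule as a simple gating condition. The sketch below is a hypothetical check, not part of the ARM926EJ-S macrocell; clk_may_stop is an illustrative name.

```c
#include <stdbool.h>

/*
 * Hypothetical clock-controller check: CLK to the ARM926EJ-S may be stopped
 * while the core is in wait for interrupt mode (STANDBYWFI HIGH) and
 * DBGTCKEN is LOW, so no debug access needs the core clock.
 */
bool clk_may_stop(bool standbywfi, bool dbgtcken)
{
    return standbywfi && !dbgtcken;
}
```

The controller should re-evaluate this every cycle, because an asserted DBGTCKEN means the debugger may be about to force an exit from wait for interrupt mode.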
MMU RAMs
The RAM used to implement the MMU can be safely powered down if the MMU has been disabled (using CP15 control register c1) and the MMU contains no valid entries. While the MMU is disabled, only explicit CP15 operations can cause the MMU RAM to be accessed (c8 TLB maintenance operations and c15 MMU test/debug operations). These instructions must not be executed while the MMU RAM is powered down. The MMU RAM must be powered up before the MMU is re-enabled.
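The conditions above can be tracked by a power controller or operating system as a small amount of state. The following sketch is hypothetical: the struct and function names are illustrative, and how the state is obtained (for example, whether software knows the MMU holds no valid entries) is left to the system.

```c
#include <stdbool.h>

/* Hypothetical state a power controller might track for the MMU RAM. */
struct mmu_ram_state {
    bool mmu_enabled;       /* MMU enabled through CP15 control register c1 */
    bool has_valid_entries; /* MMU still holds valid entries */
    bool powered;           /* MMU RAM is currently powered */
};

/* Safe to power down: MMU disabled and no valid entries. */
bool mmu_ram_may_power_down(const struct mmu_ram_state *s)
{
    return !s->mmu_enabled && !s->has_valid_entries;
}

/* c8 TLB maintenance, c15 MMU test/debug operations, and re-enabling the
   MMU all require the MMU RAM to be powered. */
bool mmu_ram_access_permitted(const struct mmu_ram_state *s)
{
    return s->powered;
}
```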
Appendix A Signal Descriptions

This appendix describes the ARM926EJ-S processor input and output signals. It contains the following sections:
• Signal properties and requirements on page A-2
• AHB related signals on page A-3
• Coprocessor interface signals on page A-5
• Debug signals on page A-7
• JTAG signals on page A-9
• Miscellaneous signals on page A-10
• ETM interface signals on page A-12
• TCM interface signals on page A-14.
A.1 Signal properties and requirements
To ensure ease of integration of the ARM926EJ-S processor into embedded applications, and to simplify synthesis flow, the following design techniques have been used:
• a single rising edge clock times all activity
• all signals and buses are unidirectional
• all inputs are required to be synchronous to the single clock.
A.2 AHB related signals
Table A-1 describes the ARM926EJ-S processor AHB related signals.

Table A-1 AHB related signals
Signal name      Direction  Description
DHADDR[31:0]     Output     AHB address (data).
DHBL[3:0]        Output     Byte lane indicator for current transfer.
DHBURST[2:0]     Output     AHB burst type (data).
DHBUSREQ         Output     AHB bus request (data).
DHCLKEN          Input      Signifies the rising edge of HCLK for the data AHB.
Table A-1 AHB related signals (continued)
Signal name      Direction  Description
IHGRANT          Input      AHB bus grant signal (instruction).
IHLOCK           Output     AHB bus lock signal (instruction).
IHPROT[3:0]      Output     AHB bus access information (instruction).
IHREADY          Input      AHB transfer complete signal (instruction).
IHRDATA[31:0]    Input      AHB read data (instruction).
IHRESP[1:0]      Input      AHB transfer response (instruction).
A.3 Coprocessor interface signals
Table A-2 describes the ARM926EJ-S processor coprocessor interface signals.

Table A-2 Coprocessor interface signals
Name           Direction  Description
CPABORT        Output     Indicates that an STC/LDC operation has aborted. Asserted in the WB stage of the coprocessor pipeline.
CPBURST[3:0]   Input      Indicates the number of words to be transferred for an LDC/STC operation. If no external coprocessors are attached, this must be tied to b0000.
Table A-2 Coprocessor interface signals (continued)
Name                                            Direction  Description
CHSEX[1:0] (Coprocessor handshake execute)      Input      The handshake signals from the Execute stage of the coprocessor pipeline follower. Indicates ABSENT (10), WAIT (00), GO (01), or LAST (11). If no external coprocessors are attached, these must be tied to b10 (ABSENT response).
nCPINSTRVALID (Coprocessor valid instruction)   Output     Valid instruction indicator for CPINSTR (replaces CPTBIT).
A.4 Debug signals
Table A-3 describes the ARM926EJ-S processor debug signals.

Table A-3 Debug signals
Name                                       Direction  Description
COMMRX (Communications channel receive)    Output     When HIGH, this signal denotes that the comms channel receive buffer contains valid data waiting to be read.
COMMTX (Communications channel transmit)   Output     When HIGH, this signal denotes that the comms channel transmit buffer is empty.
Table A-3 Debug signals (continued)
Name                                     Direction  Description
DBGRNG[1:0] (EmbeddedICE-RT range out)   Output     Indicates that the corresponding EmbeddedICE-RT watchpoint register has matched the conditions currently present on the address, data, and control buses. This signal is independent of the state of the watchpoint enable control bit.
DBGRQI (Internal debug request)          Output     Represents the debug request signal that is presented to the core debug logic.
A.5 JTAG signals
Table A-4 describes the ARM926EJ-S processor JTAG signals.

Table A-4 JTAG signals
Name                                               Direction  Description
DBGIR[3:0] (TAP controller instruction register)   Output     These four bits reflect the current instruction loaded into the TAP controller instruction register. These bits change when the TAP controller is in the UPDATE-IR state.
DBGnTRST (Not test reset)                          Input      This is the active LOW reset signal for the EmbeddedICE-RT internal state.
A.6 Miscellaneous signals
Table A-5 describes the miscellaneous signals on the ARM926EJ-S processor.

Table A-5 Miscellaneous signals
Name         Direction  Description
BIGENDINIT   Input      Determines the setting of the B bit in CP15 c1 after a system reset. When HIGH, the reset state of the B bit is 1 (big-endian). When LOW, the reset state of the B bit is 0 (little-endian).
CLK          Input      This clock times all operations of the ARM926EJ-S design.
Table A-5 Miscellaneous signals (continued)
Name          Direction  Description
TAPID[31:0]   Input      This is the ARM926EJ-S device identification (ID) code test data register, accessible from the scan chains. It must be tied to 0x07926F0F for an ARM926EJ-S processor when the device is instantiated.
TESTMODE      Input      Test mode test signal. This signal must be LOW during normal operation.
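To see what the required TAPID value contains, it can be split into fields. The sketch below assumes the standard IEEE 1149.1 IDCODE layout (version in bits [31:28], part number in [27:12], manufacturer identity in [11:1], bit 0 always 1); struct idcode and decode_idcode are hypothetical names.

```c
#include <stdint.h>

/* Fields of a JTAG IDCODE value, ASSUMING the IEEE 1149.1 layout. */
struct idcode {
    unsigned version;      /* bits [31:28] */
    unsigned part_number;  /* bits [27:12] */
    unsigned manufacturer; /* bits [11:1]  */
    unsigned marker;       /* bit  [0], always 1 */
};

struct idcode decode_idcode(uint32_t id)
{
    struct idcode f;
    f.version      = (unsigned)(id >> 28) & 0xFu;
    f.part_number  = (unsigned)(id >> 12) & 0xFFFFu;
    f.manufacturer = (unsigned)(id >> 1) & 0x7FFu;
    f.marker       = (unsigned)id & 1u;
    return f;
}
```

Applied to 0x07926F0F, this gives version 0, part number 0x7926, manufacturer field 0x787, and marker bit 1, which a debugger can use to recognize the device on the scan chain.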
A.7 ETM interface signals
Table A-6 describes the ARM926EJ-S processor ETM interface signals.

Table A-6 ETM interface signals
Name           Direction  Description
ETMBIGEND      Output     ETM big-endian configuration indication.
ETMCHSD[1:0]   Output     ETM coprocessor handshake decode signals.
ETMCHSE[1:0]   Output     ETM coprocessor handshake execute signals.
ETMDA[31:0]    Output     ETM data address.
ETMDABORT      Output     ETM data abort.
ETMDBGACK      Output     ETM debug mode indication.
Table A-6 ETM interface signals (continued)
Name              Direction  Description
ETMITBIT          Output     ETM Thumb state indication.
ETMLATECANCEL     Output     ETM coprocessor late cancel indication.
ETMnWAIT          Output     ETM clock stall signal.
ETMPASS           Output     ETM coprocessor instruction execute indication.
ETMPROCID[31:0]   Output     ETM process identifier.
ETMPROCIDWR       Output     ETMPROCID write strobe.
ETMRDATA[31:0]    Output     ETM read data.
A.8 TCM interface signals
Table A-7 describes the ARM926EJ-S TCM interface signals.

Table A-7 TCM interface signals
Signal            Direction  Function
DRADDR[17:0]      Output     Data TCM address. This is the word address for the access. Valid during request cycles.
DRCS              Output     Chip select. Indicates whether an access takes place in the following cycle. Not valid during wait cycles.
DRDMAADDR[17:0]   Input      Direct memory access address for DTCM memory.
Table A-7 TCM interface signals (continued)
Signal        Direction  Function
DRSIZE[3:0]   Input      Data TCM size. Static configuration input that specifies the physical size of the attached TCM memories:
                         0000 = absent
                         0011 = 4KB
                         0100 = 8KB
                         …
                         1010 = 512KB
                         1011 = 1MB
                         Values 0001, 0010, and 1100 to 1111 are reserved.
DRWAIT        Input      Data TCM wait state input. If HIGH, the DTCM cannot service the request in that cycle. Valid in request cycle and subsequent wait cycles.
Table A-7 TCM interface signals (continued)
Signal           Direction  Function
IRDMAADR[17:0]   Input      DMA access cycle. If asserted, IRADDR is directly sourced from IRDMAADDR, and IRCS is the result of logically ORing IRDMACS with the chip select value for the current TCM access.
IRDMAEN          Input      Enables direct memory access to the ITCM memory using the IRDMAADDR and IRDMACS inputs.
IRDMACS          Input      Direct memory access chip-select for ITCM.
Table A-7 TCM interface signals (continued)
Signal       Direction  Function
IRWAIT       Input      Instruction TCM wait state input. If HIGH, the ITCM cannot service the request in that cycle. Valid in request cycle and subsequent wait cycles. Ignored if not a request or wait cycle.
IRWBL[3:0]   Output     Instruction TCM write data byte lane indicator. Valid during request cycles.
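Two entries in Table A-7 describe logic that is easy to model: the DRSIZE[3:0] encoding, where each step from 0011 (4KB) to 1011 (1MB) doubles the size, and the ITCM DMA path, where IRADDR is taken from IRDMAADDR and IRCS is IRDMACS ORed with the core's chip select during a DMA access cycle. The sketch below models both. The names drsize_to_kb, itcm_port, dma_cycle, core_addr, and core_cs are hypothetical, and the 18-bit mask simply reflects the width of the [17:0] address buses.

```c
#include <stdbool.h>
#include <stdint.h>

/* DRSIZE[3:0] decode: size in KB, 0 for absent, -1 for a reserved code. */
int drsize_to_kb(unsigned drsize)
{
    drsize &= 0xFu;
    if (drsize == 0u)
        return 0;                /* 0000 = absent */
    if (drsize < 3u || drsize > 11u)
        return -1;               /* 0001, 0010, 1100-1111 reserved */
    return 1 << (drsize - 1u);   /* 0011 = 4KB, doubling up to 1011 = 1MB */
}

/* Address and chip select presented to the ITCM in one cycle. */
struct itcm_req {
    uint32_t iraddr;   /* IRADDR[17:0], word address */
    bool ircs;         /* IRCS */
};

/*
 * Sketch of the ITCM DMA path: during a DMA access cycle, IRADDR is sourced
 * directly from IRDMAADDR and IRCS is IRDMACS ORed with the chip select of
 * the current core TCM access. Outside DMA cycles the core request passes
 * through unchanged.
 */
struct itcm_req itcm_port(bool dma_cycle, uint32_t irdmaaddr, bool irdmacs,
                          uint32_t core_addr, bool core_cs)
{
    struct itcm_req r;
    r.iraddr = (dma_cycle ? irdmaaddr : core_addr) & 0x3FFFFu;
    r.ircs = dma_cycle ? (irdmacs || core_cs) : core_cs;
    return r;
}
```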
Appendix B CP15 Test and Debug Registers

This appendix describes the ARM926EJ-S CP15 Test and Debug Registers. It contains the following section:
• About the Test and Debug Registers on page B-2.
B.1 About the Test and Debug Registers
The ARM926EJ-S Test and Debug Registers, CP15 c15, provide additional device-specific test operations. You can use the registers to access and control the following:
• Debug Override Register
• Debug and Test Address Register on page B-4
• Trace Control Register on page B-5
• MMU test operations on page B-5
• Cache Debug Control Register on page B-12
• MMU Debug Control Register on page B-13
• Memory Region Remap Register on page B-15.
The reset state of the Debug Override Register is 0x0.
Bit 15, disable block-level clock gating
You can use this bit to disable block-level clock gating within the ARM926EJ-S processor. This bit does not affect the functionality of the ARM926EJ-S processor. It enables the benefits of block-level clock gating to be evaluated without building two different implementations of the ARM926EJ-S macrocell, one with block-level clock gating and one without.
B.1.3 Trace Control Register
You can access the Trace Control Register by using the following instructions:
MCR p15, 1, <Rd>, c15, c1, 0 ; Write Trace Control Register
MRC p15, 1, <Rd>, c15, c1, 0 ; Read Trace Control Register
You can use the Trace Control Register to determine under what conditions the ARM9EJ-S core is stalled when the FIFOFULL signal is asserted.
Table B-3 MMU test operation instructions (continued)
Instruction                       Operation
MRC p15, 4/5, <Rd>, c15, c4, 0    Read PA and access permission data in main TLB entry
MCR p15, 4/5, <Rd>, c15, c5, 0    Write PA and access permission data in main TLB entry
MCR p15, 4/5, <Rd>, c15, c7, 0    Transfer main TLB entry into RAM
MRC p15, 4/5, <Rd>, c15, c2, 1    Read tag in lockdown TLB entry
MCR p15, 4/5, <Rd>, c15, c3, 1    Write tag in lockdown TLB entry
MRC p15, 4/5, <Rd>, c15, c4, 1
Table B-4 Encoding of the main TLB entry-select bit fields
Bit       Name            Definition
[30:15]   -               Should Be Zero.
[14:10]   Indexed entry   Indexed entry in main TLB.
[9:0]     -               Should Be Zero.

Use the following MMU test operation instructions to access the MVA tag:
MRC p15, 4/5, <Rd>, c15, c2, 0 ; read tag in main TLB
MCR p15, 4/5, <Rd>, c15, c3, 0 ; write tag in main TLB
The Rd register contains the read or write data as Figure B-3 shows.
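Table B-4 places the main TLB entry index in bits [14:10], so building the entry-select value is a shift and a mask. The following sketch uses a hypothetical helper, main_tlb_select. Bit 31 does not appear in the rows of Table B-4 reproduced here, so the sketch leaves it 0; check the complete table before relying on that.

```c
#include <stdint.h>

/*
 * Build the Rd value that selects a main TLB entry (Table B-4):
 * index in bits [14:10]; bits [30:15] and [9:0] Should Be Zero.
 * Bit 31 is ASSUMED to be 0 here because its definition is not shown.
 */
uint32_t main_tlb_select(unsigned index)
{
    return (uint32_t)(index & 0x1Fu) << 10;
}
```

For example, main_tlb_select(3) yields 0x00000C00. Masking to five bits keeps an out-of-range index from spilling into the Should Be Zero fields.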
MCR p15, 4/5, <Rd>, c15, c5, 0 ; write PA and access permission data
The Rd register contains the read or write data as shown in Figure B-4.

Figure B-4 Rd format for accessing PA and AP data of main or lockdown TLB entry (fields: PA, SBZ, Domain select, AP, C, B)

Table B-6 describes the PA and access permission bit fields in the Rd register.

Table B-6 Encoding of the TLB entry PA and AP bit fields
Bit       Name   Definition
[31:10]   PA     Physical address.
To read an entry from the 2-way main TLB, the entry must first be written using the above instructions. The entry can then be read using the following instructions:
MRC p15, 4/5, <Rd>, c15, c2, 0 ; read tag main TLB
MRC p15, 4/5, <Rd>, c15, c4, 0 ; read PA/PROT main TLB
The data RAM attached to the main MMU is 112 bits wide. Table B-7 shows the mapping of the tag into the data RAM for main TLB writes, as it appears on MMUxWD[111:0].
Figure B-5 Write to the data RAM (signals shown: CLK, MMUxCS, MMUxADDR, MMUxWE, MMUxWD, MMUxRD, MMUxOE)

Note
On the rising clock edge when MMUxCS=1, the data on MMUxWD is written into the data RAM. The exact index is on MMUxADDR (as specified in the Debug and Test Address Register). The lanes written are controlled by the MMUxWE[3:0] pins.
Figure B-6 Rd format for selecting lockdown TLB entry: bits [31:29] SBZ, [28:26] Indexed entry, [25:0] SBZ

Table B-8 describes the entry-select bit fields in the Rd register.

Table B-8 Encoding of the lockdown TLB entry-select bit fields
The data to be written or read is placed in ARM register Rd with the format shown in Figure B-4 on page B-8.

B.1.5 Cache Debug Control Register
The Cache Debug Control Register is used to force specific cache behavior required for debug.
Forcing write-through behavior
Setting the DWB bit to 1 forces the DCache to treat all cachable accesses as though they were in a write-through region of memory. The setting of the DWB bit overrides any setting specified in either the MMU page tables or the Memory Region Remap Register.
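The effect of DWB can be sketched as a final override on the DCache policy derived from the page tables. This model assumes the usual ARMv5 C and B bit meanings (C=1, B=1 write-back; C=1, B=0 write-through; C=0 noncachable, bufferable or not according to B) and deliberately ignores the Memory Region Remap Register, whose effect Figure B-10 describes. effective_dattr and enum dattr are hypothetical names.

```c
#include <stdbool.h>

enum dattr { NCNB, NCB, WRITE_THROUGH, WRITE_BACK };

/*
 * Effective DCache policy from page-table C and B bits (ASSUMED ARMv5
 * meanings), with the Cache Debug Control Register DWB bit applied last.
 * The Memory Region Remap Register is not modelled.
 */
enum dattr effective_dattr(bool c, bool b, bool dwb)
{
    if (!c)
        return b ? NCB : NCNB;     /* noncachable: DWB has no effect */
    if (dwb)
        return WRITE_THROUGH;      /* DWB forces write-through */
    return b ? WRITE_BACK : WRITE_THROUGH;
}
```

Because DWB only changes how cachable accesses are written, noncachable regions are unaffected, which makes the bit a low-risk way to rule out dirty-line problems while debugging.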
You can access the MMU Debug Control Register using the following instructions:
MRC{cond} p15, 7, <Rd>, c15, c1, 0 ; read MMU debug control register
MCR{cond} p15, 7, <Rd>, c15, c1, 0 ; write MMU debug control register
Figure B-8 shows the MMU Debug Control Register format.

Figure B-8 MMU Debug Control Register format: [31:8] SBZ, [7] DMTMI, [6] DMTMD, [5] DMTLI, [4] DMTLD, [3] DIUTM, [2] DDUTM, [1] DIUTL, [0] DDUTL

Table B-10 gives the MMU Debug Control Register bit assignments.
Table B-10 MMU Debug Control Register bit assignments (continued)
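Software that changes one MMU Debug Control Register bit usually reads the register, modifies the bit, and writes it back. The sketch below handles only the bit positions from Figure B-8; the meaning of each bit is defined in Table B-10 and is not modelled. mmu_dbg_test and mmu_dbg_update are hypothetical helpers operating on the value transferred through Rd.

```c
#include <stdint.h>

/* Bit positions in the MMU Debug Control Register (Figure B-8). */
enum mmu_dbg_bit {
    DDUTL = 0, DIUTL = 1, DDUTM = 2, DIUTM = 3,
    DMTLD = 4, DMTLI = 5, DMTMD = 6, DMTMI = 7
};

/* Test one bit of a value read with MRC p15, 7, <Rd>, c15, c1, 0. */
int mmu_dbg_test(uint32_t reg, enum mmu_dbg_bit bit)
{
    return (int)((reg >> bit) & 1u);
}

/* Value for MCR p15, 7, <Rd>, c15, c1, 0 with one bit set or cleared;
   bits [31:8] are forced to zero because they are SBZ. */
uint32_t mmu_dbg_update(uint32_t reg, enum mmu_dbg_bit bit, int on)
{
    reg &= 0xFFu;
    if (on)
        reg |= 1u << bit;
    else
        reg &= ~(1u << bit);
    return reg;
}
```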
Table B-12 describes the bit fields of the Memory Region Remap Register.
Figure B-10 shows the flow and precedence of CP15 c15 control bits in resolving the cachable and bufferable attributes of a memory reference.
Glossary

This glossary describes some of the terms used in this manual. Where terms can have several meanings, the meaning presented here is intended.

Abort
A mechanism that indicates to a core that it must halt execution of an attempted illegal memory access. An abort can be caused by the external or internal memory system as a result of attempting to access invalid instruction or data memory. An abort is classified as either a Prefetch or Data Abort, and an internal or External Abort.
Advanced High-performance Bus (AHB)
The AMBA Advanced High-performance Bus system connects embedded processors such as an ARM core to high-performance peripherals, DMA controllers, on-chip memory, and interfaces. It is a high-speed, high-bandwidth bus that supports multi-master bus management to maximize system performance. See also Advanced Microcontroller Bus Architecture and AHB-Lite.
Architecture
The organization of hardware and/or software that characterizes a processor and its attached components, and enables devices with similar characteristics to be grouped together when describing their behavior, for example, Harvard architecture, instruction set architecture, ARMv6 architecture.

ARM instruction
A word that specifies an operation for an ARM processor to perform. ARM instructions must be word-aligned.
Big-endian memory
Memory in which:
- a byte or halfword at a word-aligned address is the most significant byte or halfword within the word at that address
- a byte at a halfword-aligned address is the most significant byte within the halfword at that address.
See also Little-endian memory.

Block address
An address that comprises a tag, an index, and a word field. The tag bits identify the way that contains the matching cache entry for a cache hit.
Cache
A block of on-chip or off-chip fast access memory locations, situated between the processor and main memory, used for storing and retrieving copies of often used instructions and/or data. This is done to greatly reduce the average time taken by memory accesses and so to increase processor performance. See also Cache terminology diagram on the last page of this glossary.
Clean
A cache line that has not been modified while it is in the cache is said to be clean. To clean a cache is to write dirty cache entries into main memory. If a cache line is clean, it is not written on a cache miss because the next level of memory contains the same data as the cache. See also Dirty.

Clock gating
Gating a clock signal for a macrocell with a control signal and using the modified clock that results to control the operating state of the macrocell.
CAM includes comparison logic with each bit of storage. A data value is broadcast to all words of storage and compared with the values there. Words that match are flagged in some way. Subsequent operations can then work on flagged words. It is possible to read the flagged words out one at a time or write to certain bit positions in all of them.

Context
The environment that each process operates in for a multitasking operating system.
Data Abort
An indication from a memory system to a core that it must halt execution of an attempted illegal memory access. A Data Abort is caused by an attempt to access invalid data memory. See also Abort, External Abort, and Prefetch Abort.

Data cache
A block of on-chip fast access memory locations, situated between the processor and main memory, used for storing and retrieving copies of often used data.
Domain
A collection of sections, large pages and small pages of memory, which can have their access permissions switched rapidly by writing to the Domain Access Control Register (CP15 register c3).

Do Not Modify (DNM)
In Do Not Modify fields, the value must not be altered by software. DNM fields read as Unpredictable values, and must only be written with the same value read from the same field on the same processor.
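The Domain entry above says that access permissions can be switched rapidly by writing the Domain Access Control Register; the speed comes from every domain's permission living in one register. The sketch below assumes the ARMv5 layout of CP15 c3, which is not given in this glossary: sixteen 2-bit fields, domain d in bits [2d+1:2d], with 00 = no access, 01 = client, 11 = manager, and 10 reserved. dacr_set and the DOM_ constants are hypothetical names.

```c
#include <stdint.h>

/* ASSUMED ARMv5 domain access encodings (0b10 is reserved). */
enum { DOM_NO_ACCESS = 0u, DOM_CLIENT = 1u, DOM_MANAGER = 3u };

/* Return a new DACR value with domain d (0-15) set to access type v. */
uint32_t dacr_set(uint32_t dacr, unsigned d, unsigned v)
{
    unsigned shift = (d & 15u) * 2u;
    dacr &= ~(3u << shift);               /* clear the domain's 2-bit field */
    return dacr | ((v & 3u) << shift);    /* insert the new access type */
}
```

Writing the result back to c3 changes the permissions of every section and page tagged with that domain in a single register write, without editing any page tables.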
Exception
A fault or error event that is considered serious enough to require that program execution is interrupted. Examples include attempting to perform an invalid memory access, external interrupts, and undefined instructions. When an exception occurs, normal program flow is interrupted and execution is resumed at the corresponding exception vector. This contains the first instruction of the interrupt handler to deal with the exception.

Exception service routine
See Interrupt handler.
Fully-associative cache
A cache that has just one cache set that consists of the entire cache. The number of cache entries is the same as the number of cache ways. See also Direct-mapped cache.

Half-rate clocking (ETM)
Dividing the trace clock by two so that the TPA can sample trace data signals on both the rising and falling edges of the trace clock. The primary purpose of half-rate clocking is to reduce the signal transition rate on the trace clock of an ASIC for very high-speed systems.
Index
See Cache index.

Index register
A register specified in some load or store instructions. The value of this register is used as an offset to be added to or subtracted from the base register value to form the virtual address, which is sent to memory. Some addressing modes optionally enable the index register value to be shifted prior to the addition or subtraction.
Little-endian memory
Memory in which:
- a byte or halfword at a word-aligned address is the least significant byte or halfword within the word at that address
- a byte at a halfword-aligned address is the least significant byte within the halfword at that address.
See also Big-endian memory.

Load/store architecture
A processor architecture where data-processing operations only operate on register contents, not directly on memory contents.
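The Big-endian memory and Little-endian memory entries differ only in which byte of a word is most significant. The following sketch reads the same four bytes, stored from a word-aligned address upwards, under both conventions; load_word_le and load_word_be are hypothetical helpers.

```c
#include <stdint.h>

/* mem[0] is the byte at the word-aligned address, mem[3] the highest. */

/* Little-endian: the byte at the word-aligned address is least significant. */
uint32_t load_word_le(const uint8_t mem[4])
{
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}

/* Big-endian: the byte at the word-aligned address is most significant. */
uint32_t load_word_be(const uint8_t mem[4])
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8) | (uint32_t)mem[3];
}
```

For the bytes 0x11, 0x22, 0x33, 0x44, load_word_le returns 0x44332211 and load_word_be returns 0x11223344.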
Modified Virtual Address (MVA)
A Virtual Address produced by the ARM processor can be changed by the current Process ID to provide a Modified Virtual Address (MVA) for the MMUs and caches. See also Fast Context Switch Extension.

Monitor debug-mode
One of two mutually exclusive debug modes. In Monitor debug-mode the processor enables a software abort handler provided by the debug monitor or operating system debug task.
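The Modified Virtual Address entry says the current Process ID can change a Virtual Address. A minimal sketch of that mapping, assuming the ARMv5 Fast Context Switch Extension rule: if VA[31:25] is zero, the 7-bit Process ID is placed in those bits; otherwise the address passes through unchanged. The function va_to_mva and its fcse_pid parameter are hypothetical, with fcse_pid taken as the 7-bit ID rather than the raw contents of the CP15 register that holds it.

```c
#include <stdint.h>

/*
 * VA to MVA mapping, ASSUMING the ARMv5 FCSE rule: addresses in the bottom
 * 32MB (VA[31:25] == 0) are relocated by the 7-bit Process ID; all other
 * addresses are unchanged.
 */
uint32_t va_to_mva(uint32_t va, unsigned fcse_pid)
{
    if ((va >> 25) == 0u)
        return va | ((uint32_t)(fcse_pid & 0x7Fu) << 25);
    return va;
}
```

With Process ID 1, the address 0x00001000 becomes 0x02001000, while 0x80000000 is unchanged. This lets processes linked at the same low addresses occupy different MVAs, so the caches and TLBs need not be flushed on a context switch.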
Processor
A processor is the circuitry in a computer system required to process data using the computer instructions. It is an abbreviation of microprocessor. A clock source, power supplies, and main memory are also required to create a minimum complete working computer system.

Physical Address (PA)
The MMU performs a translation on Modified Virtual Addresses (MVA) to produce the Physical Address (PA) which is given to AHB to perform an external access.
Scan chain
A scan chain is made up of serially-connected devices that implement boundary scan technology using a standard JTAG TAP interface. Each device contains at least one TAP controller containing shift registers that form the chain connected between TDI and TDO, through which test data is shifted. Processors can contain several shift registers to enable you to access selected parts of the device.

SCREG
The currently selected scan chain number in an ARM TAP controller.

Set
See Cache set.
TAP
See Test access port.

TCM
See Tightly coupled memory.

Test Access Port (TAP)
The collection of four mandatory and one optional terminals that form the input/output and control interface to a JTAG boundary-scan architecture. The mandatory terminals are TDI, TDO, TMS, and TCK. The optional terminal is TRST. This signal is mandatory in ARM cores because it is used to reset the debug logic.
Unpredictable
For reads, the data returned when reading from this location is unpredictable. It can have any value. For writes, writing to this location causes unpredictable behavior, or an unpredictable change in device configuration. Unpredictable instructions must not halt or hang the processor, or any part of the system.

VA
See Virtual Address.

Victim
A cache line, selected to be discarded to make room for a replacement cache line that is required as a result of a cache miss.
Write completion
The memory system indicates to the processor that a write has been completed at a point in the transaction where the memory system is able to guarantee that the effect of the write is visible to all processors in the system. This is not the case if the write is associated with a memory synchronization primitive, or is to a Device or Strongly Ordered region.
Cache terminology diagram
The cache terminology diagram illustrates the following cache terminology:
• block address
• cache line
• cache set
• cache way
• index
• tag.

[Diagram: a block address divided into tag, index, word, and byte fields; a cache tag RAM and cache data RAM organized into cache ways and cache sets; the tag comparison produces a hit (way number) and read data from the corresponding way.]
Index

The items in this index are listed in alphabetical order. The references given are to page numbers.
Cleaning DCache 9-3
Clock gating 5-32
Coarse page table descriptor 3-11
Context ID register 2-35
Control register 2-12
Conventions
  numerical xx
  signal naming xix
  timing diagram xviii
  typographical xviii
Coprocessor
  clocking 8-2
  instructions 8-3
  interface 8-2
  interface signals A-5
CPABORT 8-12
CPBURST 8-11
CPU aborts 3-21
CP15
  accessing registers 2-4
  MRC and MCR bit pattern 2-4
  registers 2-3
  test registers B-2
Ctype
  encoding 2-9
  field 2-9
D
DCache
  enable/disable 2-14
  size 2-9
Debug
  clocks 11-2
  overri
Level one
  descriptor 3-8
  descriptor, accessing 3-8
  fetch 3-8
Level two descriptor 3-14
Line length encoding 2-11
L4 bit 2-13
M
M bit 2-10, 2-14
MCR, accessing CP15 2-4
MCR/MRC instructions 8-6
Memory coherency 6-9
Memory management unit (MMU) 3-2
Memory Region Remap Register B-15
Miscellaneous signals A-10
MMU
  accessible registers 3-4
  accessing main TLB entries B-6
  accessing MVA tag B-5, B-7
  a
N
nCPINSTRVALID 8-13
Noncachable code 7-2
Noncachable instruction fetches 7-2
Numerical conventions xx
O
Stall cycles 5-29, 5-30
Status field 2-19
Subpages 3-20
Synchronizing data and instruction streams 9-3
System control coprocessor registers 2-3
System protection 2-14
T
TCM
  access priorities 4-8
  optimizing for power 5-22
  optimizing for speed 5-23
  region register 2-26
  region register, using 5-19
  status register 2-7, 2-12
TCM interface
  examples 5-20
  signals A-14
TCM status register 2-7
Test and clean
  DCache 2-21
  operations 2-24
Test and debug register 2-36
Test registers B-2
Test, clean, and in