Cortex-R4 and Cortex-R4F™
Revision: r1p3
Technical Reference Manual
Copyright © 2009 ARM Limited. All rights reserved.
Release Information

The following changes have been made to this book.
Contents
Cortex-R4 and Cortex-R4F Technical Reference Manual
ARM DDI 0363E ID013010

Preface
    About this book    xvii
    Feedback    xxi
Chapter 1 Introduction
    1.1 About the processor
Chapter 2 Programmer’s Model
Chapter 3 Processor Initialization, Resets, and Clocking
    About the L2 interface    9-2
    AXI master interface    9-3
    AXI master interface transfers    9-7
    AXI slave interface
    About cycle timings and interlock behavior    14-3
    Register interlock examples    14-6
    Data processing instructions    14-7
    QADD, QDADD, QSUB, and QDSUB instructions
    Dual core interface signals
    Debug interface signals
    ETM interface signals
    Test signals
Appendix B ECC Schemes
List of Tables
Table A-18    FPU signals    A-23
Table C-1     Differences between issue B and issue C    C-1
Table C-2     Differences between issue C and issue D    C-3
Preface

This preface introduces the Cortex-R4 and Cortex-R4F Technical Reference Manual. It contains the following sections:
• About this book on page xvii
• Feedback on page xxi.
About this book

This is the Technical Reference Manual (TRM) for the Cortex-R4 and Cortex-R4F processors. In this book the generic term processor means both the Cortex-R4 and Cortex-R4F processors. Any differences between the two processors are described where necessary.

Note
The Cortex-R4F processor is a Cortex-R4 processor that includes the optional Floating Point Unit (FPU) extension. See Product revision information on page 1-24 for more information.
Read this for a description of the Memory Protection Unit (MPU) and the access permissions process.

Chapter 8 Level One Memory System
    Read this for a description of the Level One (L1) memory system.

Chapter 10 Power Control
    Read this for a description of the power control facilities.

Chapter 11 Debug
    Read this for a description of the debug support.

Chapter 12 FPU Programmer’s Model
    Read this for a description of the Floating Point Unit (FPU) support in the Cortex-R4F processor.
monospace
    Denotes text that you can enter at the keyboard, such as commands, file and program names, and source code.

monospace (with underlining)
    Denotes a permitted abbreviation for a command or option. You can enter the underlined text instead of the full command or option name.

monospace italic
    Denotes arguments to monospace text where the argument is to be replaced by a specific value.

monospace bold
    Denotes language keywords when used outside example code.
Prefix R
    Denotes AXI read data channel signals.

Prefix W
    Denotes AXI write data channel signals.

Further reading

This section lists publications by ARM and by third parties. See http://infocenter.arm.com for access to ARM documentation.

ARM publications

This book contains information that is specific to the processor.
Feedback

ARM welcomes feedback on this product and its documentation.

Feedback on this product

If you have any comments or suggestions about this product, contact your supplier and give:
• The product name.
• The product revision or version.
• An explanation with as much information as you can provide. Include symptoms if appropriate.

Feedback on this book

If you have any comments on this book, send an e-mail to errata@arm.com.
Chapter 1 Introduction This chapter introduces the processor and its features.
1.1 About the processor

The processor is a mid-range CPU for use in deeply-embedded systems. The features of the processor include:
• An integer unit with integral EmbeddedICE-RT logic.
• High-speed Advanced Microcontroller Bus Architecture (AMBA) Advanced eXtensible Interfaces (AXI) for Level two (L2) master and slave interfaces.
• Dynamic branch prediction with a global history buffer, and a 4-entry return stack.
• Low interrupt latency.
1.2 About the architecture

The processor implements the ARMv7-R architecture and ARMv7 debug architecture. In addition, the Cortex-R4F processor implements the VFPv3-D16 architecture. This includes the VFPv3 instruction set. The ARMv7-R architecture provides 32-bit ARM and 16-bit and 32-bit Thumb instruction sets, including a range of Single Instruction, Multiple-Data (SIMD) Digital Signal Processing (DSP) instructions that operate on 16-bit or 8-bit data values in 32-bit registers.
1.3 Components of the processor

This section describes the main components of the processor:
• Data Processing Unit on page 1-5
• Load/store unit on page 1-5
• Prefetch unit on page 1-5
• L1 memory system on page 1-5
• L2 AXI interfaces on page 1-7
• Debug on page 1-8
• System control coprocessor on page 1-9
• Interrupt handling on page 1-9.

Figure 1-1 shows the structure of the processor.
1.3.1 Data Processing Unit

The DPU holds most of the program-visible state of the processor, such as general-purpose registers, status registers and control registers. It decodes and executes instructions, operating on data held in the registers in accordance with the ARM Architecture. Instructions are fed to the DPU from the PFU through a buffer. The DPU performs instructions that require data to be transferred to or from the memory system by interfacing to the LSU.
Instruction and data caches

You can configure the processor to include separate instruction and data caches. The caches have the following features:
• Support for independent configuration of the instruction and data cache sizes between 4KB and 64KB.
• Pseudo-random cache replacement policy.
• 8-word cache line length. Cache lines can be either write-back or write-through, determined by MPU region.
• Ability to disable each cache independently.
• 512KB
• 1MB
• 2MB
• 4MB
• 8MB.

The TCMs are external to the processor. This provides flexibility in optimizing the TCM subsystem for performance, power, and RAM type. The INITRAMA and INITRAMB pins enable booting from the ATCM or BTCM, respectively. Both the ATCM and BTCM support wait states.

For more information, see Chapter 8 Level One Memory System.
1.3.6 Debug

The processor has a CoreSight compliant Advanced Peripheral Bus version 3 (APBv3) debug interface. This permits system access to debug resources, for example, the setting of watchpoints and breakpoints. The processor provides extensive support for real-time debug and performance profiling.

The following sections give an overview of debug:
• System performance monitoring
• ETM interface
• Real-time debug facilities.
The EmbeddedICE-RT logic supports two modes of debug operation:

Halt mode
    On a debug event, such as a breakpoint or watchpoint, the debug logic stops the processor and forces it into debug state. This enables you to examine the internal state of the processor, and the external state of the system, independently from other system activity. When the debugging process completes, the processor and system state are restored, and normal program execution resumes.
RFE    Return from exception using data from the stack.
CPS    Change processor state, such as interrupt mask setting and clearing, and mode changes.
1.4 External interfaces of the processor

The processor has the following interfaces for external access:
• APB Debug interface
• ETM interface
• Test interface.

For more information on these interfaces and how they are integrated into the system, see the AMBA 3 APB Protocol Specification and the CoreSight Architecture Specification.

1.4.1 APB Debug interface

AMBA APBv3 is used for debugging purposes. CoreSight is the ARM architecture for multi-processor trace and debug.
1.5 Power management

The processor includes several microarchitectural features to reduce energy consumption:
• Accurate branch and return prediction, reducing the number of incorrect instruction fetch and decode operations.
• The caches use sequential access information to reduce the number of accesses to the tag RAMs and to unmatched data RAMs.
• Extensive use of gated clocks and gates to disable inputs to unused functional blocks.
1.6 Configurable options

Table 1-1 shows the features of the processor that can be configured using either build-configuration or pin-configuration. See Product documentation, design flow, and architecture on page 1-21 for information about configuration of the processor. Many of these features, if included, can also be enabled and disabled during software configuration.
Introduction Table 1-1 Configurable options (continued) Feature Options Sub-options Build-configuration or pin-configuration BTCM No BTCM ports - Build and pin One BTCM port (B0TCM) No error checking Parity error checking 32-bit ECC error checking 64-bit ECC error checking Build 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, 2MB, 4MB, or 8MB Pin No error checking Parity error checking 32-bit ECC error checking 64-bit ECC error checking Build 2x2KB, 2x4KB, 2x8KB, 2x16KB, 2x32KB, 2x64KB
Introduction Table 1-1 Configurable options (continued) Feature Options Sub-options Build-configuration or pin-configuration BTCM at reset Disabled - Pin Enabledb Base address configured Base address 0x0 Pin and build Peripheral ID RevAnd field Any 4-bit value - Build AXI slave interface No AXI-slave - Build AXI-slave included - TCM Hard Error Cache No TCM Hard Error Cache - TCM Hard Error Cache included c - Non-Maskable FIQ Interrupt Disabled (FIQ can be masked by software - E
Table 1-2 Configurable options at reset (continued)

Feature                                          Options                                                          Register
TCM external errors                              ATCM external error enable                                       ATCMECEN
                                                 BTCM external error enable, for B0TCM and B1TCM independently    B0TCMECEN/B1TCMECEN
TCM load/store-64 (read-modify-write) behavior   ATCM load/store-64 enable (b)                                    ATCMRMW
                                                 BTCM load/store-64 enable (b)                                    BTCMRMW

a. Can only be enabled if the appropriate TCM is configured with the appropriate error checking scheme, and the appropriate number of ports
b.
1.7 Execution pipeline stages

The following stages make up the pipeline:
• the Fetch stages
• the Decode stages
• an Issue stage
• the three or four Execution stages.

Figure 1-2 shows the Fetch and Decode pipeline stages of the processor and the pipeline operations that can take place at each stage.
[Diagram showing the load/store pipeline, the data processing pipeline, and the floating point pipeline, with the flush points for mispredicted direct branches and for exception flush and mispredicted indirect branches.]

Figure 1-4 Cortex-R4F Issue and Execution pipeline stages

The names of the common pipeline stages and their functions are:

Iss    Register read and instruction issue to execute stages.
Ex     Execute stages.
1.8 Redundant core comparison

The processor can be implemented with a second, redundant copy of most of the logic. This second core shares the input pins and the cache RAMs of the master core, so only one set of cache RAMs is required. The master core drives the output pins and the cache RAMs. Comparison logic can be included during implementation which compares the outputs of the redundant core with those of the master core.
1.9 Test features

The processor is delivered as fully-synthesizable RTL and is a fully-static design. Scan-chains and test wrappers for production test can be inserted into the design by the synthesis tools during implementation. See the relevant reference methodology documentation for more information.

Production test of the processor cache and TCM RAMs can be done through the dedicated, pipelined MBIST interface.
1.10 Product documentation, design flow, and architecture

This section describes the content of the product documents, how they relate to the design flow, and the relevant architectural standards and protocols.

Note
See Further reading on page xx for more information about the documentation described in this section.

1.10.
3. Programming. The system programmer develops the software required to configure and initialize the processor, and possibly tests the required application software on the processor.

Each of these stages can be performed by a different company. Configuration options are available at each stage. These options affect the behavior and available features at the next stage:

Build configuration
    The implementer chooses the options that affect how the RTL source files are pre-processed.
• The properties of memory accesses.
• The debug architecture you can use to debug the processor. The TRM gives more information about the implemented debug features.

The Cortex-R4 processor implements the ARMv7-R architecture profile.

Advanced Microcontroller Bus Architecture protocol

Advanced Microcontroller Bus Architecture (AMBA) is an open standard, on-chip bus specification that defines the interconnection and management of functional blocks that make up a System-on-Chip (SoC).
1.11 Product revision information

This manual is for major revision 1 of the processor. At the time of release, this includes the r1p0, r1p1, r1p2, and r1p3 releases, although most of the information in this document also applies to any future r1px releases.
Table 1-3 shows the mappings between these various numbers, for all releases.

Table 1-3 ID values for different product versions

1.11.
Chapter 2 Programmer’s Model This chapter describes the processor registers and provides an overview for programming the microprocessor.
2.1 About the programmer’s model

The processor implements the ARMv7-R architecture that provides:
• the 32-bit ARM instruction set
• the extended Thumb instruction set introduced in ARMv6T2, that uses Thumb-2 technology to provide a wide range of 32-bit instructions.

For more information on the ARM and Thumb instruction sets, see the ARM Architecture Reference Manual.
2.2 Instruction set states

The processor has two instruction set states:

ARM state
    The processor executes 32-bit, word-aligned ARM instructions in this state.

Thumb state
    The processor executes 32-bit and 16-bit halfword-aligned Thumb instructions in this state.

Note
Transition between ARM state and Thumb state does not affect the processor mode or the register contents.

2.2.
2.3 Operating modes

In each state there are seven modes of operation:
• User (USR) mode is the usual mode for the execution of ARM or Thumb programs. It is used for executing most application programs.
• Fast interrupt (FIQ) mode is entered on taking a fast interrupt.
• Interrupt (IRQ) mode is entered on taking a normal interrupt.
• Supervisor (SVC) mode is a protected mode for the operating system and is entered on taking a Supervisor Call (SVC), formerly SWI.
2.4 Data types

The processor supports these data types:
• doubleword, 64-bit
• word, 32-bit
• halfword, 16-bit
• byte, 8-bit.

Note
• When any of these types are described as unsigned, the N-bit data value represents a non-negative integer in the range $0$ to $+2^{N}-1$, using normal binary format.
• When any of these types are described as signed, the N-bit data value represents an integer in the range $-2^{N-1}$ to $+2^{N-1}-1$, using two’s complement format.
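As a quick check of the ranges given in the Note, the following Python sketch computes the bounds for each supported width. The helper names are illustrative only and do not come from this manual.

```python
# Illustrative helpers; the names are assumptions, not part of the manual.
WIDTHS = {"byte": 8, "halfword": 16, "word": 32, "doubleword": 64}

def unsigned_range(n_bits):
    """Range of an unsigned N-bit value: 0 to 2**N - 1."""
    return 0, (1 << n_bits) - 1

def signed_range(n_bits):
    """Range of a signed N-bit two's complement value: -2**(N-1) to 2**(N-1) - 1."""
    half = 1 << (n_bits - 1)
    return -half, half - 1

for name, bits in WIDTHS.items():
    print(name, unsigned_range(bits), signed_range(bits))
```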
2.5 Memory formats

The processor views memory as a linear collection of bytes numbered in ascending order from zero. For example, bytes 0-3 hold the first stored word, and bytes 4-7 hold the second stored word.

The processor can treat words of data in memory as being stored in either:
• Byte-invariant big-endian format
• Little-endian format.

Additionally, the processor supports mixed-endian and unaligned data accesses. For more information, see the ARM Architecture Reference Manual.
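To make the two word formats concrete, the following Python sketch shows how one 32-bit word occupies bytes 0-3 in each format. It is a host-side illustration using the standard struct module, not processor code, and the function name store_word is hypothetical.

```python
import struct

def store_word(value, big_endian):
    """Return the four bytes of a 32-bit word in ascending address order."""
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, value)

word = 0x11223344
# Little-endian: the least significant byte sits at the lowest address.
print(store_word(word, big_endian=False).hex())
# Big-endian: the most significant byte sits at the lowest address.
print(store_word(word, big_endian=True).hex())
```

In both formats a byte access to a given address returns the same byte, which is the sense in which the big-endian format is byte-invariant.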
2.6 Registers

The processor has a total of 37 program registers:
• 31 general-purpose 32-bit registers
• six 32-bit status registers.

These registers are not all accessible at the same time. The processor state and operating mode determine the registers that are available to the programmer.

2.6.1 The register set

In the processor the same register set is used in both the ARM and Thumb states. Sixteen general registers and one or two status registers are accessible at any time.
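The total of 37 registers can be reproduced with a short sketch. The banking assumed here (R8 to R14 banked for FIQ mode; R13 and R14 banked for Supervisor, Abort, IRQ, and Undefined modes) follows the ARM architecture, and the variable names are illustrative only.

```python
# Sketch of the register inventory; banking follows the ARM architecture,
# and all Python names here are illustrative assumptions.
general = [f"R{i}" for i in range(16)]              # R0-R15, User and System
general += [f"R{i}_fiq" for i in range(8, 15)]      # R8_fiq-R14_fiq
for mode in ("svc", "abt", "irq", "und"):
    general += [f"R13_{mode}", f"R14_{mode}"]       # banked SP and LR

status = ["CPSR"] + [f"SPSR_{m}" for m in ("fiq", "svc", "abt", "irq", "und")]

print(len(general), len(status), len(general) + len(status))
```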
For more information, see the ARM Architecture Reference Manual.

In Privileged modes, another register, the Saved Program Status Register (SPSR), is accessible. This contains the condition code flags, status bits, and current mode bits saved as a result of the exception that caused entry to the current mode.

Banked registers have a mode identifier that indicates which mode they relate to. Table 2-1 lists these identifiers.
General registers and program counter

System and User   FIQ        Supervisor   Abort      IRQ        Undefined
R0                R0         R0           R0         R0         R0
R1                R1         R1           R1         R1         R1
R2                R2         R2           R2         R2         R2
R3                R3         R3           R3         R3         R3
R4                R4         R4           R4         R4         R4
R5                R5         R5           R5         R5         R5
R6                R6         R6           R6         R6         R6
R7                R7         R7           R7         R7         R7
R8                R8_fiq     R8           R8         R8         R8
R9                R9_fiq     R9           R9         R9         R9
R10               R10_fiq    R10          R10        R10        R10
R11               R11_fiq    R11          R11        R11        R11
R12               R12_fiq    R12          R12        R12        R12
R13               R13_fiq    R13_svc      R13_abt    R13_irq    R13_un
2.7 Program status registers

The processor contains one CPSR and five SPSRs for exception handlers to use. The program status registers:
• hold information about the most recently performed ALU operation
• control the enabling and disabling of interrupts
• set the processor operating mode.

Figure 2-4 shows the bit arrangement in the status registers.
• MRRC2
• PLD
• RFE
• SETEND
• SRS
• STC2.

In Thumb state, the processor can only execute the Branch instruction conditionally. Other instructions can be made conditional by placing them in the If-Then (IT) block. For more information about conditional execution in Thumb state, see the ARM Architecture Reference Manual.

2.7.
For more information on the operation of the IT execution state bits, see the ARM Architecture Reference Manual.

2.7.4 The J bit

The J bit in the CPSR returns 0 when read.

Note
You cannot use an MSR to change the J bit in the CPSR.

2.7.5 The DNM bits

Software must not modify the Do Not Modify (DNM) bits. These bits are:
• Readable, to preserve the state of the processor, for example, during process context switches.
• Writable, to enable the processor to restore its state.

2.7.6
Note
GE bit is 1 if A op B ≥ C, otherwise 0.

The SEL instruction uses GE[3:0] to select which source register supplies each byte of its result.

Note
• For unsigned operations, the usual ARM rules determine the GE bits for carries out of unsigned additions and subtractions, and so are carry-out bits.
• For signed operations, the rules for setting the GE bits are chosen so that they have the same sort of greater than or equal functionality as for unsigned operations.

2.7.7
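The byte selection performed by SEL, described in this section, can be sketched in Python as follows. This is a behavioral model under the ARM architecture definition of SEL (byte i of the result comes from the first operand when GE[i] is 1, otherwise from the second); the function and parameter names are hypothetical.

```python
def sel(rn, rm, ge):
    """Model of SEL: pick each result byte from rn if GE[i] is 1, else from rm."""
    result = 0
    for i in range(4):
        source = rn if (ge >> i) & 1 else rm
        result |= source & (0xFF << (8 * i))
    return result

# GE = 0b0101 takes bytes 0 and 2 from rn, and bytes 1 and 3 from rm.
print(hex(sel(0xAABBCCDD, 0x11223344, 0b0101)))
```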
2.7.11 The M bits

M[4:0] are the mode bits. These bits determine the processor operating mode as Table 2-3 shows.

Table 2-3 PSR mode bit values

Visible state registers    M[4:0]

2.7.
Bits in Figure 2-4 on page 2-10 that are in this category are A, I, F, and M[4:0].
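As an illustration of the A, I, F, and M[4:0] fields, the following Python sketch decodes them from a CPSR value. The positions of the I, F, and T bits and of M[4:0] match those used elsewhere in this chapter; the position of the A bit (bit [8]) and the mode encodings are taken from the ARM architecture, and the function and constant names are hypothetical.

```python
A_BIT = 1 << 8   # imprecise abort mask (position per the ARM architecture)
I_BIT = 1 << 7   # IRQ mask
F_BIT = 1 << 6   # FIQ mask
T_BIT = 1 << 5   # Thumb state

# Mode encodings per the ARM architecture.
MODE_NAMES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undefined", 0b11111: "System",
}

def decode_cpsr(cpsr):
    """Split a CPSR value into its mask bits, T bit, and mode name."""
    return {
        "A": bool(cpsr & A_BIT),
        "I": bool(cpsr & I_BIT),
        "F": bool(cpsr & F_BIT),
        "T": bool(cpsr & T_BIT),
        "mode": MODE_NAMES.get(cpsr & 0x1F, "reserved"),
    }

# Supervisor mode with A, I, and F set, and T clear.
print(decode_cpsr(0x1D3))
```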
2.8 Exceptions

Exceptions are taken whenever the normal flow of a program must temporarily halt, for example, to service an interrupt from a peripheral. Before attempting to handle an exception, the processor preserves the critical parts of the current processor state so that the original program can resume when the handler routine has finished.
b. The return instruction you must use after an UNDEF exception has been handled depends on whether you want to retry the undefined instruction or not and, if so, on the size of the undefined instruction.

Taking an exception

When taking an exception the processor:
1. Preserves the address of the next instruction in the appropriate LR.
Because SVC handlers are always expected to return after the SVC instruction, the IT execution state bits are automatically advanced when an exception is taken prior to copying the CPSR into the SPSR.

2.8.2 Reset

When the nRESET signal is driven LOW a reset occurs, and the processor abandons the executing instruction. When nRESET is driven HIGH again the processor:
1. Forces CPSR M[4:0] to b10011 (Supervisor mode) and sets the A, I, and F bits in the CPSR.
You can disable IRQ exceptions within a Privileged mode by setting the CPSR.I bit to b1. See Program status registers on page 2-10. IRQ interrupts are automatically disabled when an IRQ occurs, by setting the CPSR.I bit. You can use nested interrupts, but you must save any corruptible registers and re-enable IRQs by clearing the CPSR.I bit.
LIL behavior enables accesses to Normal memory, including multiword accesses and external accesses, to be abandoned part-way through execution so that the processor can react to a pending interrupt faster than would otherwise be the case. When an instruction is abandoned in this way, the processor behaves as if the instruction was not executed at all.
[Flowchart: interrupt entry sequence. Decision nodes: !VE || VIC handshake complete; !((nFIQ||F) && (nIRQ||I)); VE==1; !(nFIQ||F); V==1; Is VIC ready to provide handler address? Actions on the IRQ path: Start handshake with VIC; SPSR_irq = CPSR; LR_irq = RA+4; CPSR[4:0] = IRQ mode; CPSR[5] = TE; CPSR[7] = 1. Actions on the FIQ path: SPSR_fiq = CPSR; LR_fiq = RA+4; CPSR[4:0] = FIQ mode; CPSR[5] = TE; CPSR[7] = 1, CPSR[6] = 1; PC[31:0] = 0xFFFF001C when V==1, otherwise PC[31:0] = 0x0000001C.]
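The register updates named in the interrupt entry flowchart (SPSR_irq = CPSR, LR_irq = RA+4, and so on) can be sketched as a small Python model. This is a simplified illustration, not a definitive description of the processor: it omits the VIC handshake timing and any CPSR updates not named in the flowchart, the IRQ vector offset 0x18 is taken from the ARM architecture, and all function and parameter names are hypothetical.

```python
I_BIT = 1 << 7        # CPSR[7], IRQ mask
F_BIT = 1 << 6        # CPSR[6], FIQ mask
T_BIT = 1 << 5        # CPSR[5], Thumb state
MODE_MASK = 0x1F      # CPSR[4:0]
MODE_FIQ = 0b10001
MODE_IRQ = 0b10010

def take_interrupt(cpsr, ra, fiq, irq, v=0, ve=0, te=0, vic_address=None):
    """Return (new_cpsr, pc, banked) for a taken interrupt, or None if none is taken.

    fiq and irq are True when the request is asserted (nFIQ or nIRQ LOW).
    """
    thumb = T_BIT if te else 0
    cleared = cpsr & ~(MODE_MASK | T_BIT)
    if fiq and not (cpsr & F_BIT):
        banked = {"SPSR_fiq": cpsr, "LR_fiq": ra + 4}
        new_cpsr = cleared | MODE_FIQ | thumb | I_BIT | F_BIT
        pc = 0xFFFF001C if v else 0x0000001C
    elif irq and not (cpsr & I_BIT):
        banked = {"SPSR_irq": cpsr, "LR_irq": ra + 4}
        new_cpsr = cleared | MODE_IRQ | thumb | I_BIT
        if ve:
            pc = vic_address          # handler address supplied by the VIC
        else:
            pc = 0xFFFF0018 if v else 0x00000018
    else:
        return None
    return new_cpsr, pc, banked

print(take_interrupt(0x10, 0x8000, fiq=True, irq=False))
```

FIQ is tested first in this sketch, so a pending, unmasked FIQ takes priority over an IRQ that is pending at the same time.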
2.8.4 Aborts

When the processor's memory system cannot complete a memory access successfully, an abort is generated. Aborts can occur for a number of reasons, for example:
• a permission fault indicated by the MPU
• an error response to a transaction on the AXI memory bus
• an error detected in the data by the ECC checking logic.

An error occurring on an instruction fetch generates a prefetch abort. Errors occurring on data accesses generate data aborts.
Imprecise aborts

An imprecise abort, also known as an asynchronous abort, is one for which the exception is taken on an instruction later than the one that generated the aborting memory access. The abort handler cannot determine which instruction generated the abort, or the state of the processor when the abort occurred. Therefore, imprecise aborts are normally fatal.

Imprecise aborts can be generated by store instructions to normal-type or device-type memory.
• perform the appropriate data transfers on behalf of the aborted instruction and return to the instruction after the abandoned instruction
• treat the error as fatal and terminate the process.

If the abort handler returns to the abandoned instruction, some of the memory accesses generated are repeated. The effect is that multiword load/store instructions can access the same memory location twice.
2.8.6 Undefined instruction

When the processor encounters an instruction that is UNDEFINED, or that is for the VFP when the VFP is not enabled, it takes the Undefined instruction exception. Software can use this mechanism to extend the ARM instruction set by emulating UNDEFINED coprocessor instructions.

UNDEFINED exceptions also occur when a UDIV or SDIV instruction is executed, the value in Rm is zero, and the DZ bit in the System Control Register is set.
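The divide-by-zero behavior can be modeled with a few lines of Python. This is a behavioral sketch only: the zero result when the DZ bit is clear follows the ARM architecture rather than this section, and the class and function names are hypothetical.

```python
class UndefinedInstruction(Exception):
    """Stands in for the Undefined instruction exception in this model."""

def udiv(rn, rm, dz):
    """Model of UDIV on 32-bit unsigned operands with the DZ trap control."""
    if rm == 0:
        if dz:
            raise UndefinedInstruction("divide by zero with DZ set")
        return 0  # architecture-defined result when the trap is disabled
    return (rn // rm) & 0xFFFFFFFF

print(udiv(100, 7, dz=True))
print(udiv(100, 0, dz=False))
```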
Note
If the EmbeddedICE-RT logic is configured into Halt debug-mode, a breakpoint instruction causes the processor to enter debug state. See Halting debug-mode debugging on page 11-3.

2.8.8 Exception vectors

You can configure the location of the exception vector addresses by setting the V bit in CP15 c1 System Control Register to enable HIVECS, as Table 2-5 shows.
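The effect of the V bit on vector addresses can be sketched as a lookup. The offsets below are the standard ARM exception vector offsets and the high base address is 0xFFFF0000, consistent with the FIQ addresses 0x0000001C and 0xFFFF001C used in this chapter; the dictionary and function names are illustrative assumptions.

```python
# Standard ARM exception vector offsets (names are illustrative).
VECTOR_OFFSETS = {
    "Reset": 0x00,
    "Undefined instruction": 0x04,
    "Supervisor call": 0x08,
    "Prefetch abort": 0x0C,
    "Data abort": 0x10,
    "IRQ": 0x18,
    "FIQ": 0x1C,
}

def vector_address(exception, v_bit):
    """Return the vector address for an exception, given the V (HIVECS) bit."""
    base = 0xFFFF0000 if v_bit else 0x00000000
    return base + VECTOR_OFFSETS[exception]

for name in VECTOR_OFFSETS:
    print(f"{name:22s} {vector_address(name, 0):#010x} {vector_address(name, 1):#010x}")
```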
2.9 Acceleration of execution environments

Because the ARMv7-R architecture requires Jazelle® software compatibility, three Jazelle registers are implemented in the processor. Table 2-7 shows the Jazelle register instruction summary and the response to the instructions.
2.10 Unaligned and mixed-endian data access support

The processor supports unaligned memory accesses. Support for unaligned memory accesses was introduced with ARMv6. Bit [22] of c1, Control Register is always 1.

The processor supports byte-invariant big-endianness BE-8 and little-endianness LE. The processor does not support word-invariant big-endianness BE-32. Bit [7] of c1, Control Register is always 0.
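The following Python sketch illustrates an unaligned word load under both supported byte orders, together with a check of the two fixed control register bits. It is a host-side model only; the function names, and the use of a Python bytes object as memory, are assumptions made for illustration.

```python
U_BIT = 1 << 22   # bit [22] of the control register, always 1
B_BIT = 1 << 7    # bit [7] of the control register, always 0

def fixed_bits_ok(control_register):
    """Check the fixed values described above: bit [22] set and bit [7] clear."""
    return bool(control_register & U_BIT) and not (control_register & B_BIT)

def load_word(memory, address, big_endian=False):
    """Load a 32-bit word from any byte address, aligned or not."""
    raw = memory[address:address + 4]
    return int.from_bytes(raw, "big" if big_endian else "little")

memory = bytes(range(8))          # byte n holds the value n
print(hex(load_word(memory, 1)))                    # unaligned, little-endian
print(hex(load_word(memory, 1, big_endian=True)))   # unaligned, BE-8
```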
2.11 Big-endian instruction support

The processor supports little-endian or big-endian instruction format, depending on the setting of the CFGIE pin. This is reflected in bit [31] of the System Control Register. For more information, see c1, System Control Register on page 4-35.

Note
The facility to use big-endian or little-endian instruction format is an implementation option, and you can therefore remove it in specific implementations.
Chapter 3 Processor Initialization, Resets, and Clocking

Before you can run application software on the processor, it must be reset and initialized, including loading the appropriate software-configuration. This chapter describes the signals for clocking and resetting the processor, and the steps that the software must take to initialize the processor after reset. It contains the following sections:
• Initialization on page 3-2
• Resets on page 3-6
• Reset modes on page 3-7
• Clocking on page 3-9.
3.1 Initialization

Most of the architectural registers in the processor, such as r0-r14, and s0-s31 and d0-d15 when floating-point is included, are not reset. Because of this, you must initialize these for all modes before they are used, using an immediate-MOV instruction, or a PC-relative load instruction.

The Current Program Status Register (CPSR) is given a known value on reset. This is described in the ARM Architecture Reference Manual.
Processor Initialization, Resets, and Clocking • enable the FPU by setting the EN-bit in the FPEXC register, see Floating-Point Exception Register, FPEXC on page 12-7. Note Floating-point logic is only available with the Cortex-R4F processor. 3.1.4 Caches If the processor has been built with instruction or data caches, these must be invalidated before they are enabled, otherwise UNPREDICTABLE behavior can occur. See Cache operations on page 4-54.
Processor Initialization, Resets, and Clocking DMA into TCM The SoC includes a Direct Memory Access (DMA) device that reads data from a ROM, and writes it to the TCMs through the AXI slave interface. Write to TCM directly from debugger A Debug Access Port (DAP) in the system is used to generate AMBA transactions to write data into the TCMs through the AXI slave interface. This DAP is controlled from the debug host through a JTAG chain.
Processor Initialization, Resets, and Clocking • Turn on 64-bit store behavior using CP15. See c15, Secondary Auxiliary Control Register on page 4-41. • Write to the TCM using any store instructions, or any AXI write transactions. The processor performs read-modify-write accesses to ensure that all writes are to 64-bit aligned quantities, even though error checking is turned off. Note You can enable error checking and 64-bit store behavior on a per-TCM interface basis.
Processor Initialization, Resets, and Clocking 3.2 Resets The processor has the following reset inputs: nRESET This signal is the main processor reset that initializes the majority of the processor logic. PRESETDBGn This signal resets processor debug logic and CoreSight ETM-R4. nSYSPORESET This signal is the reset that initializes the entire processor, including CP14 debug logic and the APB debug logic. See CP14 registers reset on page 11-23 for information.
Processor Initialization, Resets, and Clocking 3.3 Reset modes The reset signals in the processor enable you to reset different parts of the design independently. Table 3-1 shows the reset signals, and the combinations and possible applications that you can use them in. Table 3-1 Reset modes Reset mode nRESET PRESETDBGn nSYSPORESET nCPUHALT Application Power-on reset 0 x 0 x Reset at power up, full system reset. Hard reset or cold reset.
Processor Initialization, Resets, and Clocking 3.3.2 Processor reset A processor or warm reset initializes the majority of the processor, excluding the EmbeddedICE-RT logic. Processor reset is typically used for resetting a system that has been operating for some time, for example, watchdog reset. Because the nRESET signal is synchronized within the processor, you do not have to synchronize this signal. 3.3.
Processor Initialization, Resets, and Clocking 3.4 Clocking The processor has two functional clock inputs. Externally to the processor, you must connect together CLKIN and FREECLKIN. In addition, there is the PCLKDBG clock for the debug APB bus. This is asynchronous to the main clock. All clocks can be stopped indefinitely without loss of state. Three additional clock inputs, CLKIN2, DUALCLKIN, and DUALCLKIN2, are related to the dual-redundant core functionality, if included.
Chapter 4 System Control Coprocessor This chapter describes the purpose of the system control coprocessor, its structure, operation, and how to use it. It contains the following sections: • About the system control coprocessor on page 4-2 • System control coprocessor registers on page 4-9. ARM DDI 0363E ID013010 Copyright © 2009 ARM Limited. All rights reserved.
4.1 About the system control coprocessor
This section gives an overview of the system control coprocessor. For more information on the registers in the system control coprocessor, see System control coprocessor registers on page 4-9. The purpose of the system control coprocessor, CP15, is to control and provide status information for the functions implemented in the processor.
System Control Coprocessor Table 4-1 System control coprocessor register functions Function Register/operation Reference to description System control and configuration Control c1, System Control Register on page 4-35 Auxiliary control Auxiliary Control Registers on page 4-38 Coprocessor Access Control c1, Coprocessor Access Register on page 4-44 Main IDa c0, Main ID Register on page 4-14 Product Feature IDs The Processor Feature Registers on page 4-18 c0, Debug Feature Register 0 on page 4-20
System Control Coprocessor Table 4-1 System control coprocessor register functions (continued) Function Register/operation Reference to description TCM control and configuration TCM Status c0, TCM Type Register on page 4-16 Region c9, BTCM Region Register on page 4-57 c9, TCM Selection Register on page 4-59 System performance monitoring Performance monitoring Chapter 6 Events and Performance Monitor Validation System validation Validation Registers on page 4-62 a.
System Control Coprocessor 4.1.3 MPU control and configuration The MPU control and configuration registers: • control program access to memory • designate areas of memory as either: — Normal, Non-cacheable — Normal, Cacheable — Device — Strongly Ordered. • detect MPU faults and external aborts. The MPU control and configuration registers consist of one read-only register and eleven read/write registers. Figure 4-2 shows the arrangement of registers in this functional group.
[Figure: cache control and configuration registers, located in CRn c0, c7, and c15. Registers shown: Cache Type Register, Current Cache Size Identification Register, Current Cache Level Identification Register, Cache Size Selection Register, Cache Operations Registers ‡, Invalidate all Data Cache Register. Legend: Read-only, Read/write, Write-only, Accessible in User mode.] † See description of cache operations for implemented CRm and Opcode_2 values ‡ See description of cache operations for operations with User
System Control Coprocessor CRn Opcode_1 CRm Opcode_2 0 c12 0 c13 0 c14 0 1 2 3 4 5 0 1 2 0 1 2 c9 Read/write Read-only Performance Monitor Control Register † Count Enable Set Register † Count Enable Clear Register † Overflow Flag Status Register † Software Increment Register † Performance Counter Selection Register † Cycle Count Register † Event Select Register † Performance Count Register † User Enable Register Interrupt Enable Set Register Interrupt Enable Clear Register Write-only Accessi
You can only change the cache size to a size supported by the cache RAMs implemented in your design.
System Control Coprocessor 4.2 System control coprocessor registers This section describes all of the registers in the system control coprocessor. The section presents a summary of the registers and descriptions in register order of CRn, Opcode_1, CRm, Opcode_2. For more information on using the system control coprocessor and the general method of how to access CP15 registers, see the ARM Architecture Reference Manual. 4.2.
System Control Coprocessor Table 4-2 Summary of CP15 registers and operations (continued) CRn Op1 1 c1 CRm Op2 Register or operation Type Reset value Page c8-c15 0-7 Undefined - - - c0 0 Current Cache Size ID Read-only -cd page 4-32 1 Current Cache Level ID Read-only 0x09000003c page 4-34 2-7 Undefined - - - c1-c15 0-7 2 c0 0 Cache Size Selection Read/write Unpredictable page 4-35 0 c0 0 System Control Read/write -d page 4-35 1 Auxiliary Control Read/write
System Control Coprocessor Table 4-2 Summary of CP15 registers and operations (continued) CRn c7 c7 Op1 0 0 CRm Register or operation Type Reset value Page 1-7 Undefined - - - c3-c15 1-7 c0 0-3 Undefined - - - 4 NOP, previously Wait For Interrupt Write-only - page 4-54 5-7 Undefined - - - c1-c4 0-7 c5 0 Invalidate entire instruction cache Write-only - page 4-55 c5 1 Invalidate instruction cache line by address to Point-of-Unification.
System Control Coprocessor Table 4-2 Summary of CP15 registers and operations (continued) CRn Op1 CRm Op2 Register or operation Type Reset value Page c7 0 c11 1 Clean data cache line by physical address to Point-of-Unification Write-only - page 4-55 2-7 Undefined - - - 1 Clean and invalidate data cache line by physical address to Point-of-Unification Write-only - page 4-55 2 Clean and invalidate data cache line by Set/Way Write-only - page 4-55 3-7 Undefined - - - c12-c1
System Control Coprocessor Table 4-2 Summary of CP15 registers and operations (continued) CRn Op1 CRm Op2 Register or operation Type Reset value Page c14 0 User Enable Read/write 0x00000000 page 6-15 1 Interrupt Enable Set Read/write Unpredictable page 6-16 2 Interrupt Enable Clear Read/write Unpredictable page 6-17 3-7 Undefined - - - c14 c15 0-7 c10 0 c0-c15 0-7 Undefined - - - c11 0 c0 0 Slave Port Control Read/write 0x00000000 page 4-59 c0 1-7 Undefined
System Control Coprocessor Table 4-2 Summary of CP15 registers and operations (continued) CRn Op1 CRm c3 c15 0 Op2 Register or operation Type Reset value Page 1 Build Options 2 Read-only -d page 4-72 2-7 Undefined - - - 0 Correctable Fault Location Read/write Unpredictable page 4-70 1-7 Undefined - - - c4 0-7 c5 0 Invalidate all data cache Write-only - page 4-55 1-7 Undefined - - - c6-c13 0-7 c14 0 Cache Size Override Write-only - page 4-69 1-7 Undefined
System Control Coprocessor The contents of the Main ID Register depend on the specific implementation. Table 4-3 shows how the bit values correspond with the Main ID Register functions. Table 4-3 Main ID Register bit functions Bits Field Function [31:24] Implementer Indicates implementer. 0x41 - ARM Limited. [23:20] Variant Identifies the major revision of the processor. This is the major revision number n in the rn part of the rnpn description of the product revision status.
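The field positions in Table 4-3 can be illustrated with a small host-side C sketch. It covers only the two fields listed above, the helper names are hypothetical, and the sample value is purely illustrative rather than a value quoted from this manual. Reading the register itself requires an MRC instruction in Privileged mode, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; bit positions follow Table 4-3. */
static inline uint32_t midr_implementer(uint32_t midr)
{
    return (midr >> 24) & 0xFFu;   /* bits [31:24], Implementer */
}

static inline uint32_t midr_variant(uint32_t midr)
{
    return (midr >> 20) & 0xFu;    /* bits [23:20], Variant (n in rn) */
}

/* Usage sketch with an illustrative register value. */
static inline void midr_example(void)
{
    uint32_t midr = 0x411FC143u;            /* assumed sample value */
    assert(midr_implementer(midr) == 0x41u); /* 0x41 = ARM Limited */
    assert(midr_variant(midr) == 0x1u);
}
```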
System Control Coprocessor Table 4-4 shows how the bit values correspond with the Cache Type Register functions. Table 4-4 Cache Type Register bit functions Bits Field Function [31:28] - Always b1000. [27:24] CWG Cache Write-back Granule 0x0 = no information provided. See maximum cache line size in c0, Current Cache Size Identification Register on page 4-32. [23:20] ERG Exclusives Reservation Granule 0x0 = no information provided.
System Control Coprocessor Table 4-5 TCM Type Register bit functions (continued) Bits Field Function [18:16] BTCM Specifies the number of BTCMs implemented. This is always set to b001 because the processor has one BTCM. [15:3] Reserved SBZ. [2:0] ATCM Specifies the number of ATCMs implemented. Always set to b001. The processor has one ATCM.
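As a minimal sketch of how software might interpret the two count fields in Table 4-5, the following C helpers extract the BTCM and ATCM fields from a TCM Type Register value that has already been read. The function names and the sample value are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; bit positions follow Table 4-5. */
static inline uint32_t tcmtr_btcm_count(uint32_t tcmtr)
{
    return (tcmtr >> 16) & 0x7u;   /* bits [18:16], BTCM */
}

static inline uint32_t tcmtr_atcm_count(uint32_t tcmtr)
{
    return tcmtr & 0x7u;           /* bits [2:0], ATCM */
}

/* Usage sketch: a value with both fields set to b001. */
static inline void tcmtr_example(void)
{
    uint32_t tcmtr = 0x00010001u;  /* assumed sample value */
    assert(tcmtr_btcm_count(tcmtr) == 1u);
    assert(tcmtr_atcm_count(tcmtr) == 1u);
}
```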
MRC p15, 0, <Rd>, c0, c0, 4 ; Returns MPU details
4.2.6 c0, Multiprocessor ID Register
The Multiprocessor ID Register enables cores to be recognized and characterized within a multiprocessor system. The Multiprocessor ID Register is: • a read-only register • accessible in Privileged mode only. Figure 4-11 shows the arrangement of bits in the register.
System Control Coprocessor Table 4-7 shows how the bit values correspond with the Processor Feature Register 0 functions. Table 4-7 Processor Feature Register 0 bit functions Bits Field Function [31:16] Reserved SBZ. [15:12] State3 Indicates support for Thumb Execution Environment (ThumbEE). 0x0, no support. [11:8] State2 [7:4] State1 Indicates support for acceleration of execution environments in hardware or software.
System Control Coprocessor Table 4-8 Processor Feature Register 1 bit functions (continued) Bits Field Function [11:8] Microcontroller programmer’s model Indicates support for Microcontroller programmer’s model: 0x0, no support. [7:4] Security extension Indicates support for Security Extensions Architecture: 0x0, no support. [3:0] ARMv4 Programmer’s model Indicates support for standard ARMv4 programmer’s model: 0x1, the processor supports the ARMv4 model.
System Control Coprocessor Table 4-9 Debug Feature Register 0 bit functions (continued) Bits Field Function [11:8] Core debug model memory mapped Indicates the type of embedded processor debug model that the processor supports: 0x4, ARMv7 based model - memory mapped. [7:4] Secure debug model Indicates the type of secure debug model that the processor supports: 0x0, no support.
[Figure 4-15 Memory Model Feature Register 0 format: Reserved [31:28], FCSE [27:24], Auxiliary Control Register [23:20], TCM [19:16], Outer shareable [15:12], Cache coherence [11:8], PMSA [7:4], VMSA [3:0].]
Table 4-10 shows how the bit values correspond with the Memory Model Feature Register 0 functions. Table 4-10 Memory Model Feature Register 0 bit functions Bits Field Function [31:28] Reserved SBZ. [27:24] FCSE Indicates support for Fast Context Switch Extension (FCSE).
[Figure 4-16 Memory Model Feature Register 1 format: Branch predictor [31:28], L1 test clean operations [27:24], L1 cache maintenance operations (unified) [23:20], L1 cache maintenance operations (Harvard) [19:16], L1 cache line maintenance operations - Set and Way (unified) [15:12], L1 cache line maintenance operations - Set and Way (Harvard) [11:8], L1 cache line maintenance operations - MVA (unified) [7:4], L1 cache line maintenance operations - MVA (Harvard) [3:0].]
Table 4-11 shows how the
System Control Coprocessor c0, Memory Model Feature Register 2, MMFR2 The Memory Model Feature Register 2 provides information about the memory model, memory management, and cache support operations of the processor. The Memory Model Feature Register 2 is: • a read-only register • accessible in Privileged mode only. Figure 4-17 shows the bit arrangement for Memory Model Feature Register 2.
To access the Memory Model Feature Register 2, read CP15 with:
MRC p15, 0, <Rd>, c0, c1, 6 ; Read Memory Model Feature Register 2.
c0, Memory Model Feature Register 3, MMFR3
The Memory Model Feature Register 3 provides information about the two cache line maintenance operations for the processor. The Memory Model Feature Register 3 is: • a read-only register • accessible in Privileged mode only. Figure 4-18 shows the bit arrangement for Memory Model Feature Register 3.
System Control Coprocessor 4.2.11 Instruction Set Attributes Registers There are eight Instruction Set Attributes Registers, ISAR0 to ISAR7, but three of these are currently unused.
Table 4-14 Instruction Set Attributes Register 0 bit functions (continued)
Bits Field Function
[11:8] Bitfield instructions Indicates support for bitfield instructions. 0x1, the processor supports bitfield instructions, BFC, BFI, SBFX, and UBFX.
[7:4] Bit counting instructions Indicates support for bit counting instructions. 0x1, the processor supports CLZ.
[3:0] Atomic instructions Indicates support for atomic load and store instructions.
Table 4-15 shows how the bit values correspond with the Instruction Set Attributes Register 1 functions.
Table 4-15 Instruction Set Attributes Register 1 bit functions
Bits Field Function
[31:28] Jazelle instructions Indicates support for Jazelle instructions. 0x1, the processor supports:
• BXJ instruction
• J bit in PSRs.
For more information see Program status registers on page 2-10 and Acceleration of execution environments on page 2-27.
System Control Coprocessor Figure 4-21 shows the bit arrangement for Instruction Set Attributes Register 2.
c0, Instruction Set Attributes Register 3, ISAR3
The Instruction Set Attributes Register 3 provides information about the instruction set that the processor supports beyond the basic set. The Instruction Set Attributes Register 3 is: • a read-only register • accessible in Privileged mode only. Figure 4-22 shows the bit arrangement for Instruction Set Attributes Register 3.
System Control Coprocessor Table 4-17 Instruction Set Attributes Register 3 bit functions (continued) Bits Field Function [11:8] SVC instructions Indicates support for SVC (formerly SWI) instructions. 0x1, the processor supports SVC. [7:4] SIMD instructions Indicates support for Single Instruction Multiple Data (SIMD) instructions.
System Control Coprocessor Table 4-18 Instruction Set Attributes Register 4 bit functions (continued) Bits Field Function [15:12] SMC instructions Indicates support for Secure Monitor Call (SMC) (formerly SMI) instructions. 0x0, no support. [11:8] Write-back instructions Indicates support for write-back instructions. 0x1, supports all the writeback addressing modes defined in ARMv7. [7:4] With shift instructions Indicates support for with-shift instructions.
[Figure 4-24 Current Cache Size Identification Register format: WT [31], WB [30], RA [29], WA [28], NumSets [27:13], Associativity [12:3], Line Size [2:0].]
Table 4-19 shows how the bit values correspond with the Current Cache Size Identification Register.
To access the Current Cache Size Identification Register, read CP15 with:
MRC p15, 1, <Rd>, c0, c0, 0 ; Read Current Cache Size Identification Register
4.2.13 c0, Current Cache Level ID Register
The Current Cache Level ID Register indicates the cache levels that are implemented. Architecturally, there can be a different number of cache levels on the instruction and data side. The register also captures the point-of-coherency and the point-of-unification.
System Control Coprocessor 4.2.14 c0, Cache Size Selection Register The Cache Size Selection Register holds the value that the processor uses to select the Current Cache Size Identification Register to use. The Cache Size Selection Register is: • a read/write register • accessible in Privileged mode only. Figure 4-26 shows the bit arrangement for the Cache Size Selection Register.
[Figure 4-27 System Control Register format: bit-field diagram showing the IE, TE, AFE, TRE, NMFI, EE, VE, FI, Z, RR, BR, DZ, V, I, C, A, and M bits, together with the reserved SBO and SBZ bits.]
Table 4-23 shows the purposes of the individual bits in the System Control Register. Table 4-23 System Control Register bit functions Bits Field Function [31] IE Identifies little or big instruction endianness in use: 0 = little-endianness 1 = big-endianness.
System Control Coprocessor Table 4-23 System Control Register bit functions (continued) Bits Field Function [19] DZ Divide by zero: 0 = do not generate an Undefined instruction exception 1 = generate an Undefined instruction exception. The reset value of this bit is 0. [18] Reserved SBO. [17] BR MPU background region enable. [16] Reserved SBO. [15] Reserved SBZ.
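The bit positions given in Table 4-23 for IE, DZ, and BR can be expressed as masks. The following C sketch operates on a System Control Register value that software has already read in Privileged mode. The macro and function names are hypothetical, and only the three bits described above are covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask names; bit positions follow Table 4-23. */
#define SCTLR_IE (1u << 31)  /* instruction endianness, 1 = big-endian */
#define SCTLR_DZ (1u << 19)  /* divide by zero generates Undefined exception */
#define SCTLR_BR (1u << 17)  /* MPU background region enable */

static inline bool sctlr_instr_big_endian(uint32_t sctlr)
{
    return (sctlr & SCTLR_IE) != 0u;
}

/* Returns a copy of the value with the divide-by-zero trap enabled. */
static inline uint32_t sctlr_enable_dz_trap(uint32_t sctlr)
{
    return sctlr | SCTLR_DZ;
}

/* Usage sketch with illustrative values. */
static inline void sctlr_example(void)
{
    assert(sctlr_instr_big_endian(0x80000000u));
    assert(sctlr_enable_dz_trap(0u) == 0x00080000u);
}
```

The modified value would then be written back with an MCR instruction, which is outside the scope of this sketch.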
Attempts to read or write the System Control Register from User mode result in an Undefined exception.
4.2.16 Auxiliary Control Registers
The Auxiliary Control Registers control:
• branch prediction
• performance features
• error and parity logic.
c1, Auxiliary Control Register
The Auxiliary Control Register is: • a read/write register • accessible in Privileged mode only. Figure 4-28 shows the arrangement of bits in the register.
System Control Coprocessor Table 4-24 Auxiliary Control Register bit functions (continued) Bits Field Function [28] DIADIa Case A dual issue control: 0 = Enabled. This is the reset value. 1 = Disabled. [27] B1TCMPCEN B1TCM parity or ECC check enable: 0 = Disabled 1 = Enabled. The primary input PARECCENRAM[2]b defines the reset value. If the BTCM is configured with ECC, you must always set this bit to the same value as B0TCMPCEN.
System Control Coprocessor Table 4-24 Auxiliary Control Register bit functions (continued) Bits Field Function [17] RSDIS Return stack disable: 0 = Normal return stack operation. This is the reset value. 1 = Return stack disabled. [16:15] BP This field controls the branch prediction policy: b00 = Normal operation. This is the reset value. b01 = Branch always taken. b10 = Branch always not taken. b11 = Reserved. Behavior is Unpredictable if this field is set to b11.
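Because behavior is Unpredictable when the BP field is set to b11, software that updates the field may want to reject that encoding before writing. The following C sketch shows one way to do this on a copy of the register value. The function names are hypothetical, and the read-modify-write of the register itself is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ACTLR_BP_SHIFT 15u                    /* BP occupies bits [16:15] */
#define ACTLR_BP_MASK  (3u << ACTLR_BP_SHIFT)
#define ACTLR_RSDIS    (1u << 17)             /* return stack disable */

/* Stores the updated value in *out and returns true, or returns false
 * without touching *out if the reserved encoding b11 is requested. */
static inline bool actlr_with_bp(uint32_t actlr, uint32_t policy, uint32_t *out)
{
    if (policy > 2u) {
        return false;                         /* b11 is reserved */
    }
    *out = (actlr & ~ACTLR_BP_MASK) | (policy << ACTLR_BP_SHIFT);
    return true;
}

/* Usage sketch: select "branch always taken" (b01). */
static inline void actlr_example(void)
{
    uint32_t v = 0u;
    assert(actlr_with_bp(0u, 1u, &v));
    assert(v == 0x00008000u);
}
```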
System Control Coprocessor Table 4-24 Auxiliary Control Register bit functions (continued) Bits Field Function [2] B1TCMECEN B1TCM external error enable: 0 = Disabled 1 = Enabled. The primary input ERRENRAM[2] defines the reset value. [1] B0TCMECEN B0TCM external error enable: 0 = Disabled 1 = Enabled. The primary input ERRENRAM[1] defines the reset value. [0] ATCMECEN ATCM external error enable: 0 = Disabled 1 = Enabled. The primary input ERRENRAM[0] defines the reset value. a.
System Control Coprocessor 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 31 4 3 2 1 0 Reserved DCHE ATCMRMW BTCMRMW ATCMECC B0TCMECC DR2B DF6DI DF2DI DDI DOODPFP DOOFMACS Reserved IXC Reserved IDC DZC IOC UFC OFC Figure 4-29 Secondary Auxiliary Control Register format Table 4-25 shows how the bit values correspond with the Secondary Auxiliary Control Register functions. Table 4-25 Secondary Auxiliary Control Register bit functions Bits Field Function [31:23] Reserved SBZ.
System Control Coprocessor Table 4-25 Secondary Auxiliary Control Register bit functions (continued) Bits Field Function [16] DOOFMACS Out-of-order FMACS control.c 0 = Enabled. This is the reset value. 1 = Disabled. [15:14] Reserved SBZ. [13] IXC Floating-point inexact exception output mask.c 0 = Mask floating-point inexact exception output. The output FPIXC is forced to zero. This is the reset value. 1 = Propagate floating point inexact exception flag FPSCR.IXC to output FPIXC.
System Control Coprocessor Table 4-25 Secondary Auxiliary Control Register bit functions (continued) Bits Field Function [2] ATCMECC Correction for internal ECC logic on ATCM port.d 0 = Enabled. This is the reset value. 1 = Disabled. [1] BTCMRMW Enables 64-bit stores for the BTCMs. When enabled, the processor uses read-modify-write to ensure that all reads and writes presented on the BTCM ports are 64 bits wide.e 0 = Disabled 1 = Enabled. The primary input RMWENRAM[1] defines the reset value.
Table 4-26 shows how the bit values correspond with the Coprocessor Access Register functions. Table 4-26 Coprocessor Access Register bit functions Bits Field Function [31:28] Reserved SBZ. [27:0] cpa Defines access permissions for each coprocessor. Access denied is the reset condition, and is the behavior for non-existent coprocessors. b00 = Access denied. Attempts to access generate an Undefined exception.
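The cpa field can be read per coprocessor with a shift and mask. The following C sketch assumes the usual ARM layout in which coprocessor n uses the two bits [2n+1:2n] of the 28-bit cpa field, giving fields for cp0 to cp13. That layout, and the helper names, are assumptions that are not spelled out in the table excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the 2-bit access field for coprocessor n (0 to 13).
 * Assumes coprocessor n occupies bits [2n+1:2n] of cpa. */
static inline uint32_t cpacr_field(uint32_t cpacr, unsigned n)
{
    assert(n < 14u);                /* cpa is 28 bits wide: 14 fields */
    return (cpacr >> (2u * n)) & 0x3u;
}

/* b00 = Access denied, as listed in Table 4-26. */
static inline bool cpacr_access_denied(uint32_t cpacr, unsigned n)
{
    return cpacr_field(cpacr, n) == 0u;
}
```

For example, with the illustrative value 0x00F00000 the fields for cp10 and cp11 are both non-zero, while the field for cp0 reads b00.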
System Control Coprocessor The Data Fault Status Register is: • a read/write register • accessible in Privileged mode only. Figure 4-31 shows the bit arrangement in the Data Fault Status Register. 13 12 11 10 9 8 7 31 S 0 0 Reserved 4 3 Domain 0 Status RW SD Figure 4-31 Data Fault Status Register format Table 4-28 shows how the bit values correspond with the Data Fault Status Register functions. Table 4-28 Data Fault Status Register bit functions Bits Field Function [31:13] Reserved SBZ.
System Control Coprocessor 4 3 13 12 11 10 9 8 7 31 Reserved S Domain 0 Status Reserved Reserved SD Figure 4-32 Instruction Fault Status Register format Table 4-29 shows how the bit values correspond with the Instruction Fault Status Register functions. Table 4-29 Instruction Fault Status Register bit functions Bits Field Function [31:13] Reserved SBZ. [12] SD Distinguishes between an AXI Decode or Slave error on an external abort. This bit is only valid for external aborts.
System Control Coprocessor 28 27 31 24 23 22 21 20 Reserved 5 4 14 13 Reserved CacheWay Side Index 0 Reserved Recoverable error Figure 4-33 Auxiliary fault status registers format Table 4-30 shows how the bit values correspond with the auxiliary fault status register functions. Table 4-30 ADFSR and AIFSR bit functions Bits Field Function [31:28] Reserved SBZ. [27:24] CacheWaya The value returned in this field indicates the cache way or ways in which the error occurred.
The Data Fault Address Register bits [31:0] contain the address where the precise abort occurred. To access the DFAR, read or write CP15 with:
MRC p15, 0, <Rd>, c6, c0, 0 ; Read Data Fault Address Register
MCR p15, 0, <Rd>, c6, c0, 0 ; Write Data Fault Address Register
A write to this register sets the DFAR to the value of the data written. This is useful for a debugger to restore the value of the DFAR.
System Control Coprocessor CP15, c9 sets the location of the TCM base address. For more information see c9, BTCM Region Register on page 4-57 and c9, ATCM Region Register on page 4-58. c6, MPU Region Base Address Registers The MPU Region Base Address Registers describe the base address of the region specified by the Memory Region Number Register. The region base address must always align to the region size.
System Control Coprocessor 16 15 31 Reserved 8 7 6 5 Sub-region disable 1 0 Region size Reserved Enable Figure 4-35 MPU Region Size and Enable Registers format Table 4-32 shows how the bit values correspond with the MPU Region Size and Enable Registers. Table 4-32 Region Size Register bit functions Bits Field Function [31:16] Reserved SBZ. [15:8] Sub-region disable Each bit position represents a sub-region, 0-7a. Bit [8] corresponds to sub-region 0 ...
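The layout in Figure 4-35 and Table 4-32 places the sub-region disable bits at [15:8], with bit [8] for sub-region 0, the region size field at [5:1], and the enable bit at [0]. The following C sketch manipulates a copy of an MPU Region Size and Enable Register value accordingly. The helper names are hypothetical, and the interpretation of the region size encoding is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if sub-region n (0 to 7) is disabled; bit [8 + n]. */
static inline bool rsr_subregion_disabled(uint32_t rsr, unsigned n)
{
    assert(n < 8u);
    return ((rsr >> (8u + n)) & 1u) != 0u;
}

/* Returns a copy of the value with sub-region n disabled. */
static inline uint32_t rsr_disable_subregion(uint32_t rsr, unsigned n)
{
    assert(n < 8u);
    return rsr | (1u << (8u + n));
}

/* Raw region size field, bits [5:1]. */
static inline uint32_t rsr_size_field(uint32_t rsr)
{
    return (rsr >> 1) & 0x1Fu;
}

/* Region enable, bit [0]. */
static inline bool rsr_enabled(uint32_t rsr)
{
    return (rsr & 1u) != 0u;
}
```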
System Control Coprocessor The MPU Region Access Control Registers are: • read/write registers • accessible in Privileged mode only. Figure 4-36 shows the arrangement of bits in the register. 13 12 11 10 31 Reserved XN 8 7 6 5 AP 3 2 1 0 TEX S C B Reserved Figure 4-36 MPU Region Access Control Register format Table 4-33 shows how the bit values correspond with the Region Access Control Register functions.
Table 4-34 Access data permission bit encoding (continued)
AP bit values  Privileged permissions  User permissions  Description
b011           Read/write              Read/write        Full access
b100           UNP                     UNP               Reserved
b101           Read-only               No access         Privileged read-only
b110           Read-only               Read-only         Privileged/User read-only
b111           UNP                     UNP               Reserved
To access the MPU Region Access Control Registers, read or write CP15 with:
MRC p15, 0, <Rd>, c6, c1, 4 ; Read Region access control Register
MCR p15, 0,
System Control Coprocessor 4.2.20 Cache operations The purpose of c7 is to manage the associated caches. The maintenance operations are formed into two management groups: • Set and Way: — clean — invalidate — clean and invalidate. • Address, usually labelled MVA for Modified Virtual Address, but on this processor all addresses are identical: — clean — invalidate — clean and invalidate.
System Control Coprocessor CRn Opcode_1 c7 CRm Opcode_2 c0 c5 4 0 1 4 6 7 1 2 1 2 4 5 1 1 2 0 0 c6 c10 c11 c14 c15 0 c5 Read-only SBZ SBZ MVA SBZ SBZ MVA Way MVA Way SBZ SBZ MVA MVA Way SBZ Wait For Interrupt (NOP) Invalidate All Instruction Caches Invalidate Instruction Cache Line to Point-of-Unification by MVA Flush Prefetch buffer Invalidate entire branch predictor array (NOP) Invalidate VA from Branch Predictor Array (NOP) Invalidate data cache line to Point-of-Coherency by MVA Invalidate
System Control Coprocessor 31 30 29 Way S+5 S+4 Reserved 0 5 4 Set Reserved Figure 4-39 c7 format for Set and Way Table 4-36 shows how the bit values correspond with the Cache Operation functions for Set and Way format operations. Table 4-36 Functional bits of c7 for Set and Way Bits Field Function [31:30] Way Indicates the cache way to invalidate or clean. [29:S+5] Reserved SBZ. [S+4:5] Set Indicates the cache set to invalidate or clean.
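The Set and Way operand in Figure 4-39 can be composed with two shifts. The following C sketch builds the value that would be passed to a Set and Way cache operation. It assumes that S is the base-2 logarithm of the number of cache sets, which is implied by the Set field spanning bits [S+4:5]. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a Set and Way operand: Way in bits [31:30], Set in bits [S+4:5].
 * log2_sets is S, assumed to be log2 of the number of cache sets. */
static inline uint32_t c7_set_way_operand(uint32_t way, uint32_t set,
                                          unsigned log2_sets)
{
    assert(log2_sets <= 25u);            /* Set field must stay below bit 30 */
    assert(way < 4u);                    /* Way field is two bits wide */
    assert(set < (1u << log2_sets));     /* set index must fit in S bits */
    return (way << 30) | (set << 5);
}
```

Iterating this helper over every way and every set yields the operands needed to clean or invalidate an entire cache by Set and Way.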
System Control Coprocessor Table 4-38 shows how the bit values correspond with the address format for invalidate and clean operations. Table 4-38 Functional bits of c7 for address format Bits Field Function [31:5] Address Specifies the address to invalidate or clean [4:0] Reserved SBZ Data Synchronization Barrier operation The purpose of the Data Synchronization Barrier operation is to ensure that all outstanding explicit memory transactions complete before any following instructions begin.
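In the address format of Table 4-38, bits [4:0] are SBZ, so each operand names one 32-byte-aligned line. The following C sketch shows how software might derive the operand for an arbitrary address and count the lines that cover a buffer. The names are hypothetical, and the sketch assumes that the range does not wrap past the top of the 32-bit address space.

```c
#include <assert.h>
#include <stdint.h>

#define C7_LINE_MASK 0x1Fu   /* bits [4:0] are SBZ in the address format */

/* Clears bits [4:0] so that only the Address field [31:5] remains. */
static inline uint32_t c7_line_operand(uint32_t addr)
{
    return addr & ~C7_LINE_MASK;
}

/* Number of line operands needed to cover [start, start + len).
 * Assumes the range does not wrap around the address space. */
static inline uint32_t c7_lines_in_range(uint32_t start, uint32_t len)
{
    if (len == 0u) {
        return 0u;
    }
    uint32_t first = c7_line_operand(start);
    uint32_t last = c7_line_operand(start + (len - 1u));
    return ((last - first) >> 5) + 1u;
}

/* Usage sketch with illustrative addresses. */
static inline void c7_line_example(void)
{
    assert(c7_line_operand(0x1234567Fu) == 0x12345660u);
    assert(c7_lines_in_range(0x1010u, 0x20u) == 2u);
}
```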
System Control Coprocessor 31 12 11 Base address 7 6 Reserved 2 1 0 Size Reserved Enable Figure 4-41 BTCM Region Registers Table 4-39 shows how the bit values correspond with the BTCM Region Register. Table 4-39 BTCM Region Register bit functions Bits Field Function [31:12] Base address Base address. Defines the base address of the BTCM. The base address must be aligned to the size of the BTCM. Any bits in the range [(log2(RAMSize)-1):12] are ignored.
System Control Coprocessor 31 12 11 Base address 7 6 Reserved 2 1 0 Size Reserved Enable Figure 4-42 ATCM Region Registers Table 4-40 shows how the bit values correspond with the ATCM Region Register. Table 4-40 ATCM Region Register bit functions Bits Field Function [31:12] Base address Base address. Defines the base address of the ATCM. The base address must be aligned to the size of the ATCM. Any bits in the range [(log2(RAMSize)-1):12] are ignored.
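The BTCM and ATCM Region Registers share the layout shown in Figures 4-41 and 4-42: base address in bits [31:12], size in bits [6:2], and enable in bit [0]. The following C sketch composes a register value and checks the alignment rule stated in the tables. The helper names are hypothetical, and the RAM size passed to the alignment check is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if base is aligned to the TCM size (ram_size must be a power of two). */
static inline bool tcm_base_aligned(uint32_t base, uint32_t ram_size)
{
    assert(ram_size != 0u && (ram_size & (ram_size - 1u)) == 0u);
    return (base & (ram_size - 1u)) == 0u;
}

/* Composes a region register value from a 4KB-aligned base and enable flag. */
static inline uint32_t tcm_region_value(uint32_t base, bool enable)
{
    assert((base & 0xFFFu) == 0u);   /* only bits [31:12] hold the base */
    return base | (enable ? 1u : 0u);
}

/* Raw size field, bits [6:2]. */
static inline uint32_t tcm_region_size_field(uint32_t reg)
{
    return (reg >> 2) & 0x1Fu;
}
```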
System Control Coprocessor The Slave Port Control Register is: • a read/write register • accessible in User and Privileged mode. Figure 4-43 shows the arrangement of bits in the register. 31 2 1 0 Reserved Privilege access AXI slave enable Figure 4-43 Slave Port Control Register Table 4-41 shows how the bit values correspond with the Slave Port Control Register functions.
The Context ID Register bits [31:0] contain the process ID number. To use the Context ID Register, read or write CP15 with:
MRC p15, 0, <Rd>, c13, c0, 1 ; Read Context ID Register
MCR p15, 0, <Rd>, c13, c0, 1 ; Write Context ID Register
4.2.27 c13, Thread and Process ID Registers
The Thread and Process ID Registers provide locations to store the IDs of software threads and processes for Operating System (OS) management purposes.
System Control Coprocessor 4.2.28 Validation Registers The processor implements a set of validation registers.
System Control Coprocessor On reads, this register returns the current setting. On writes, interrupt requests can be enabled. If an interrupt request has been enabled it is disabled by writing to the nVAL IRQ Enable Clear Register, see c15, nVAL IRQ Enable Clear Register on page 4-65. If one or more of the IRQ request fields (P2, P1, P0, and C) is enabled, and the corresponding counter overflows, then an IRQ request is indicated by nVALIRQ being asserted LOW.
System Control Coprocessor c15, nVAL Reset Enable Set Register The nVAL Reset Enable Set Register enables any of the PMC Registers, PMC0-PMC2, and CCNT, to generate a reset request on overflow. If enabled, the reset request is signaled by nVALRESET being asserted LOW. The nVAL Reset Enable Set Register is: • A read/write register. • Always accessible in Privileged mode. The USEREN Register determines access, see c9, User Enable Register on page 6-15.
System Control Coprocessor The nVAL Debug Request Enable Set Register is: • A read/write register. • Always accessible in Privileged mode. The USEREN Register determines access, see c9, User Enable Register on page 6-15. Figure 4-47 shows the bit arrangement for the nVAL Debug Request Enable Set Register.
System Control Coprocessor 31 3 2 1 0 C Reserved Cycle count overflow IRQ request disable Performance monitor counter overflow IRQ request disables P2 P1 P0 Figure 4-48 nVAL IRQ Enable Clear Register format Table 4-46 shows how the bit values correspond with the nVAL IRQ Enable Clear Register.
System Control Coprocessor Table 4-47 shows how the bit values correspond with the FIQ Enable Clear Register.
Table 4-48 nVAL Reset Enable Clear Register bit functions (continued)
Bits Field Function
[2] P2 PMC2 overflow reset request
[1] P1 PMC1 overflow reset request
[0] P0 PMC0 overflow reset request
To access the nVAL Reset Enable Clear Register, read or write CP15 with:
MRC p15, 0, <Rd>, c15, c1, 6 ; Read nVAL Reset Enable Clear Register
MCR p15, 0, <Rd>, c15, c1, 6 ; Write nVAL Reset Enable Clear Register
On reads, this register returns the current setting.
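The nVAL enable set and enable clear registers use the same field positions: C in bit [31] for the cycle counter, and P2, P1, and P0 in bits [2:0] for the performance monitor counters. The following C sketch builds a mask in that layout. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Builds a mask for an nVAL enable set or enable clear register.
 * cycle_counter selects C (bit [31]); pmc_bits selects P2, P1, and P0
 * in bits [2:0], where bit n corresponds to PMCn. */
static inline uint32_t nval_mask(bool cycle_counter, uint32_t pmc_bits)
{
    assert((pmc_bits & ~0x7u) == 0u);   /* only PMC0 to PMC2 exist */
    return (cycle_counter ? (1u << 31) : 0u) | pmc_bits;
}
```

For example, selecting the cycle counter together with PMC0 and PMC2 gives the mask 0x80000005.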
To access the nVAL Debug Request Enable Clear Register, read or write CP15 with:
MRC p15, 0, <Rd>, c15, c1, 7 ; Read nVAL Debug Request Enable Clear Register
MCR p15, 0, <Rd>, c15, c1, 7 ; Write nVAL Debug Request Enable Clear Register
On reads, this register returns the current setting. On writes, overflow debug requests that are currently enabled can be disabled.
Note
The nVAL Cache Size Override Register can only be used to select cache sizes for which the appropriate RAM has been integrated. Larger cache sizes require deeper data and tag RAMs, and smaller cache sizes require wider tag RAMs. Therefore, it is unlikely that you can change the cache size using this register except using a simulation model of the cache RAMs.
4.2.
Table 4-52 shows how the bit values correspond to the CFLR when it indicates a correctable cache error.
Table 4-52 Correctable Fault Location Register - cache
Bits      Field      Function
[31:30]   Reserved   RAZ
[29:26]   Way        Indicates the Way of the error.
[25:24]   Side       Indicates the source of the error. For cache errors, this value is always 0b00.
[23:14]   Reserved   RAZ
[13:5]    Index      Indicates the index of the location where the error occurred.
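The field positions in Table 4-52 can be extracted with shifts and masks. The following C sketch shows one way to do this for a CFLR value that has already been read into a variable. The structure and function names are hypothetical, and the sketch returns the raw field values without interpreting the Way encoding.

```c
#include <stdint.h>

/* Raw CFLR fields for a correctable cache error (hypothetical type). */
typedef struct {
    uint32_t way;    /* bits [29:26] */
    uint32_t side;   /* bits [25:24], 0b00 for cache errors */
    uint32_t index;  /* bits [13:5] */
} cflr_cache_fields;

/* Extract the cache-error fields from a CFLR value. */
static cflr_cache_fields cflr_decode_cache(uint32_t cflr)
{
    cflr_cache_fields f;
    f.way   = (cflr >> 26) & 0xFu;    /* 4-bit field */
    f.side  = (cflr >> 24) & 0x3u;    /* 2-bit field */
    f.index = (cflr >> 5)  & 0x1FFu;  /* 9-bit field */
    return f;
}
```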
4.2.30 Build Options Registers
Build options registers reflect the build configuration options used to build the processor. They do not reflect any pin-configuration options. These registers are:
• read-only registers
• accessible in Privileged mode only.
Note
These registers are implemented from the r1pm releases of the processor. Attempting to access these registers in r0pm releases of the processor results in an Undefined Instruction exception.
[Figure 4-56 Build Options 2 Register format. Fields, in the order shown: DUAL_CORE, DUAL_NCLK, NO_ICACHE, NO_DCACHE, ATCM_ES, BTCM_ES, NO_IE, NO_FPU, NO_MPU, MPU_REGIONS, BREAK_POINTS, WATCH_POINTS, NO_A_TCM_INF, NO_B0_TCM_INF, NO_B1_TCM_INF, TCMBUSPARITY, NO_SLAVE, ICACHE_ES, DCACHE_ES, NO_HARD_ERROR_CACHE, AXIBUSPARITY, RESERVED]
Table 4-55 shows how the bit values correspond with the Build Options 2 Register.
Table 4-55 Build Options 2 Register (continued)
Bits      Field     Function
[25:24]   BTCM_ES   Indicates whether an error scheme is implemented on the BTCM interface(s):
                    00 = no error scheme
                    01 = 8-bit parity logic
                    10 = 32-bit error detection and correction
                    11 = 64-bit error detection and correction.
[23]      NO_IE     Indicates whether the processor supports big-endian instructions:
                    0 = processor supports big-endian instructions
                    1 = processor does not support big-endian instructions.
Table 4-55 Build Options 2 Register (continued)
Bits    Field       Function
[6:5]   DCACHE_ES   Indicates whether an error scheme is implemented for the data cache:
                    00 = no error scheme
                    01 = 8-bit parity error detection
                    10 = 32-bit error detection and correction.
                    If the processor does not contain a d-cache, these bits are set to 00.
Chapter 5 Prefetch Unit
This chapter describes how the PreFetch Unit (PFU), in conjunction with the DPU, uses program flow prediction to locate branches in the instruction stream and the strategies used to determine if a branch is likely to be taken or not. It contains the following sections:
• About the prefetch unit on page 5-2
• Branch prediction on page 5-3
• Return stack on page 5-5.
5.1 About the prefetch unit
The purpose of the PFU is to:
• perform speculative fetch of instructions ahead of the DPU by predicting the outcome of branch instructions
• format instruction data in a way that aids the DPU in efficient implementation.
The PFU fetches instructions from the memory system under the control of the DPU, and the internal coprocessors CP14 and CP15. In ARM state the memory system can supply up to two instructions per cycle.
5.2 Branch prediction
The PFU normally fetches instructions from sequential addresses. If a branch instruction is fetched, the next instruction to be fetched can only be determined with certainty after the instruction has completed execution at the end of the pipeline in the DPU. If the branch is taken, the next instruction to be executed is not sequential.
5.2.2 Branch predictor
Branch prediction in the processor is dynamic and is based around a global history prediction scheme. In addition, there is extra logic to handle predictions that thrash and to predict the end of long loops. The global history scheme is an adaptive predictor that learns the behavior of branches during execution, based on the historical pattern of behavior of the preceding branches. For each pattern of branch behavior, the history table holds a 2-bit hint value.
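The general idea of a global history scheme with a 2-bit hint per history pattern can be illustrated with a small software model. The following C sketch is a generic textbook predictor, not a description of the processor logic. The history length, the saturating-counter update rule, the initial hint value, and all names are assumptions made for illustration, and the sketch omits the extra logic for thrashing predictions and long loops.

```c
#include <stdint.h>

#define HISTORY_BITS 8u                    /* assumed history length */
#define TABLE_SIZE   (1u << HISTORY_BITS)  /* one hint per history pattern */

/* Hypothetical model: a table of 2-bit hints indexed by recent outcomes. */
typedef struct {
    uint8_t  hint[TABLE_SIZE];  /* 0..1 predict not-taken, 2..3 predict taken */
    uint32_t history;           /* most recent branch outcome in bit 0 */
} ghist_predictor;

static void ghist_init(ghist_predictor *p)
{
    for (uint32_t i = 0; i < TABLE_SIZE; i++)
        p->hint[i] = 1;         /* assumed start value: weakly not-taken */
    p->history = 0;
}

/* Predict taken (1) when the hint for the current pattern is 2 or 3. */
static int ghist_predict(const ghist_predictor *p)
{
    return p->hint[p->history] >= 2;
}

/* Train the hint with the actual outcome, then shift it into the history. */
static void ghist_update(ghist_predictor *p, int taken)
{
    uint8_t *h = &p->hint[p->history];
    if (taken) {
        if (*h < 3) (*h)++;     /* saturate at 3 */
    } else {
        if (*h > 0) (*h)--;     /* saturate at 0 */
    }
    p->history = ((p->history << 1) | (taken ? 1u : 0u)) & (TABLE_SIZE - 1u);
}
```

Because the table is indexed by the pattern of preceding outcomes, a model like this can learn a strictly alternating branch, which a single 2-bit counter cannot.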
5.3 Return stack
The call-return stack predicts procedural returns that are program flow changes such as loads, and branch register. The dynamic branch predictor determines if conditional procedure returns are predicted as taken or not-taken. The return stack predicts the target address for unconditional procedure returns, and conditional procedure returns that have been predicted as taken by the branch predictor. The return stack consists of a 4-entry circular buffer.
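A 4-entry circular buffer used as a return stack can be modeled as follows. This C sketch is illustrative only. The names are hypothetical, and the overflow behavior (the oldest entry is overwritten) and underflow behavior (no prediction is made) are assumptions, because this section does not specify them.

```c
#include <stdint.h>

#define RS_ENTRIES 4u

/* Hypothetical model of a 4-entry circular return stack. */
typedef struct {
    uint32_t addr[RS_ENTRIES];
    uint32_t top;    /* index of the next free slot */
    uint32_t count;  /* number of valid entries, at most RS_ENTRIES */
} return_stack;

static void rs_init(return_stack *s)
{
    s->top = 0;
    s->count = 0;
}

/* On a procedure call: record the return address, overwriting the oldest
 * entry if the buffer is already full (assumed behavior). */
static void rs_push(return_stack *s, uint32_t return_addr)
{
    s->addr[s->top] = return_addr;
    s->top = (s->top + 1u) % RS_ENTRIES;
    if (s->count < RS_ENTRIES)
        s->count++;
}

/* On a procedure return: returns 1 and writes the predicted target,
 * or returns 0 if the stack holds no valid entry. */
static int rs_pop(return_stack *s, uint32_t *target)
{
    if (s->count == 0)
        return 0;
    s->top = (s->top + RS_ENTRIES - 1u) % RS_ENTRIES;
    *target = s->addr[s->top];
    s->count--;
    return 1;
}
```

With five nested calls, the model predicts the four innermost returns and makes no prediction for the outermost one.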
Chapter 6 Events and Performance Monitor
This chapter describes the Performance Monitoring Unit (PMU) and event bus interface. It contains the following sections:
• About the events on page 6-2
• About the PMU on page 6-6
• Performance monitoring registers on page 6-7
• Event bus interface on page 6-19.
6.1 About the events
The processor includes logic to detect various events that can occur, for example, a cache miss. These events provide useful information about the behavior of the processor that you can use when debugging or profiling code. The events are made visible on an output bus, EVNTBUS, and can be counted using registers in the Performance Monitoring Unit (PMU).
Table 6-1 Event bus interface bit functions (continued)
EVNTBUS bit position   Description   CFLR update   Event Ref. Value
[8]    Exception return architecturally executed. This event occurs on every exception return, for example, RFE, MOVS PC, LDM PC^.   -   0x0A
[9]    Change to Context ID executed.   -   0x0B
[10]   Software change of PC, except by an exception, architecturally executed.
Table 6-1 Event bus interface bit functions (continued)
EVNTBUS bit position   Description   CFLR update   Event Ref. Value
[22]   Instruction cache tag RAM parity or ECC error (correctable).   Yes   0x4A
[23]   Instruction cache data RAM parity or ECC error (correctable).   Yes   0x4B
[24]   Data cache tag or dirty RAM parity error or correctable ECC error.   Yes   0x4C
[25]   Data cache data RAM parity error.
Table 6-1 Event bus interface bit functions (continued)
EVNTBUS bit position   Description   CFLR update   Event Ref. Value
[43]   TCM correctable ECC error reported by load/store unit.   Yes   0x6A
[44]   TCM correctable ECC error reported by prefetch unit.   Yes   0x6B
[45]   TCM parity or fatal ECC error reported by AXI slave interface.   -   0x6C
[46]   TCM correctable ECC error reported by AXI slave interface.   Yes   0x6D
N/A    Cycle count   -   0xFF
a.
6.2 About the PMU
The PMU consists of three event counting registers, one cycle counting register, and 12 CP15 registers for controlling and interrogating the counters. The performance monitoring registers are always accessible in Privileged mode. You can use the User Enable (USEREN) Register to make all of the performance monitoring registers, except for the USEREN, Interrupt Enable Set (INTENS), and Interrupt Enable Clear (INTENC) Registers, accessible in User mode.
6.
Table 6-2 PMNC Register bit functions (continued)
Bits   Field   Function
[4]    X       Enable export of the events to the event bus for an external monitoring block, for example the ETM, to trace events:
               0 = Export disabled. This is the reset value.
               1 = Export enabled.
[3]    D       Cycle count divider:
               0 = Counts every processor clock cycle. This is the reset value.
               1 = Counts every 64th processor clock cycle.
[Figure 6-2 CNTENS Register format: bit [31] C, cycle count enable; bits [30:3] Reserved; bits [2:0] P2, P1, P0, performance monitor counter enables]
Table 6-3 shows how the bit values correspond with the CNTENS Register.
Table 6-3 CNTENS Register bit functions
Bits   Field   Function
[31]   C       Cycle counter enable set:
               0 = disable
               1 = enable.
[Figure 6-3 CNTENC Register format: bit [31] C, cycle count disable; bits [30:3] Reserved; bits [2:0] P2, P1, P0, performance monitor counter disables]
Table 6-4 shows how the bit values correspond with the CNTENC Register.
Table 6-4 CNTENC Register bit functions
Bits   Field   Function
[31]   C       Cycle counter enable clear:
               0 = disable
               1 = enable.
[Figure 6-4 FLAG Register format: bit [31] C, cycle count overflow; bits [30:3] Reserved; bits [2:0] P2, P1, P0, performance monitor counters overflow flags]
Table 6-5 shows how the bit values correspond with the FLAG Register.
Table 6-5 Overflow Flag Status Register bit functions
Bits   Field                    Function
[31]   Cycle counter overflow   Cycle counter overflow flag:
                                0 = disable
                                1 = enable.
If you attempt to use the SWINCR Register to increment a performance monitor count register when the counter event is set to a value other than 0x00, the result is Unpredictable. Figure 6-5 shows the bit arrangement for the SWINCR Register.
[Figure 6-5 SWINCR Register format: bits [31:3] Reserved; bits [2:0] P2, P1, P0, performance monitor counters software increment bits]
Table 6-6 shows how the bit values correspond with the SWINCR Register.
Table 6-7 shows how the bit values correspond with the PMNXSEL Register functions.
Table 6-7 Performance Counter Selection Register bit functions
Bits     Field      Function
[31:5]   Reserved   RAZ on reads, SBZP on writes
[4:0]    SEL        Counter select:
                    b00000 = selects counter 0
                    b00001 = selects counter 1
                    b00010 = selects counter 2.
Any values programmed in the PMNXSEL Register other than those specified in Table 6-7 are Unpredictable.
[Figure 6-7 EVTSELx Register format: bits [31:8] Reserved; bits [7:0] SEL]
Table 6-8 shows how the bit values correspond with the EVTSELx Register.
Table 6-8 EVTSELx Register bit functions
Bits     Field      Function
[31:8]   Reserved   RAZ or SBZP.
[7:0]    SEL        Event number selected, see Table 6-1 on page 6-2 for values.
The reset value of this field is Unpredictable.
6.3.9 c9, Performance Monitor Count Registers
There are three PMC Registers (PMC0-PMC2) in the processor. Each PMC Register, as selected by the PMNXSEL Register, counts instances of an event selected by the EVTSEL Register. Bits [31:0] of each PMC Register contain an event count. The register to be accessed is determined by the value in the Performance Counter Selection Register.
Each PMC Register is:
• A read/write register
• Always accessible in Privileged mode.
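The indirect access scheme, in which the PMNXSEL Register selects which counter a subsequent access refers to, can be illustrated with a small software model. The following C sketch is not processor code. All names are hypothetical, the model ignores counter enables and overflow, and it rejects a select value above 2, where the hardware behavior is Unpredictable.

```c
#include <stdint.h>

#define PMU_NUM_COUNTERS 3u

/* Hypothetical software model of the selectable PMU registers. */
typedef struct {
    uint32_t pmnxsel;                   /* counter select, SEL in bits [4:0] */
    uint32_t evtsel[PMU_NUM_COUNTERS];  /* event number, SEL in bits [7:0] */
    uint32_t pmc[PMU_NUM_COUNTERS];     /* 32-bit event counts */
} pmu_model;

/* Resolve the selected counter; returns 0 for values other than 0, 1, or 2. */
static int pmu_selected(const pmu_model *m, uint32_t *idx)
{
    uint32_t sel = m->pmnxsel & 0x1Fu;
    if (sel >= PMU_NUM_COUNTERS)
        return 0;
    *idx = sel;
    return 1;
}

/* Write the event number for the currently selected counter. */
static int pmu_write_evtsel(pmu_model *m, uint32_t value)
{
    uint32_t i;
    if (!pmu_selected(m, &i))
        return 0;
    m->evtsel[i] = value & 0xFFu;
    return 1;
}

/* Read the count of the currently selected counter. */
static int pmu_read_pmc(const pmu_model *m, uint32_t *value)
{
    uint32_t i;
    if (!pmu_selected(m, &i))
        return 0;
    *value = m->pmc[i];
    return 1;
}

/* Model one occurrence of an event: every counter programmed with that
 * event number increments. Enables are not modeled. */
static void pmu_count_event(pmu_model *m, uint32_t event)
{
    for (uint32_t i = 0; i < PMU_NUM_COUNTERS; i++)
        if (m->evtsel[i] == (event & 0xFFu))
            m->pmc[i]++;
}
```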
Note
For more information on access permissions to the performance monitor registers and validation registers, see the ARM Architecture Reference Manual.
To access the USEREN Register, read or write CP15 with:
MRC p15, 0, <Rd>, c9, c14, 0 ; Read USEREN Register
MCR p15, 0, <Rd>, c9, c14, 0 ; Write USEREN Register
6.3.
MRC p15, 0, <Rd>, c9, c14, 1 ; Read INTENS Register
MCR p15, 0, <Rd>, c9, c14, 1 ; Write INTENS Register
If this unit generates an interrupt, the processor asserts the pin nPMUIRQ. You can route this pin to an external interrupt controller for prioritization and masking. This is the only mechanism that signals this interrupt to the processor.
Note
ARM expects that the Performance Monitor interrupt request signal, nPMUIRQ, connects to a system interrupt controller.
6.3.
To access the INTENC Register, read or write CP15 with:
MRC p15, 0, <Rd>, c9, c14, 2 ; Read INTENC Register
MCR p15, 0, <Rd>, c9, c14, 2 ; Write INTENC Register
6.4 Event bus interface
The event bus, EVNTBUS, is used to signal when an event has occurred. The event bus includes most, but not all, of the events that can be counted by the performance monitoring unit. Each individual event is assigned to an individual bit of this bus, and this bit is asserted for one cycle each time the event occurs. The event bus only signals events when it is enabled. Set the X bit in the Performance Monitor Control Register to enable the event bus.
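An external monitoring block that samples EVNTBUS needs to translate asserted bit positions into event numbers. The following C sketch shows one way to do this with a lookup table. It contains only a subset of the bit positions and event reference values listed in Table 6-1, and the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One EVNTBUS bit position and the event reference value it signals. */
typedef struct {
    uint8_t bit;
    uint8_t event;
} evnt_map_entry;

/* Subset of Table 6-1; a complete table would list every bus bit. */
static const evnt_map_entry evnt_map[] = {
    {  8, 0x0A },  /* exception return architecturally executed */
    {  9, 0x0B },  /* change to Context ID executed */
    { 22, 0x4A },  /* i-cache tag RAM parity or ECC error (correctable) */
    { 23, 0x4B },  /* i-cache data RAM parity or ECC error (correctable) */
    { 24, 0x4C },  /* d-cache tag or dirty RAM error */
    { 43, 0x6A },  /* TCM correctable ECC error, load/store unit */
    { 44, 0x6B },  /* TCM correctable ECC error, prefetch unit */
    { 45, 0x6C },  /* TCM parity or fatal ECC error, AXI slave */
    { 46, 0x6D },  /* TCM correctable ECC error, AXI slave */
};

/* Write the event numbers of all asserted, known bits in one bus sample
 * to out[], up to max entries. Returns the number of events written. */
static size_t evntbus_decode(uint64_t sample, uint8_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof evnt_map / sizeof evnt_map[0]; i++) {
        if (((sample >> evnt_map[i].bit) & 1u) && n < max)
            out[n++] = evnt_map[i].event;
    }
    return n;
}
```

Because each bit is asserted for one cycle per event, a monitor that uses a routine like this must sample the bus on every processor clock cycle to avoid missing events.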
Chapter 7 Memory Protection Unit
This chapter describes the Memory Protection Unit (MPU). It contains the following sections:
• About the MPU on page 7-2
• Memory types on page 7-7
• Region attributes on page 7-9
• MPU interaction with memory system on page 7-11
• MPU faults on page 7-12
• MPU software-accessible registers on page 7-13.
7.1 About the MPU
The MPU works with the L1 memory system to control accesses to and from L1 and external memory. For a full architectural description of the MPU, see the ARM Architecture Reference Manual. The MPU enables you to partition memory into regions and set individual protection attributes for each region. The MPU supports zero, eight, or twelve memory regions.
Note
If the MPU has zero regions, you cannot enable or program the MPU.
This section describes:
• Memory regions
• Overlapping regions on page 7-4
• Background regions on page 7-6
• TCM regions on page 7-6.
7.1.1 Memory regions
Before the MPU is enabled, you must program at least one valid protection region. If you do not do this, the processor enters a state from which it can recover only by a reset. When the MPU is disabled, no access permission checks are performed, and memory attributes are assigned according to the default memory map.
Region attributes
Each region has a number of attributes associated with it. These control how a memory access is performed when the processor accesses an address that falls within a given region. The attributes are:
• Memory Type, one of:
  — Strongly Ordered
  — Device
  — Normal
• Shared or Non-shared
• Non-cacheable
• Write-through Cacheable
• Write-back Cacheable
• Read allocation
• Write allocation.
[Figure 7-1 Overlapping memory regions: Region 1 and Region 2, with address markers 0x0000, 0x3000, 0x3010, and 0x4000]
Example of using regions that overlap
You can use overlapping regions for stack protection. For example:
• allocate to region 1 the appropriate size for all stacks
• allocate to region 2 the minimum region size, 32 bytes, and position it at the end of the stack for the current process
• set the region 2 access permissions to No Access.
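The stack guard arrangement described above can be illustrated with a software model of region matching. The following C sketch assumes that, where enabled regions overlap, the attributes of the highest-numbered matching region apply. The types, names, permission values, and example addresses are hypothetical, and the model represents region sizes in 32 bits, so it cannot describe a 4GB region.

```c
#include <stdint.h>

/* Simplified access permissions for illustration only. */
typedef enum { AP_NO_ACCESS, AP_READ_WRITE } access_perm;

/* Hypothetical description of one MPU region. */
typedef struct {
    uint32_t    base;     /* region base address */
    uint32_t    size;     /* region size in bytes, at least 32 */
    int         enabled;  /* nonzero if the region is enabled */
    access_perm perm;
} mpu_region;

/* Return the number of the highest-numbered enabled region that contains
 * addr, or -1 if no region matches. The array index is the region number. */
static int mpu_match(const mpu_region *regions, int count, uint32_t addr)
{
    for (int i = count - 1; i >= 0; i--) {
        /* Unsigned subtraction wraps for addr < base, so one compare
         * checks both the lower and the upper bound. */
        if (regions[i].enabled && (addr - regions[i].base) < regions[i].size)
            return i;
    }
    return -1;
}
```

With region 1 covering 0x0000 to 0x3FFF as read/write and region 2 covering 32 bytes from 0x3000 as No Access, an access to 0x3010 matches region 2 in this model, so a stack that grows into the guard area is caught.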
[Figure 7-3 Overlapping subregion of memory: Stack and Guard region, with address markers 0x0000, 0x0800, and 0x4000]
7.1.3 Background regions
Overlapping regions increase the flexibility of how the regions can be mapped onto physical memory devices in the system. You can also use the overlapping properties to specify a background region. For example, you might have a number of physical memory areas sparsely distributed across the 4GB address space.
7.2 Memory types
The ARM Architecture defines a set of memory types with characteristics that are suited to particular devices. There are three mutually exclusive memory type attributes:
• Strongly Ordered
• Device
• Normal.
MPU memory regions can each be assigned a memory type attribute. Table 7-2 shows a summary of the memory types.
To ensure optimum performance, you must understand the architectural semantics of the different memory types. Use Device memory type for appropriate memory regions, typically peripherals, and only use Strongly Ordered memory type for memory regions where it is essential.
7.3 Region attributes
Each region has a number of attributes associated with it. These control how a memory access is performed when the processor accesses an address that falls within a given region. The attributes are:
• Memory type, see Memory types on page 7-7, one of:
  — Strongly Ordered
  — Device
  — Normal
• Shared or Non-shared
• Non-cacheable
• Write-through cacheable
• Write-back cacheable
• Read allocation
• Write allocation.
Table 7-3 TEX[2:0], C, and B encodings (continued)
TEX[2:0]   C   B   Description                                                  Memory Type   Shareable?
010        1   X   Reserved.                                                    -             -
011        X   X   Reserved.                                                    -             -
1BB        A   A   Cacheable memory: AA = Inner policy, BB = Outer policy (b)   Normal        S bit (a)
a. Region is Shareable if S == 1, and Non-shareable if S == 0.
b. Table 7-4 shows the encoding for these bits.
7.3.
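For the 1BB encoding in Table 7-3, the inner and outer policy fields can be separated with simple bit operations. The following C sketch extracts the raw 2-bit AA and BB values only. It does not interpret them, because the policy encodings are given in Table 7-4, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Raw policy fields for the TEX[2:0] = 1BB form (hypothetical type). */
typedef struct {
    int      is_cacheable_form;  /* nonzero when TEX[2] == 1 */
    uint32_t inner;              /* AA, taken from the C and B bits */
    uint32_t outer;              /* BB, taken from TEX[1:0] */
} tex_policy;

/* Split TEX[2:0], C, and B into inner (AA) and outer (BB) policy fields.
 * For encodings with TEX[2] == 0, both fields are returned as zero. */
static tex_policy tex_decode(uint32_t tex, uint32_t c, uint32_t b)
{
    tex_policy p = { 0, 0, 0 };
    if (tex & 0x4u) {
        p.is_cacheable_form = 1;
        p.outer = tex & 0x3u;                  /* BB = TEX[1:0] */
        p.inner = ((c & 1u) << 1) | (b & 1u);  /* AA = {C, B} */
    }
    return p;
}
```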
7.4 MPU interaction with memory system
This section describes how to enable and disable the MPU. After you enable or disable the MPU, the pipeline must be flushed using ISB and DSB instructions to ensure that all subsequent instruction fetches see the effect of turning on or off the MPU.
Before you enable or disable the MPU you must:
1. Program all relevant CP15 registers.
7.5 MPU faults
The MPU can generate three types of fault:
• Background fault
• Permission fault
• Alignment fault.
When a fault occurs, the memory access or instruction fetch is precisely aborted, and a prefetch abort or data abort exception is taken as appropriate. No memory accesses are performed on the AXI bus master interface. For more information about fault handling, see Fault handling on page 8-7.
7.5.
7.6 MPU software-accessible registers
Figure 4-2 on page 4-5 shows the CP15 registers that control the MPU. When the MPU is not present, the c6, MPU memory region programming registers on page 4-49 read as zero and ignore writes in Privileged mode. No Undefined instruction exceptions are taken.
Chapter 8 Level One Memory System
This chapter describes the processor Level one (L1) memory system. It contains the following sections:
• About the L1 memory system on page 8-2
• About the error detection and correction schemes on page 8-4
• Fault handling on page 8-7
• About the TCMs on page 8-13
• About the caches on page 8-18
• Internal exclusive monitor on page 8-34
• Memory types and L1 memory system behavior on page 8-35
• Error detection events on page 8-36.
8.1 About the L1 memory system
The processor L1 memory system can be configured during implementation and integration. It can consist of:
• separate instruction and data caches
• multiple Tightly-Coupled Memory (TCM) areas
• a Memory Protection Unit (MPU).
The instruction-side and data-side can each optionally have their own L1 caches. The cache architecture is Harvard, that is, only instructions can be fetched from the i-cache, and only data can be fetched from the d-cache.
[Figure 8-1 L1 memory system block diagram: Data Processing Unit (DPU), Prefetch Unit (PFU), Load Store Unit (LSU), Memory Protection Unit (MPU), instruction cache controller and RAMs, data cache controller and RAMs, interconnect, AXI master, AXI slave, AXI bus, and external Tightly-Coupled Memory (ATCM, B0TCM, B1TCM)]
8.2 About the error detection and correction schemes
In silicon devices, stray radiation and other effects can cause the data stored in a RAM to be corrupted. The TCMs and caches on Cortex-R4 can be configured to detect and correct errors that can occur in the RAMs. Extra, redundant data is computed by the processor and stored in the RAMs alongside the real data.
8.2.2 Error checking and correction
The processor supports Error Checking and Correction (ECC) schemes for either 64 bits or 32 bits of data. These schemes have similar properties, although the size of the data chunk that the ECC scheme applies to is different. For each aligned data chunk, either 32 bits or 64 bits, a number of redundant code bits are computed and stored with the data.
8.2.5 Error correction
When a correctable error is detected in data that has been read from a RAM, the processor has various ways of generating the correct data, which follow two schemes:
Correct inline
    The error code bits are used to correct the data read from the RAM, and this data is used. This is the simplest way of correcting the data.
Correct-and-retry
    The error code bits are used to correct the data, and this data is then written back to the RAM.
8.3 Fault handling
Faults can occur on instruction fetches for the following reasons:
• MPU background fault
• MPU permission fault
• External AXI slave error (SLVERR)
• External AXI decode error (DECERR)
• Cache parity or ECC error
• TCM parity or ECC error
• TCM external error
• TCM external retry request
• Breakpoints, and vector capture events.
External faults
A memory access performed through the AXI master interface can generate two different types of error response, a slave error (SLVERR) or decode error (DECERR). These are known as external errors, because they are generated by the AXI system outside the processor. Precise aborts are generated for instruction fetches, data loads, and data stores to strongly-ordered-type memory. Stores to normal-type or device-type memory generate imprecise aborts.
Debug events
The debug logic in the processor can be configured to generate breakpoints or vector capture events on instruction fetches, and watchpoints on data accesses. If the processor is software-configured for monitor-mode debugging, an abort is taken when one of these events occurs, or when a BKPT instruction is executed. For more details, see Chapter 11 Debug.
Precise abort exceptions
The following registers are updated when a precise abort exception is taken:
Fault Address Register
    There are two fault address registers, one for prefetch aborts (IFAR) and one for data aborts (DFAR). These indicate the address of the memory access that caused the fault. See Fault Status and Address Registers on page 4-45.
reprogramming the MPU to reflect this. Alternatively, an imprecise external abort might indicate that a software error caused a store instruction to access an unmapped memory address. Such an abort is fatal to the system or process because no information is recorded about the address the error occurred on, or the instruction that caused the error.
When the processor is in debug halt-state, any correctable error is corrected as appropriate, but the memory access is not repeated to fetch the correct data, therefore the instruction generating the error does not complete successfully. Instead, the sticky precise abort flag in the DSCR is set. See CP14 c1, Debug Status and Control Register on page 11-14.
8.4 About the TCMs
The processor has two TCM interfaces to support the connection of local memories. The ATCM interface has one TCM port. The BTCM interface can support one or two TCM ports. Each TCM port is a physical connection on the processor that is suitable for connection to SRAM with minimal glue logic. These ports are optimized for low latency memory. The TCM ports are designed to be connected to RAM, or RAM-like memory, that is, Normal-type memory.
8.4.2 ATCM and BTCM configuration
The TCM interfaces are configured during implementation and integration. You can configure the ATCM interface to be removed, and not included in the processor design. If implemented, the ATCM can have only a single port.
Handling TCM parity errors
If a TCM interface has been built with parity error checking, you can enable this by setting the appropriate bits in the Auxiliary Control Register. See c1, Auxiliary Control Register on page 4-38. If the BTCM interface has been built with two ports, parity checking can be enabled for each port individually.
When either the LSU or the AXI slave interface is performing a read-modify-write operation on a TCM port, various internal data hazards exist for either the AXI-slave interface or the LSU. In these cases, additional stall cycles are generated, beyond those normally required for arbitration.
In addition, an external error detection scheme might require that data is read and written in particular sized chunks. The load/store-64 feature, when enabled for a particular TCM interface, causes all loads and stores to the TCM ports to be of 64 bits of data. This feature is also known as Read-Modify-Write (RMW), because it causes the processor to generate read-modify-write sequences for any store of less than 64 bits.
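The modify step of a read-modify-write sequence merges the bytes being stored into the 64-bit chunk that was read. The following C sketch illustrates that merge. The function name and the byte-mask convention, where bit i of the mask selects byte lane i, are assumptions made for this example and do not describe the TCM port signals.

```c
#include <stdint.h>

/* Merge a partial store into a 64-bit chunk previously read from memory.
 * Bit i of byte_mask selects byte lane i, that is, bits [8i+7:8i].
 * Selected lanes take the store data; other lanes keep the old value. */
static uint64_t rmw_merge(uint64_t old_chunk, uint64_t store_data,
                          uint8_t byte_mask)
{
    uint64_t lane_mask = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (byte_mask & (1u << i))
            lane_mask |= (uint64_t)0xFF << (8u * i);
    }
    return (old_chunk & ~lane_mask) | (store_data & lane_mask);
}
```

The merged 64-bit value is then written back as a single chunk, which is what allows an error code to be computed over the full 64 bits.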
8.5 About the caches
The L1 memory system can be configured to include instruction and data caches of varying sizes. You can configure whether the cache controller is included and, if it is, configure the size of each cache independently. The cached instructions or data are fetched from external memory using the L2 memory interface. The cache controllers use RAMs that are integrated into the Cortex-R4 macrocell during implementation.
Store buffer merging
The store buffer has merging capabilities. If a previous write access has updated an entry, other write accesses on the same line can merge into this entry. Merging is only possible for stores to Normal memory. Merging is possible between several entries that can be linked together if the data inside the different entries belong to the same cache line. No merging occurs for writes to Strongly Ordered or Device memory.
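The two conditions stated above, same cache line and Normal memory, can be expressed as a simple check. The following C sketch is illustrative only. It assumes a 32-byte cache line, the names are hypothetical, and it tests necessary conditions only, because the store buffer might apply further conditions that this section does not describe.

```c
#include <stdint.h>

typedef enum { MEM_NORMAL, MEM_DEVICE, MEM_STRONGLY_ORDERED } mem_type;

#define CACHE_LINE_BYTES 32u  /* assumed line length */

/* Returns 1 if a new store could merge into an existing store buffer
 * entry under the stated conditions, otherwise 0. */
static int store_can_merge(uint32_t entry_addr, uint32_t store_addr,
                           mem_type type)
{
    if (type != MEM_NORMAL)
        return 0;  /* no merging for Strongly Ordered or Device memory */
    return (entry_addr / CACHE_LINE_BYTES) == (store_addr / CACHE_LINE_BYTES);
}
```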
• Invalidate by Set/Way combination
• Clean by address (MVA)
• Clean by Set/Way combination
• Clean and Invalidate by address (MVA)
• Clean and Invalidate by Set/Way combination
• Data Memory Barrier (DMB) and Data Synchronization Barrier (DSB) operations.
The system control coprocessor operations supported for the instruction cache are:
• Invalidate all
• Invalidate by address.
For more information on cache operations, see Cache operations on page 4-54.
8.5.
Address decoder faults
The error detection schemes described in this section provide protection against errors that occur in the data stored in the cache RAMs. Each RAM normally includes a decoder which enables access to that data and, if an error occurs in this logic, it is not normally detected by these error detection schemes. The processor includes features that enable it to detect some address decoder faults.
Handling cache ECC errors
Table 8-3 shows the behavior of the processor on a cache ECC error, depending on bits [5:3] of the Auxiliary Control Register, see Auxiliary Control Registers on page 4-38.
Errors on instruction cache read
All parity or ECC errors detected on instruction cache reads are correctable. If aborts are enabled, a precise prefetch abort exception occurs. The instruction FAR gives the address that caused the error to be detected. The instruction FSR indicates a parity error on a read. The auxiliary FSR indicates that the error was in the cache and which cache Way the error was in.
• Invalidate data cache by set/way
• Clean data cache by address
• Clean data cache by set/way on page 8-25
• Clean and invalidate data cache by address on page 8-25
• Clean and invalidate data cache by set/way on page 8-25.
Invalidate all instruction cache
This operation ignores all errors in the cache and sets all instruction cache entries to invalid regardless of error events. This operation cannot generate an imprecise abort, and no error events are signaled.
Any detected error is signaled with the appropriate event.
Clean data cache by set/way
This operation does not require a cache lookup. It refers to a particular cache line. The tag and dirty RAMs for the cache line are checked.
Note
When force write-through is enabled, the dirty bit is ignored.
If the tag or dirty RAM has an uncorrectable error, the data is not written to memory. If the line is dirty, the data is written back to external memory.
Any uncorrectable errors found cause an imprecise abort. An imprecise abort can also be raised on a correctable error if aborts on RAM errors are enabled in the Auxiliary Control Register. Any detected error is signaled with the appropriate event.
8.5.4 Cache RAM organization
This section describes RAM organization in the following sections:
• Tag RAM
• Dirty RAM on page 8-27
• Data RAM on page 8-27.
Tag RAM
The tag RAMs consist of four ways of up to 512 lines.
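The relationship between cache size, the number of lines in each way, and the address bits used as the index can be worked through in code. The following C sketch is a conceptual model that assumes four ways and a 32-byte (256-bit) line. The names are hypothetical, and the tag width computed here is the minimum needed for a given cache size, which is not necessarily the width of the tag RAM that is implemented.

```c
#include <stdint.h>

#define CACHE_WAYS       4u
#define CACHE_LINE_BYTES 32u  /* assumed: one line is 256 bits */

/* Conceptual split of an address for a cache lookup (hypothetical type). */
typedef struct {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
} cache_addr;

/* Lines in each way for a total cache size in bytes (a power of two). */
static uint32_t cache_lines_per_way(uint32_t cache_bytes)
{
    return cache_bytes / (CACHE_WAYS * CACHE_LINE_BYTES);
}

/* Split an address into the byte offset in the line, the line index,
 * and the remaining upper bits. */
static cache_addr cache_split(uint32_t addr, uint32_t cache_bytes)
{
    uint32_t lines = cache_lines_per_way(cache_bytes);
    cache_addr a;
    a.offset = addr % CACHE_LINE_BYTES;
    a.index  = (addr / CACHE_LINE_BYTES) % lines;
    a.tag    = addr / (CACHE_LINE_BYTES * lines);
    return a;
}
```

For a 64KB cache this gives 512 lines in each way, so the index occupies address bits [13:5].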
Table 8-7 shows the tag RAM cache sizes and associated RAM organization, assuming no parity or ECC. For parity, the width of the tag RAMs must be increased by one bit. For ECC, the width of the tag RAMs must be increased by seven bits.
• Write a line to the eviction buffer in one cycle, a 256-bit read access.
• Fill a line in one cycle from the linefill buffer, a 256-bit write access.
Figure 8-3 shows a cache look-up being performed on all banks with one RAM access.
Data RAM sizes without parity or ECC implemented
Table 8-9 shows the organization for instruction and data caches when neither parity nor ECC is implemented.
Table 8-12 Data cache data RAM sizes, with parity
Cache size              Data RAMs
4KB, 4 × 1KB ways       8 banks × 36 bits × 128 lines
8KB, 4 × 2KB ways       8 banks × 36 bits × 256 lines
16KB, 4 × 4KB ways      8 banks × 36 bits × 512 lines
32KB, 4 × 8KB ways      8 banks × 36 bits × 1024 lines
64KB, 4 × 16KB ways     8 banks × 36 bits × 2048 lines
Table 8-13 shows the organization of the data cache RAM bits when parity is implemented.
Table 8-15 shows the organization for the data cache when ECC is implemented. For ECC error detection, seven bits are added per 32 bits, so seven bits are added for each RAM bank.
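The RAM width figures quoted in this section can be collected into a small calculation. The following C sketch returns the data cache data RAM bank width and the extra tag RAM bits for each error scheme, using only values stated here: 32 data bits in each bank, 36 bits with parity, seven extra bits with ECC, and one or seven extra tag bits. The enumeration and function names are hypothetical.

```c
/* Error scheme selected for a cache at build time (hypothetical type). */
typedef enum { ES_NONE, ES_PARITY, ES_ECC } error_scheme;

/* Width in bits of one data cache data RAM bank. */
static unsigned dcache_data_bank_bits(error_scheme es)
{
    switch (es) {
    case ES_PARITY: return 32u + 4u;  /* 36-bit banks with parity */
    case ES_ECC:    return 32u + 7u;  /* seven code bits per 32 data bits */
    default:        return 32u;
    }
}

/* Extra bits added to each tag RAM entry. */
static unsigned tag_ram_extra_bits(error_scheme es)
{
    switch (es) {
    case ES_PARITY: return 1u;
    case ES_ECC:    return 7u;
    default:        return 0u;
    }
}
```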
The following code is an example of disabling the caches:
    MRC p15, 0, R1, c1, c0, 0   ; Read System Control Register configuration data
    BIC R1, R1, #0x1 << 12      ; instruction cache disable
    BIC R1, R1, #0x1 << 2       ; data cache disable
    DSB
    MCR p15, 0, R1, c1, c0, 0   ; disabled cache RAMs
    ISB
    ; Clean entire data cache. This routine will depend on the data cache size.
    ; Clean entire data cache. This routine will depend on the data cache size. It can be omitted if it is known that the data cache has no dirty data (e.g. if the cache has not been enabled yet).
8.6 Internal exclusive monitor
The processor L1 memory system has an internal exclusive monitor. This is a two state, open and exclusive, state machine that manages load/store exclusive (LDREXB, LDREXH, LDREX, LDREXD, STREXB, STREXH, STREX and STREXD) accesses and clear exclusive (CLREX) instructions. You can use these instructions, operating in the L1 memory system, to construct semaphores and ensure synchronization between different processes.
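The two-state behavior of the monitor can be illustrated with a small state machine. The following C sketch is a simplified software model with hypothetical names. It ignores the address and size of the access, and it follows the architectural convention that a store exclusive returns 0 when the store succeeds and 1 when it fails.

```c
/* The two monitor states. */
typedef enum { MON_OPEN, MON_EXCLUSIVE } monitor_state;

/* A load exclusive moves the monitor to the exclusive state. */
static void mon_ldrex(monitor_state *s)
{
    *s = MON_EXCLUSIVE;
}

/* A clear exclusive returns the monitor to the open state. */
static void mon_clrex(monitor_state *s)
{
    *s = MON_OPEN;
}

/* A store exclusive succeeds only in the exclusive state. In either case
 * the monitor returns to the open state. Returns 0 on success, 1 on
 * failure, mirroring the STREX status result. */
static int mon_strex(monitor_state *s)
{
    int status = (*s == MON_EXCLUSIVE) ? 0 : 1;
    *s = MON_OPEN;
    return status;
}
```

A semaphore routine built on these instructions typically loops, repeating the load exclusive and store exclusive pair until the store reports success.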
8.7 Memory types and L1 memory system behavior
The behavior of the L1 memory system depends on the type attribute of the memory that is being accessed:
• Only Normal, Non-shared memory can be cached in the RAMs.
• The store buffer can merge any stores to Normal memory. See Store buffer on page 8-18 for more information.
8.8 Error detection events
The processor generates a number of events related to the internal error detection and correction schemes in the TCMs and caches. For more information, see Table 6-1 on page 6-2. This section describes:
• TCM error events
• Instruction-cache error events
• Data-cache error events
• Events and the CFLR.
8.8.
generates an event. See Table 6-1 on page 6-2 to see which events are CFLR-related. For correctable cache errors, the CFLR does not record whether the error occurred in the data RAM or tag/dirty RAM. This distinction is only made by the events.
Chapter 9 Level Two Interface

This chapter describes the features of the Level two (L2) interface not covered in the AMBA AXI Protocol Specification. It contains the following sections:
• About the L2 interface on page 9-2
• AXI master interface on page 9-3
• AXI master interface transfers on page 9-7
• AXI slave interface on page 9-20
• Enabling or disabling AXI slave accesses on page 9-23
• Accessing RAMs using the AXI slave interface on page 9-24.
Level Two Interface 9.1 About the L2 interface This section describes the processor L2 interface. The L2 interface consists of AXI master and AXI slave interfaces. The processor is designed for use in larger chip designs using the Advanced Microcontroller Bus Architecture (AMBA) AXI protocol. The processor uses the L2 interfaces as its interface to memory and peripheral devices. External AXI masters and the processor can use the AXI slave interface to access the processor RAMs.
Level Two Interface 9.2 AXI master interface The processor has a single AXI master interface, with one port which is used for: • I-cache linefills • D-cache linefills and evictions • Non-cacheable (NC) Normal-type memory instruction fetches • NC Normal-type memory data accesses • Device and Strongly-ordered type data accesses, normally to peripherals. The port is 64 bits wide, and conforms to the AXI standard as described in the AMBA AXI Protocol Specification.
Level Two Interface 9.2.1 Identifiers for AXI bus accesses Accesses on the AXI bus use ID values as follows: Outstanding write/read access on different IDs This means, for example, that a Non-cacheable (NC) read and linefills can be outstanding on the AXI bus simultaneously as long as the IDs are different.
Level Two Interface 9.2.4 Eviction buffer As soon as a linefill is requested, the selected evicted cache line is loaded into the EViction Buffer (EVB). The EVB forwards this information to the AXI bus when possible. The EVB has a structure of 256 bits for data and 32 bits for the address. See Cache line write-back (eviction) on page 9-13 for details of the AXI transaction generated. The EVB is removed if cache RAMs are not implemented for the processor. 9.2.
Level Two Interface Memory system implications for AXI accesses The attributes of the memory being accessed can affect an AXI access. The L1 memory system can cache any Normal memory address that is marked as either: • Cacheable, write-back, read- and write-allocate, non-shared • Cacheable, write-through, read-allocate only, non-shared. However, Device and Strongly Ordered memory is always Non-cacheable.
Level Two Interface 9.3 AXI master interface transfers The processor conforms to the AXI specification, but it does not generate all the AXI transaction types that the specification permits. This section describes the types of AXI transaction that the Cortex-R4 AXI master does not generate.
9.3.1 Restrictions on AXI transfers

The Cortex-R4 AXI master interface applies the following restrictions to the AXI transactions it generates:
• A burst never transfers more than 32 bytes.
• The burst length is never more than 8 transfers.
• No transaction ever crosses a 32-byte boundary in memory. See AXI transaction splitting on page 9-16.
• FIXED bursts are never used.
• The write address channel always issues INCR type bursts, and never WRAP or FIXED.
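The restrictions listed above can be expressed as a simple checker, for example in a testbench that monitors the master port. This is a sketch only: the function name and argument names are hypothetical, it covers only the restrictions quoted here, and it assumes the start address is aligned to the transfer size.

```python
def check_burst(addr, size_bytes, length, burst, is_write=False):
    """Return the listed restrictions that a burst would violate.

    addr: start address; size_bytes: bytes per transfer;
    length: number of transfers; burst: "INCR", "WRAP" or "FIXED".
    """
    problems = []
    total = size_bytes * length
    if total > 32:
        problems.append("more than 32 bytes")
    if length > 8:
        problems.append("more than 8 transfers")
    # Only INCR bursts are checked for crossing; WRAP bursts wrap in place.
    if burst == "INCR" and addr // 32 != (addr + total - 1) // 32:
        problems.append("crosses a 32-byte boundary")
    if burst == "FIXED":
        problems.append("FIXED burst")
    if is_write and burst != "INCR":
        problems.append("write burst is not INCR")
    return problems
```

An empty list means the burst is consistent with the restrictions above; it does not prove that the processor would actually generate that burst.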
Table 9-4 Non-cacheable LDRB (continued)

Address[2:0]    ARADDRM   ARBURSTM   ARSIZEM   ARLENM
0x4 (byte 4)    0x04      Incr       8-bit     1 data transfer
0x5 (byte 5)    0x05      Incr       8-bit     1 data transfer
0x6 (byte 6)    0x06      Incr       8-bit     1 data transfer
0x7 (byte 7)    0x07      Incr       8-bit     1 data transfer

LDRH

Table 9-5 shows the values of ARADDRM, ARBURSTM, ARSIZEM, and ARLENM for a Non-cacheable LDRH from halfwords 0-3 in Strongly Ordered or Device memory.
Level Two Interface LDM that transfers five registers Table 9-7 shows the values of ARADDRM, ARBURSTM, ARSIZEM, and ARLENM for a Non-cacheable LDM that transfers five registers (an LDM5) in Strongly Ordered or Device memory.
Level Two Interface STRB Table 9-8 shows the values of AWADDRM, AWBURSTM, AWSIZEM, and AWLENM for an STRB to Strongly Ordered or Device memory over the AXI master port.
Level Two Interface STR or STM of one register Table 9-10 shows the values of AWADDRM, AWBURSTM, AWSIZEM, and AWLENM for an STR or an STM that transfers one register (an STM1) over the AXI master port to Strongly Ordered or Device memory.
Level Two Interface 9.3.3 Linefills Loads and instruction fetches from Normal, Cacheable memory that do not hit in the cache generate a cache linefill when the appropriate cache is enabled. Table 9-12 shows the values of ARADDRM, ARBURSTM, ARSIZEM, and ARLENM for cache linefills.
Table 9-14 LDRH from Non-cacheable Normal memory (continued)

Address[2:0]    ARADDRM   ARBURSTM   ARSIZEM   ARLENM
0x4 (byte 4)    0x04      Incr       16-bit    1 data transfer
0x5 (byte 5)    0x04      Incr       32-bit    1 data transfer
0x6 (byte 6)    0x06      Incr       16-bit    1 data transfer
0x7 (byte 7)    0x07      Incr       32-bit    2 data transfers

Table 9-15 shows possible values of ARADDRM, ARBURSTM, ARSIZEM, and ARLENM for a Non-cacheable LDR or an LDM that transfers one register, an LDM1.
Table 9-16 LDM5, Non-cacheable Normal memory or cache disabled (continued)

Address[4:0]     ARADDRM   ARBURSTM   ARSIZEM   ARLENM
0x18 (word 6)    0x18      Incr       64-bit    1 data transfer
                 0x00      Incr       64-bit    2 data transfers
0x1C (word 7)    0x1C      Incr       32-bit    1 data transfer
                 0x00      Incr       64-bit    2 data transfers

9.3.
Level Two Interface Table 9-18 shows possible values of AWADDRM, AWBURSTM, AWSIZEM, and AWLENM for an STR or an STM that transfers one register, an STM1, to Normal memory through the AXI master port.
Level Two Interface • If the data comes from two cache lines, then there are two AXI transactions. For example, for LDMIA R10, {R0-R5} with R10 = 0x1010, the interface might generate one burst of two 64-bit reads, and one burst of a single 64-bit read, as shown in Table 9-20.
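The LDMIA example above can be reproduced with a short calculation that splits a read at cache line boundaries. This is a sketch under stated assumptions: 32-byte lines and 64-bit beats, with hypothetical function and parameter names. The text says only that the interface might generate these bursts, so the result is one plausible outcome rather than a guaranteed one.

```python
def split_reads(addr, nbytes, line_bytes=32, beat_bytes=8):
    """Split a read into one burst per cache line touched.

    Returns a list of (burst_start_address, number_of_beats), where each
    beat is beat_bytes wide and bursts start on a beat boundary.
    """
    bursts = []
    end = addr + nbytes
    while addr < end:
        line_end = (addr // line_bytes + 1) * line_bytes
        chunk_end = min(end, line_end)
        first_beat = addr // beat_bytes
        last_beat = (chunk_end - 1) // beat_bytes
        bursts.append((first_beat * beat_bytes, last_beat - first_beat + 1))
        addr = chunk_end
    return bursts
```

For six registers (24 bytes) starting at 0x1010, `split_reads(0x1010, 24)` returns a two-beat burst at 0x1010 and a one-beat burst at 0x1020, matching the example in the text.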
Example 9-1 Write merging

    MOV   r0, #0x4000
    STRH  r1, [r0, #0x18]   ; Store a halfword at 0x4018
    STR   r2, [r0, #0xC]    ; Store a word at 0x400C
    STMIA r0, {r4-r7}       ; Store four words at 0x4000
    STRB  r3, [r0, #0x1D]   ; Store a byte at 0x401D

If the memory at address 0x4000 is marked as Strongly Ordered or Device type memory, the AXI transactions shown in Table 9-23 are generated.
Level Two Interface The transactions shown in Table 9-24 on page 9-18 show this behavior. They are provided as examples only, and are not an exhaustive description of the AXI transactions. Depending on the state of the processor, and the timing of the accesses, the actual bursts generated might have a different size and length to the examples shown, even for the same instruction.
Level Two Interface 9.4 AXI slave interface The processor has a single AXI slave interface, with one port. The port is 64 bits wide and conforms to the AXI standard as described in the AMBA AXI Protocol Specification. Within the AXI standard, the slave port uses the AWUSERS and ARUSERS each as four separate chip select input signals to enable access to: • BTCM • ATCM • instruction cache RAMs • data cache RAMs. The external AXI system must generate the chip select signals.
Level Two Interface 9.4.2 TCM parity and ECC support The TCMs can support parity or ECC, as described in TCM internal error detection and correction on page 8-14. If a write transaction is issued to the AXI slave, the slave interface calculates the required parity or ECC bits to store to the TCM. ECC schemes require the AXI slave to perform a read-modify-write sequence if the write data width is smaller than the ECC chunk size.
Level Two Interface 9.4.6 AXI slave characteristics This section describes the capabilities of the AXI slave interface, and the attributes of its AXI port. You must not make any other assumptions about the behavior of the AXI slave port except that it conforms to the AMBA AXI Protocol Specification. • The AXI slave interface supports merging of data.
Level Two Interface 9.5 Enabling or disabling AXI slave accesses This section describes how to enable or disable AXI slave accesses to the cache RAMs. When caches are accessible by the AXI slave interface, the caches are considered to be cache-off from the processor. After turning the interface on or off, an ISB instruction must flush the pipeline so that all subsequent instruction fetches return valid data.
9.6 Accessing RAMs using the AXI slave interface

This section describes how to access the TCM and cache RAMs using the AXI slave interface. Table 9-26 shows the bits of the ARUSERS or AWUSERS inputs to use to access a RAM or a group of RAMs. Each input is a one-hot 4-bit signal, with each bit corresponding to a particular RAM or group of RAMs.
Level Two Interface 9.6.1 TCM RAM access Table 9-27 shows the decode of the ARUSERS[3:0] signal, and the state of the address signals for accessing the TCM RAMs. The table also shows the SLBTCMSB configuration input signal that determines the address bit that is used, either: • ARADDRS[3] • ARADDRS[MSB], see Table 9-28.
• There is no TCM present.

The mapping of bus addresses to ARUSERS and ARADDRS is determined when the processor is integrated. You must understand this mapping to use the AXI slave interface within your system.

9.6.2 Cache RAM access

This section contains the following:
• Memory map when accessing the cache RAMs
• Data RAM access on page 9-27
• Tag RAM access on page 9-29
• Dirty RAM access on page 9-31.
Table 9-30 Cache tag/valid RAM bank/address decode (continued)

Inputs: ARADDRS[18:15]   RAM bank selected   Cache way
0010                     Bank 1              1
0100                     Bank 2              2
1000                     Bank 3              3

Table 9-31 Cache data RAM bank/address decode

Inputs: ARADDRS[18:15]   Inputs: ARADDRS[3]   RAM bank selected
0001                     0                    Bank 0
0001                     1                    Bank 1
0010                     0                    Bank 2
0010                     1                    Bank 3
0100                     0                    Bank 4
0100                     1                    Bank 5
1000                     0                    Bank 6
1000                     1                    Bank 7

Note
You can only access the cache RAMs using 32-bit or 64-bit AXI transfers.
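The data RAM decode in Table 9-31 reduces to a small calculation: the one-hot value on ARADDRS[18:15] selects a pair of banks, and ARADDRS[3] selects the even or odd bank of that pair. The sketch below illustrates this; the function name is hypothetical, and it deliberately rejects values of ARADDRS[18:15] that are not one-hot.

```python
def data_ram_bank(araddrs):
    """Return the data RAM bank (0-7) selected by an ARADDRS value."""
    way_select = (araddrs >> 15) & 0xF
    if way_select not in (0b0001, 0b0010, 0b0100, 0b1000):
        raise ValueError("ARADDRS[18:15] is not one-hot")
    pair = way_select.bit_length() - 1   # 0001 -> 0, 0010 -> 1, ...
    return 2 * pair + ((araddrs >> 3) & 1)
```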
Table 9-33 Data format, instruction cache and data cache, with parity

Data bit   Description
[63:50]    Not used, read-as-zero
[49]       Parity bit for data value [31:24] or [63:56]
[48]       Parity bit for data value [23:16] or [55:48]
[47:32]    Data value, [31:16] or [63:48]
[31:18]    Not used, read-as-zero
[17]       Parity bit for data value [15:8] or [47:40]
[16]       Parity bit for data value [7:0] or [39:32]
[15:0]     Data value, [15:0] or [47:32]

Table 9-34 Data format, instruction cache, with
Level Two Interface Tag RAM access The following tables show the data formats for tag RAM accesses: • Table 9-36 shows the format for read accesses when neither parity nor ECC is implemented • Table 9-37 shows the format for read accesses when parity is implemented • Table 9-38 shows the format for read accesses when ECC is implemented • Table 9-39 on page 9-30 shows the format for write accesses when neither parity nor ECC is implemented • Table 9-40 on page 9-30 shows the format for write access
Table 9-38 Tag register format for reads, with ECC (continued)

Data bit   Description
[31:30]    Not used, read-as-zero
[29:23]    ECC, way 0/1
[22]       Valid, way 0/1
[21:0]     Tag value, way 0/1

Table 9-39 Tag register format for writes, no parity or ECC

Data bit   Description
[63:23]    Not used, read-as-zero
[22]       Valid, all ways
[21:0]     Tag value, all ways

Table 9-40 Tag register format for writes, with parity

Data bit   Description
[63:24]    Not used, read-as-zero
[23]       Parity.
Level Two Interface Dirty RAM access The following tables show the data format for accessing the dirty RAM: • Table 9-42 shows the format when parity is implemented, or no error scheme is implemented • Table 9-43 shows the format when ECC is implemented.
Table 9-43 Dirty register format, with ECC (continued)

Data bit   Description
[14:11]    ECC, way 1
[10:9]     Outer attributes, way 1
[8]        Dirty value, way 1
[7]        Not used, read-as-zero
[6:3]      ECC, way 0
[2:1]      Outer attributes, way 0
[0]        Dirty value, way 0

Other examples of accessing cache RAMs

Normally ARADDRS[18:15] is a one-hot field, and only accesses one RAM at a time.
Chapter 10 Power Control

This chapter describes the processor power control functions. It contains the following sections:
• About power control on page 10-2
• Power management on page 10-3.
Power Control 10.1 About power control The features of the processor that improve energy efficiency include: • branch and return prediction, reducing the number of incorrect instruction fetch and decode operations • the caches use sequential access information to reduce the number of accesses to the tag RAMs and to unwanted data RAMs. In the processor, extensive use is also made of gated clocks and gates to disable inputs to unused functional blocks.
Power Control 10.2 Power management The processor supports four levels of power management. This section describes: • Run mode • Standby mode • Dormant mode • Shutdown mode • Communication to the Power Management Controller on page 10-4. 10.2.1 Run mode Run mode is the normal mode of operation where all of the functionality of the processor is available. 10.2.2 Standby mode Standby mode disables most of the clocks of the device, while keeping the design powered up.
Power Control disabled and finish with a Data Synchronization Barrier operation. When all the state of the processor is saved the processor executes a WFI instruction. The STANDBYWFI signal is asserted to indicate that the processor can enter Shutdown mode. 10.2.5 Communication to the Power Management Controller You can use a Power Management Controller (PMC) to control the powering up and powering down of the processor.
Chapter 11 Debug This chapter describes the processor debug unit. These features assist the development of application software, operating systems, and hardware.
11.1 Debug systems

The Cortex-R4 processor is one component of a debug system. Figure 11-1 shows a typical system.

Figure 11-1 Typical debug system
Debug host: Host computer running RealView Debugger
Protocol converter: For example, RealView ICE
Debug target: Development system containing Cortex-R4 processor

This typical system has three parts, described in the following sections:
• Debug host
• Protocol converter
• Debug target.

11.1.
Debug 11.2 About the debug unit The processor debug unit assists in debugging software running on the processor. You can use the processor debug unit, in combination with a software debugger program, to debug: • application software • operating systems • ARM processor-based hardware systems. The debug unit enables you to: • stop program execution • examine and alter processor state • examine and alter memory and peripheral state • restart the processor.
• data address comparators for triggering watchpoints, see Watchpoint Value Registers on page 11-26 and Watchpoint Control Registers on page 11-26
• a bidirectional Debug Communication Channel (DCC), see Debug communications channel on page 11-55
• all other state information associated with the debug unit.
Debug 11.3 Debug register interface You can access the processor debug register map using the APB slave port. This is the only way to get full access to the processor debug capability. ARM recommends that if your system requires the processor to access its own debug registers, you choose a system interconnect structure that enables the processor to access the APB slave port by executing load and stores to an appropriate area of physical memory.
Note
The CP14 debug instructions are defined as having Opcode_1 set to 0.

Table 11-2 CP14 debug registers summary

Instruction                    Mnemonic   Description
MRC p14, 0, <Rd>, c0, c0, 0    DIDR       Debug Identification Register. See CP14 c0, Debug ID Register on page 11-10.
MRC p14, 0, <Rd>, c1, c0, 0    DRAR       Debug ROM Address Register. See CP14 c0, Debug ROM Address Register on page 11-12.
MRC p14, 0, <Rd>, c2, c0, 0    DSAR       Debug Self Address Register.
Debug Table 11-3 Debug memory-mapped registers (continued) 11.3.
Debug The Watchpoint Fault Address Register (WFAR) reads an address and a processor state dependent offset, +8 for ARM and +4 for Thumb. 11.3.6 Power domains The processor has a single power domain. Therefore, it does not support the Event Catch Register, the OS Lock, or the OS Save and Restore functionality. 11.3.
OS Lock
The processor does not support OS Lock.

Note
• These locks are set to their reset values only on reset of the debug logic, provided by PRESETDBGn.
• You must set the PADDRDBG31 input signal to 1 for accesses originated from the external debugger for the Software Lock override feature to work.
11.4 Debug register descriptions

Table 11-5 shows definitions of terms used in the register descriptions.

Table 11-5 Terms used in register descriptions

Term   Description
R      Read-only. Written values are ignored.
W      Write-only. This bit cannot be read. Reads return an Unpredictable value.
RW     Read or write.
RAZ    Read-As-Zero. Always zero when read.
RAO    Read-As-One. Always one when read.
SBZP   Should-Be-Zero (SBZ) or Preserved (P).
The Debug ID Register is:
• in CP14 c0
• a 32 bit read-only register
• accessible in User and Privileged modes.

Figure 11-2 shows the bit arrangement of the DIDR.

Figure 11-2 Debug ID Register format
[31:28]   WRP
[27:24]   BRP
[23:20]   Context ID
[19:16]   Debug architecture version
[15:8]    Reserved
[7:4]     Variant
[3:0]     Revision

Table 11-7 shows how the bit values correspond with the Debug ID Register functions.
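A debugger script typically splits the DIDR value into the fields shown in Figure 11-2 before interpreting them. The following sketch extracts the raw field values only. The field positions are read from the figure, while the dictionary and function names are hypothetical, and no meaning is attached to the extracted numbers.

```python
DIDR_FIELDS = {
    "WRP": (31, 28),
    "BRP": (27, 24),
    "Context ID": (23, 20),
    "Debug architecture version": (19, 16),
    "Variant": (7, 4),
    "Revision": (3, 0),
}

def decode_didr(value):
    """Split a 32-bit DIDR value into its raw, uninterpreted fields."""
    fields = {}
    for name, (msb, lsb) in DIDR_FIELDS.items():
        width = msb - lsb + 1
        fields[name] = (value >> lsb) & ((1 << width) - 1)
    return fields
```

See Table 11-7 for what each field value means on this processor.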
To use the Debug ID Register, read CP14 c0 with:

    MRC p14, 0, <Rd>, c0, c0, 0 ; Read Debug ID Register

11.4.3 CP14 c0, Debug ROM Address Register

The Debug ROM Address Register is a read-only register that returns a 32-bit Debug ROM Address Register value. This is the address that indicates where in memory a debug monitor can locate the debug bus ROM specified by the CoreSight™ multiprocessor trace and debug architecture. This ROM holds information about all the components in the debug bus.
The Debug Self Address Offset Register is:
• in CP14 c0, sub-register c2
• a 32 bit read-only register
• accessible in User and Privileged modes.

Figure 11-4 shows the bit arrangement of the Debug Self Address Offset Register.

Figure 11-4 Debug Self Address Offset Register format
[31:12]   Debug bus self address offset value
[11:2]    Reserved
[1:0]     Valid bits

Table 11-9 shows how the bit values correspond with the Debug Self Address Offset Register functions.
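Software that needs the base address of its own debug register map can combine the debug ROM address with the self address offset. The sketch below makes several assumptions that are not stated in this section: the offset is added to the debug ROM base modulo 2^32 (consistent with the DBGSELFADDR description later in this chapter), the Valid bits read b11 when the offset is usable, and the caller already knows the ROM base. The function name is hypothetical.

```python
def debug_register_base(rom_base, dsar):
    """Return the base of the 4KB debug register map, or None.

    rom_base: physical base address of the debug ROM (assumed known).
    dsar: raw Debug Self Address Offset Register value.
    """
    # Assumption: b11 in the Valid bits marks a usable offset.
    if (dsar & 0x3) != 0x3:
        return None
    offset = dsar & 0xFFFFF000          # bits [31:12]
    return (rom_base + offset) & 0xFFFFFFFF
```

Because the addition wraps at 32 bits, an offset such as 0xFFFF0000 acts as a negative displacement from the ROM base.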
Debug 11.4.5 CP14 c1, Debug Status and Control Register The DSCR contains status and control information about the debug unit. Figure 11-5 shows the bit arrangement of the DSCR.
Debug Table 11-10 Debug Status and Control Register functions (continued) Bits Field Function [24] InstrCompl Instruction complete read-only bit. This flag determines whether the processor has completed execution of an instruction issued through the APB port. 0 = processor is currently executing an instruction fetched from the ITR Register 1 = processor is not currently executing an instruction fetched from the ITR Register.
Debug Table 11-10 Debug Status and Control Register functions (continued) Bits Field Function [13] ARM Execute ARM instruction enable bit: 0 = disabled, this is the reset value 1 = enabled. If this bit is set and an ITR write succeeds, the processor fetches an instruction from the ITR for execution. If this bit is set to 1 when the processor is not in debug state, the behavior of the processor is Unpredictable.
Table 11-10 Debug Status and Control Register functions (continued)

Bits    Field   Function
[5:2]   MOE     Method of entry bits:
                b0000 = a DRCR[0] halting debug event occurred
                b0001 = a breakpoint occurred
                b0100 = an EDBGRQ halting debug event occurred
                b0011 = a BKPT instruction occurred
                b1010 = a precise watchpoint occurred
                others = reserved.
                These bits are set to indicate any of:
                • the cause of a debug exception
                • the cause for entering debug state.
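The method of entry encodings above lend themselves to a lookup table in debugger software. This sketch covers only the encodings listed in this table row and reports everything else as reserved; the names are hypothetical.

```python
MOE_MEANINGS = {
    0b0000: "DRCR[0] halting debug event",
    0b0001: "breakpoint",
    0b0011: "BKPT instruction",
    0b0100: "EDBGRQ halting debug event",
    0b1010: "precise watchpoint",
}

def method_of_entry(dscr):
    """Describe DSCR[5:2], or return "reserved" for unlisted values."""
    return MOE_MEANINGS.get((dscr >> 2) & 0xF, "reserved")
```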
Debug • writes to ITR are ignored if InstrCompl_l is set to b0 • following a successful write to DTRRX, DTRRXfull and DTRRXfull_l are set to b1 • following a successful read from DTRTX, DTRTXfull and DTRTXfull_l are cleared to b0 • following a successful write to ITR, InstrCompl and InstrCompl_l are cleared to b0. Debuggers accessing these registers must first read DSCR. This has the side-effect of copying DTRRXfull and DTRTXfull to DTRRXfull_l and DTRTXfull_l.
Table 11-11 shows how the bit values correspond with the DTRRX and DTRTX functions.

Table 11-11 Data Transfer Register functions

Bits     Field   Function
[31:0]   Data    Reads the Data Transfer Register. This is read-only for the CP14 interface.
                 Note
                 Reads of the DTRRX through the coprocessor interface cause the DTRRXfull flag to be cleared. However, reads of the DTRRX through the APB port do not affect this flag.
[31:0]   Data    Writes the Data Transfer Register.
Figure 11-7 Vector Catch Register format
Bits [31:8] are Reserved. Bits [7:0] hold the Reset, SVC, Prefetch abort, Data abort, IRQ, and FIQ vector catch bits, with the remaining bits Reserved.

If one of the bits in this register is set and the instruction at the corresponding vector is committed for execution, the processor either enters debug state or takes a debug exception.

Note
• Under this model, any prefetch from an exception vector can trigger a vector catch, not only those caused by exception entries.
11.4.9 Debug State Cache Control Register

The DSCCR controls the L1 cache behavior when the processor is in debug state. Figure 11-8 shows the bit arrangement of the DSCCR.

Figure 11-8 Debug State Cache Control Register format
[31:3]   Reserved
[2]      Not write-through
[1]      Instruction cache line-fill
[0]      Data cache line-fill

For information on the usage model of the DSCCR register, see Cache debug on page 11-50.
11.4.11 Debug Run Control Register

The DRCR requests the processor to enter or leave debug state. It also clears the sticky exception bits present in the DSCR. Figure 11-9 shows the bit arrangement of the DRCR.

Figure 11-9 Debug Run Control Register format
[31:5]   Reserved
[4]      Cancel memory request
[3]      Clear sticky pipeline advance
[2]      Clear sticky exceptions
[1]      Restart request
[0]      Halt request

Table 11-15 shows how the bit values correspond with the Debug Run Control Register functions.
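Host-side debugger code usually names the DRCR request bits rather than writing bare constants. The sketch below assumes the bit positions shown in Figure 11-9, with Halt request at bit 0 and Cancel memory request at bit 4. The constant and function names are hypothetical, and how the value reaches the register (for example through the APB port) is left out.

```python
DRCR_HALT_REQUEST = 1 << 0
DRCR_RESTART_REQUEST = 1 << 1
DRCR_CLEAR_STICKY_EXCEPTIONS = 1 << 2
DRCR_CLEAR_STICKY_PIPELINE_ADVANCE = 1 << 3
DRCR_CANCEL_MEMORY_REQUEST = 1 << 4

def drcr_value(*requests):
    """OR together DRCR request bits, leaving reserved bits [31:5] zero."""
    value = 0
    for bit in requests:
        value |= bit
    return value & 0x1F
```

For example, `drcr_value(DRCR_RESTART_REQUEST, DRCR_CLEAR_STICKY_EXCEPTIONS)` produces a single write value that requests a restart and clears the sticky exception bits.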
Debug 11.4.12 Breakpoint Value Registers Each BVR is associated with a Breakpoint Control Register (BCR). BCRy is the corresponding control register for BVRy. A pair of breakpoint registers, BVRy/BCRy, is called a Breakpoint Register Pair (BRP). BVR0-7 are paired with BCR0-7 to make BRP0-7. The breakpoint value contained in this register corresponds to either an instruction address or a context ID.
Debug Table 11-17 shows how the bit values correspond with the Breakpoint Control Registers functions. Table 11-17 Breakpoint Control Registers functions Bits Field Function [31:29] Reserved Do not modify on writes. On reads, the value returns zero. [28:24] Breakpoint address mask This field sets a breakpoint on a range of addresses by masking lower order address bits out of the breakpoint comparison.
Debug Table 11-17 Breakpoint Control Registers functions (continued) Bits Field Function [8:5] Byte address select For breakpoints programmed to match an instruction address, the debugger must write a word-aligned address to the BVR. You can then use this field to program the breakpoint so it hits only if certain byte addresses are accessed.
Debug Table 11-18 Meaning of BVR bits [22:20] (continued) BVR[22:20] Meaning b011 The corresponding BVR[31:0] is compared against CP15 Context ID Register, c13. This BRP links another BRP (of the BCR[21:20]=b01 type), or WRP (with WCR[20]=b1). They generate a breakpoint or watchpoint debug event on a joint instruction address or data address and context ID match. For this BRP, BCR[8:5] must be set to b1111, BCR[15:14] must be set to b00, and BCR[2:1] must be set to b11.
Figure 11-11 Watchpoint Control Registers format
Fields shown: Watchpoint address mask, E, Linked BRP, Secure state access control, Byte address select, L/S, SP, W, and Reserved bits.

Table 11-20 shows how the bit values correspond with the Watchpoint Control Registers functions.

Table 11-20 Watchpoint Control Registers functions

Bits      Field      Function
[31:29]   Reserved   Do not modify on writes. On reads, the value returns zero.
Debug Table 11-20 Watchpoint Control Registers functions (continued) Bits Field Function [12:5] Byte address select The WVR is programmed with word-aligned address. You can use this field to program the watchpoint so it only hits if certain byte addresses are accessed: b00000000 The watchpoint never hits. bxxxxxxx1 The watchpoint hits if the byte at address (WVR[31:0] & 0xFFFFFFFC) +0 is accessed. bxxxxxx1x The watchpoint hits if the byte at address (WVR[31:0] & 0xFFFFFFFC) +1 is accessed.
Figure 11-12 OS Lock Status Register format
[31:1]   Reserved
[0]      Lock implemented bit

Table 11-21 shows how the bit values correspond with the OS Lock Status Register functions.

Table 11-21 OS Lock Status Register functions

Bits     Field                  Function
[31:1]   Reserved               RAZ.
[0]      Lock implemented bit   Indicates that the OS lock functionality is not implemented. This bit always reads 0.

11.4.
Debug a. Cortex-R4 does not implement the Security Extensions, so all the debug features are considered secure. 11.4.18 Device Power-down and Reset Control Register The PRCR is a read/write register that controls reset and power-down related functionality. Figure 11-14 shows the bit arrangement of the PRCR.
Figure 11-15 PRSR format
[31:4]   Reserved
[3]      Sticky reset status
[2]      Reset status
[1]      Sticky power-down status
[0]      Power-down status

Table 11-24 shows how the bit values correspond with the PRSR functions.

Table 11-24 PRSR functions

Bits     Field                 Function
[31:4]   Reserved              Do not modify on writes. On reads, the value returns zero.
[3]      Sticky reset status   Sticky reset status bit. This bit is cleared on read.
                               0 = the processor has not been reset since the last time this register was read.
11.5 Management registers

The Management Registers define the standardized set of registers that all CoreSight components implement. This section describes these registers. Table 11-25 shows the contents of the Management Registers for the processor debug unit.

Table 11-25 Management Registers

Offset (hex)   Register number   Access   Mnemonic   Description
0xD00-0xDFC    832-895           R        -          Processor Identifier Registers. See Processor ID Registers.
Debug Table 11-26 Processor Identifier Registers (continued) 11.5.
Writing b1 to a specific claim tag set bit sets that claim tag. Writing b0 to a specific claim tag bit has no effect. This register always reads 0xFF, indicating eight claim tags are implemented.

Claim Tag Clear Register

Figure 11-17 shows the bit arrangement of the Claim Tag Clear Register.

Figure 11-17 Claim Tag Clear Register format
[31:8]   Reserved
[7:0]    Claim tag clear

Table 11-28 shows how the bit values correspond with the Claim Tag Clear Register functions.
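The claim tag behavior can be modeled in a few lines, which is useful when writing host tools that negotiate ownership of the debug unit. The set-side behavior follows the text above. The clear-side behavior (writing 1 clears a tag, and reads return the current tags) is an assumption based on common CoreSight practice, because the Claim Tag Clear Register functions are not reproduced here. All names are hypothetical.

```python
class ClaimTags:
    """Toy model of the eight claim tag bits."""

    def __init__(self):
        self.tags = 0

    def write_set(self, value):
        # Writing 1 sets the claim tag; writing 0 has no effect.
        self.tags |= value & 0xFF

    def read_set(self):
        # Always reads 0xFF: eight claim tags are implemented.
        return 0xFF

    def write_clear(self, value):
        # Assumption: writing 1 clears the claim tag, 0 has no effect.
        self.tags &= ~value & 0xFF

    def read_clear(self):
        # Assumption: reads return the current claim tag values.
        return self.tags
```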
Table 11-29 shows how the bit values correspond with the Lock Status Register functions.

Table 11-29 Lock Status Register functions

Bits     Field           Function
[31:3]   Reserved        Do not modify on writes. On reads, the value returns zero.
[2]      32-bit access   Indicates that a 32-bit access is required to write the key to the Lock Access Register. This bit always reads 0.
[1]      Locked bit      Locked bit:
                         0 = Writes are permitted.
                         1 = Writes are ignored. This is the reset value.
Debug Table 11-31 shows the offset value, register number, and description that are associated with each Peripheral Identification Register.
Table 11-34 shows how the bit values correspond with the Peripheral ID Register 1 functions.

Table 11-34 Peripheral ID Register 1 functions

Bits     Value   Description
[31:8]   -       Reserved
[7:4]    0xB     Indicates bits [3:0] of the JEDEC JEP106 Identity Code
[3:0]    0xC     Indicates bits [11:8] of the Part number for the processor

Table 11-35 shows how the bit values correspond with the Peripheral ID Register 2 functions.
Table 11-38 shows the offset value, register number, and value that are associated with each Component Identification Register.

Table 11-38 Component Identification Registers

Offset (hex)   Register number   Value   Description
0xFF0          1020              0x0D    Component Identification Register 0
0xFF4          1021              0x90    Component Identification Register 1
0xFF8          1022              0x05    Component Identification Register 2
0xFFC          1023              0xB1    Component Identification Register 3
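A debugger that walks the debug bus can check the four Component Identification Register values from Table 11-38 before it touches any other register. The sketch below assumes that only the low byte of each 32-bit register read is significant; the names are hypothetical.

```python
EXPECTED_COMPONENT_ID = (0x0D, 0x90, 0x05, 0xB1)  # registers 0 to 3

def component_id_word(cid0, cid1, cid2, cid3):
    """Concatenate the low byte of each register, register 0 lowest."""
    regs = (cid0, cid1, cid2, cid3)
    return sum((reg & 0xFF) << (8 * i) for i, reg in enumerate(regs))

def is_expected_component(cid0, cid1, cid2, cid3):
    """True if the four reads match the values in Table 11-38."""
    low_bytes = tuple(reg & 0xFF for reg in (cid0, cid1, cid2, cid3))
    return low_bytes == EXPECTED_COMPONENT_ID
```

With the table values, `component_id_word(0x0D, 0x90, 0x05, 0xB1)` evaluates to 0xB105900D.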
Debug 11.6 Debug events A processor responds to a debug event in one of the following ways: • ignores the debug event • takes a debug exception • enters debug state. This section describes: • Software debug event • Halting debug event on page 11-40. • Behavior of the processor on debug events on page 11-40 • Debug event priority on page 11-40 • Watchpoint debug events on page 11-40. 11.6.1 Software debug event A software debug event is any of the following: • A watchpoint debug event.
Debug 11.6.2 Halting debug event The debugger or the system can cause the processor to enter into debug state by triggering any of the following halting debug events: • assertion of the EDBGRQ signal, an External Debug Request • write to the DRCR[0] Halt Request control bit. If EDBGRQ is asserted while DBGEN is HIGH but invasive debug is not permitted, the devices asserting this signal must hold it until the processor enters debug state, that is, until DBGACK is asserted.
Debug 11.7 Debug exception The processor takes a debug exception when a software debug event occurs while in Monitor debug-mode. Prefetch Abort and Data Abort Vector catch debug events are ignored. The debug software must carefully program certain debug events to prevent the processor from entering an unrecoverable state.
Debug Table 11-40 shows the values in the link register after exceptions.
Debug 11.7.2 Avoiding unrecoverable states The processor ignores vector catch debug events on the Prefetch or Data Abort vectors while in Monitor debug-mode because these events would otherwise put the processor in an unrecoverable state.
Debug 11.8 Debug state The debug state enables an external agent, usually a debugger, to control the processor following a debug event. While in debug state, the processor behaves as follows: • The DSCR[0] core halted bit is set. • The DBGACK signal is asserted, see DBGACK on page 11-51. • The DSCR[5:2] method of entry bits are set appropriately. • The processor is halted. The pipeline is flushed and no instructions are fetched. • The processor does not change the execution mode.
Table 11-41 Read PC value after debug state entry (continued)

Debug event                                ARM    Thumb   Return address (RAa) meaning
Vector catch                               RA+8   RA+4    Vector address.
External debug request signal activation   RA+8   RA+4    Address of the instruction where the execution resumes.
Debug state entry request command          RA+8   RA+4    Address of the instruction where the execution resumes.
OS unlock event                            RA+8   RA+4    Address of the instruction where the execution resumes.
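For the debug events listed in this part of Table 11-41, a debugger recovers the return address by subtracting the state-dependent offset from the PC value it reads. The sketch below applies only to those rows; the names are hypothetical.

```python
PC_OFFSET = {"ARM": 8, "Thumb": 4}

def return_address(pc_read, state):
    """Recover RA from the PC value read after debug state entry.

    state: "ARM" or "Thumb", the instruction set state on entry.
    """
    return (pc_read - PC_OFFSET[state]) & 0xFFFFFFFF
```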
Debug 11.8.3 Executing instructions in debug state In debug state, the processor executes instructions issued through the Instruction Transfer Register (ITR). Before the debugger can force the processor to execute any instruction, it must enable this feature through DSCR[13]. While the processor is in debug state, it always decodes instructions from the ITR as per the ARM instruction set, regardless of the value of the T and J bits of the CPSR.
Debug 11.8.7 Coprocessor instructions CP14 and CP15 instructions can always be executed in debug state regardless of processor mode. 11.8.8 Effect of debug state on non-invasive debug The processor non-invasive debug features are the ETM and Performance Monitoring Unit (PMU). All of these non-invasive debug features are disabled when the processor is in debug state. For more information, see Chapter 4 System Control Coprocessor and ETM interface on page 1-11.
Precise Data abort
When a precise Data Abort occurs in debug state, the behavior of the processor is as follows:
• PC, CPSR, SPSR_abt, and R14_abt are unchanged
• the processor remains in debug state
• DSCR[6], sticky precise data abort bit, is set
• DFSR and DFAR are set to the same values as if the abort had occurred in normal state.
6. Sets the DSCR[1] core restarted flag to 1.
Debug 11.9 Cache debug This section describes cache debug. It consists of: • Cache pollution in debug state • Cache coherency in debug state • Cache usage profiling. 11.9.1 Cache pollution in debug state If bit [0] of the Debug State Cache Control Register (DSCCR) is set to 0 while the processor is in debug state, then the L1 data cache does not perform any line fill.
Debug 11.10 External debug interface The system can access memory-mapped debug registers through the processor APB slave port. This section describes the APB interface and the miscellaneous debug input and output signals: • APB signals • Miscellaneous debug signals • Authentication signals on page 11-52. 11.10.1 APB signals The APB slave port is compliant with the AMBA Advanced Peripheral Bus specification v3 and can be connected to the Debug Access Port (DAP).
Debug DBGSELFADDR The DBGSELFADDR signal specifies bits [31:12] of the offset from the debug ROM physical address to the physical address where the processor APB port is mapped to the base of the 4KB debug register map. This is a configuration input and must be tied off or only change while the processor is in reset. DBGSELFADDRV is the valid signal for DBGSELFADDR. If the offset cannot be determined, DBGSELFADDR must be tied off to zero and DBGSELFADDRV must be tied LOW.
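As a rough illustration of how a tool might combine these two inputs, the following C sketch computes the physical base of the 4KB debug register map from a debug ROM address and the DBGSELFADDR[31:12] value. The function name, the rom_base parameter, and the modulo 2^32 wraparound of the addition are assumptions made for illustration; they are not part of the signal definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true and stores the base address of the 4KB
 * debug register map when DBGSELFADDRV is HIGH. Returns false when the
 * offset is not valid (DBGSELFADDRV tied LOW). */
bool debug_base_from_selfaddr(uint32_t rom_base, uint32_t selfaddr_31_12,
                              bool selfaddr_valid, uint32_t *base_out)
{
    if (!selfaddr_valid) {
        return false;
    }
    /* DBGSELFADDR carries bits [31:12] of the offset. Bits [11:0] are zero
     * because the register map is 4KB aligned. */
    uint32_t offset = (selfaddr_31_12 & 0xFFFFFu) << 12;
    *base_out = rom_base + offset;   /* unsigned addition wraps modulo 2^32 */
    return true;
}
```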
If software running on the processor has control over an external device that drives the authentication signals, it must make the change using a safe sequence:
1. Execute an implementation-specific sequence of instructions to change the signal value. For example, this might be a single STR instruction that writes a certain value to a control register in a system peripheral.
2. If step 1 involves any memory operation, issue a Data Synchronization Barrier (DSB) instruction.
3.
Debug 11.11 Using the debug functionality This section provides some examples of using the processor debug functionality, both from the point of view of a software engineer writing code to run on an ARM processor and of a developer creating debug tools for the processor. In the former case, examples are given in ARM assembly language. In the latter case, the examples are in C pseudo-language, intended to convey the algorithms to be used. These examples are not intended as source code for a debugger.
Debug 11.11.1 Debug communications channel There are two ways that an external debugger can send data to or receive data from the processor: • The debug communications channel, when the processor is not in debug state. It is defined as the set of resources used for communicating between the external debugger and software running on the processor. • The mechanism for forcing the processor to execute ARM instructions, when the processor is in debug state.
Software access to the DCC
Software running on the processor that sends data to the debugger through the target-to-host channel can use the sequence of instructions that Example 11-2 shows.

Example 11-2 Target to host data transfer (target end)

    ; r0 -> word to send to the debugger
WriteDCC
    MRC p14, 0, PC, c0, c1, 0    ; copy DSCR[31:28] to the CPSR flags
    BCS WriteDCC                 ; C flag = DTRTXfull, wait while the channel is full
    MCR p14, 0, r0, c0, c5, 0    ; write r0 to the DTR
    BX lr

Example 11-3 shows the sequence of instructions for receiving data from the debugger through the host-to-target channel.
{
    // Step 1. Poll DSCR until DTRRXfull is clear.
    repeat
    {
        dscr := ReadDebugRegister(34);
    } until (!(dscr & (1<<30)));
    // Step 2. Write the value to DTRRX.
    WriteDebugRegister(32, dtr_val);
}

While the processor is running, if the DCC is used as a data channel, it might be appropriate to poll the DCC regularly. Example 11-6 shows the code for polling the DCC.
For a simple breakpoint, you can program the settings for the other control bits as Table 11-43 shows:

Table 11-43 Values to write to BCR for a simple breakpoint

Bits      Value to write        Description
[31:29]   0b000                 Reserved
[28:24]   0b00000               Breakpoint address mask
[23]      0b0                   Reserved
[22:20]   0b000                 Meaning of BVR
[19:16]   0b0000                Linked BRP number
[15:9]    0b00                  Reserved
[8:5]     Derived from address  Byte address select
[4:3]     0b00                  Reserved
[2:1]     0b11                  Supervisor access control
[0]       0b1                   Bre
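The field values in Table 11-43 can be combined into a single register value. The following C sketch is one way to do this. The function name is hypothetical, and the caller is assumed to have already derived the 4-bit byte address select value from the breakpoint address.

```c
#include <stdint.h>

/* Hypothetical helper: builds a BCR value for a simple breakpoint from the
 * field values in Table 11-43. byte_address_select is the 4-bit BCR[8:5]
 * field, derived from the breakpoint address by the caller. */
uint32_t simple_bcr_value(uint32_t byte_address_select)
{
    uint32_t bcr = 0;                          /* [31:9] and [4:3] are written as zero */
    bcr |= (byte_address_select & 0xFu) << 5;  /* [8:5] byte address select */
    bcr |= 0x3u << 1;                          /* [2:1] = 0b11, supervisor access control */
    bcr |= 0x1u;                               /* [0] = 0b1, as in the table */
    return bcr;
}
```

With all four byte address select bits set, the result is 0x000001E7.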
For a simple watchpoint, you can program the settings for the other control bits as Table 11-44 shows:

Table 11-44 Values to write to WCR for a simple watchpoint

Bits      Value to write        Description
[31:29]   0b000                 Reserved
[28:24]   0b00000               Watchpoint address mask
[23:21]   0b000                 Reserved
[20]      0b0                   Enable linking
[19:16]   0b0000                Linked BRP number
[15:13]   0b00                  Reserved
[12:5]    Derived from address  Byte address select
[4:3]     0b10                  Load/Store access control
[2:1]     0b11                  Privileged access con
Debug Table 11-45 shows some examples.
Example 11-10 shows the code for single-stepping off an instruction.

Example 11-10 Single-stepping off an instruction

SingleStepOff(uint32 address)
{
    bkpt := FindUnusedBreakpointWithMismatchCapability();
    SetComplexBreakpoint(bkpt, address, 4 << 20);
}

Note
In Example 11-10, the third parameter of SetComplexBreakpoint() indicates the value to set BCR[22:20]. This method of single-stepping steps off the instruction, which is not necessarily the same as stepping to the next instruction executed.
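To make the role of the third parameter concrete, the following C sketch places a 3-bit value into BCR[22:20] of an existing BCR value. The helper name is hypothetical. The value 4 is the one that Example 11-10 passes as 4 << 20, and the starting value 0x1E7 is only an illustrative simple-breakpoint setting.

```c
#include <stdint.h>

#define BCR_MEANING_SHIFT 20u   /* BCR[22:20], meaning of BVR */

/* Hypothetical helper: replaces BCR[22:20] in an existing BCR value and
 * leaves every other bit unchanged. */
uint32_t bcr_with_meaning(uint32_t bcr, uint32_t meaning)
{
    uint32_t cleared = bcr & ~(0x7u << BCR_MEANING_SHIFT);
    return cleared | ((meaning & 0x7u) << BCR_MEANING_SHIFT);
}
```

Starting from 0x000001E7 and applying the value 4 gives 0x004001E7.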
        dscr := ReadDebugRegister(34);
    } until (dscr & (1<<19));
}
// Step 4. Read the entire processor state. The function ReadAllRegisters
// reads all general-purpose registers for all processor modes, and saves
// the data in “state”.
ReadAllRegisters(state);
// Step 5. Based on the CPSR (processor state), determine the actual restart
// address.
if (state->cpsr & (1<<5))
{
    // The T bit is set: Thumb state.
    state->pc := state->pc - 4;
}
elseif (state->cpsr & (1<<24))
{
    // The J bit is set: Jazelle state.
    WritePC(state->pc);
    // Step 4. Writing the PC corrupts R0. Therefore, restore R0 now.
    WriteRegister(0, state->r0);
    // Step 5. Write the restart request bit in the DRCR.
    WriteDebugRegister(36, 1<<1);
    // Step 6. Poll the RESTARTED flag in the DSCR.
    repeat
    {
        dscr := ReadDebugRegister(34);
    } until (dscr & (1<<1));
}

11.11.
Debug } Reading the PC in debug state Example 11-15 shows the code to read the PC. Example 11-15 Reading the PC ReadPC() { // Step 1. Save R0 saved_r0 := ReadRegister(0); // Step 2. Execute the instruction MOV r0, pc through the ITR. ExecuteARMInstruction(0xE1A0000F); // Step 3. Read the value of R0 that now contains the PC. pc := ReadRegister(0); // Step 4. Restore the value of R0.
Example 11-17 Writing the CPSR

WriteCPSR(uint32 cpsr_val)
{
    // Step 1. Save R0.
    saved_r0 := ReadRegister(0);
    // Step 2. Write the new CPSR value to R0.
    WriteRegister(0, cpsr_val);
    // Step 3. Execute instruction MSR CPSR, R0 through the ITR.
    ExecuteARMInstruction(0xE12FF000);
    // Step 4. Execute a PrefetchFlush instruction through the ITR.
    ExecuteARMInstruction(0xEE070F95);
    // Step 5. Restore the value of R0.
    {
        return false;
    }
}

Note
You can use a similar sequence to read a halfword of memory and to write to memory. To read or write blocks of memory, substitute the data instruction with one that uses post-indexed addressing. For example:

    LDRB R1, [R0], #1

This avoids reloading the address value for each sequential access.

Example 11-20 shows the code for reading a block of bytes of memory.
Debug // Step 2. Write the address to R0. WriteRegister(0, address); // Step 3. Execute instruction LDC p14, c5, [R0] through the ITR. ExecuteARMInstruction(0xED905E00); // Step 4. Read the value from the DTR directly. datum := ReadDCC(); // Step 5. Restore the corrupted register R0. WriteRegister(0, saved_r0); // Step 6. Check the DSCR for a sticky abort.
    WriteDebugRegister(32, value);
    // Step 2. Write the opcode for MRC p14, 0, Rd, c0, c5, 0 to the ITR.
    // Write stalls until the ITR is ready.
    WriteDebugRegister(33, 0xEE100E15 + (Rd<<12));
}

Note
To transfer a register to the processor when in stall mode, you are not required to poll the DSCR each time an instruction is written to the ITR and a value read from or written to the DTR.
Debug } Example 11-26 shows the sequence for writing a block of words to memory. Example 11-26 Writing a block of words to memory (fast download) WriteWords(uint32 address, bool &aborted, uint32 *data, int nwords) { // Step 1. Save the value of R0. saved_r0 := ReadRegister(0); // Step 2. Write the value 0b10 to DSCR[21:20] for fast mode. SetDTRAccessMode(2); // Step 3. Write the opcode for MRC p14, 0, R0, c5, c0 to the ITR. // Write stalls until the ITR is ready but the instruction is not issued.
    // Step 1. Save R0.
    saved_r0 := ReadRegister(0);
    // Step 2. Execute instruction MRC p15, 0, R0, c0, c1, 0 through the ITR.
    ExecuteARMInstruction(0xEE100010 + (CPnum<<8) + (opc1<<21) + (CRn<<16) + CRm + (opc2<<5));
    // Step 3. Read the value of R0 that now contains the CP register.
    CP15c1 := ReadRegister(0);
    // Step 4. Restore the value of R0.
    WriteRegister(0, saved_r0);
    return CP15c1;
}
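The opcode arithmetic used in these examples can be checked with a small helper. The following C sketch assembles an unconditional MCR or MRC opcode from its fields. The function name and parameter order are hypothetical. In the ARM encoding, bit [20] distinguishes MRC (coprocessor to ARM register) from MCR (ARM register to coprocessor), so reading a coprocessor register into R0 requires the MRC form.

```c
#include <stdint.h>

/* Hypothetical helper: assembles a coprocessor register transfer opcode
 * with condition AL.
 *   to_arm = 0: MCR (ARM register -> coprocessor), base 0xEE000010
 *   to_arm = 1: MRC (coprocessor -> ARM register), base 0xEE100010 */
uint32_t cp_transfer_opcode(int to_arm, uint32_t cpnum, uint32_t opc1,
                            uint32_t rt, uint32_t crn, uint32_t crm,
                            uint32_t opc2)
{
    uint32_t opcode = to_arm ? 0xEE100010u : 0xEE000010u;
    opcode |= (opc1 & 0x7u) << 21;   /* opcode 1 */
    opcode |= (crn & 0xFu) << 16;    /* CRn */
    opcode |= (rt & 0xFu) << 12;     /* ARM register */
    opcode |= (cpnum & 0xFu) << 8;   /* coprocessor number */
    opcode |= (opc2 & 0x7u) << 5;    /* opcode 2 */
    opcode |= (crm & 0xFu);          /* CRm */
    return opcode;
}
```

As a cross-check against opcodes used elsewhere in this section, MCR p15, 0, R0, c7, c5, 4 (PrefetchFlush) assembles to 0xEE070F95, and MRC p14, 0, R0, c0, c5, 0 assembles to 0xEE100E15.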
Debug 11.12 Debugging systems with energy management capabilities The processor offers functionality for debugging systems with energy-management capabilities. This section describes scenarios where the OS takes energy-saving measures when in an idle state. The different measures that the OS can take to save energy during an idle state are divided into two groups: Standby The OS takes measures that reduce energy consumption but maintain the processor state.
• Attaching the debugger for a postmortem debug session is not possible because setting the DBGNOPWRDWN signal to 1 might not cause the processor to power up. The effect of setting DBGNOPWRDWN to 1 when the processor is already powered down is implementation-defined, and is up to the system designer.
Chapter 12 FPU Programmer’s Model This chapter describes the programmer’s model of the Floating Point Unit (FPU). The Cortex-R4F processor is a Cortex-R4 processor that includes the optional FPU. In this chapter, the generic term processor means only the Cortex-R4F processor.
FPU Programmer’s Model 12.1 About the FPU programmer’s model The FPU implements the VFPv3-D16 architecture and the Common VFP Sub-Architecture v2. This includes the instruction set of the VFPv3 architecture. See the ARM Architecture Reference Manual for information on the VFPv3 instruction set. 12.1.1 FPU functionality The FPU is an implementation of the ARM Vector Floating Point v3 architecture, with 16 double-precision registers (VFPv3-D16).
FPU Programmer’s Model 12.2 General-purpose registers The FPU implements a VFP register bank. This bank is distinct from the ARM register bank. You can reference the VFP register bank using two explicitly aliased views. Figure 12-1 shows the two views of the register bank and the way the word and doubleword registers overlap. 12.2.1 FPU views of the register bank In the FPU, you can view the register bank as: • Sixteen 64-bit doubleword registers, D0-D15.
FPU Programmer’s Model 12.3 System registers The VFPv3 architecture describes the following system registers: • Floating-Point System ID Register, FPSID on page 12-5 • Floating-Point Status and Control Register, FPSCR on page 12-6 • Floating-Point Exception Register, FPEXC on page 12-7 • Media and VFP Feature Registers, MVFR0 and MVFR1 on page 12-8. Table 12-1 shows the VFP system registers in the Cortex-R4F FPU.
Note
All hardware ID information is privileged access only:
FPSID is privileged access only
    This is a change in VFPv3 compared to VFPv2.
MVFR registers are privileged access only
    User code must issue a system call to determine the features that are supported.
FPU Programmer’s Model 12.3.2 Floating-Point Status and Control Register, FPSCR FPSCR is a read/write register that can be accessed in both Privileged and nonprivileged modes. All bits described as DNM in Figure 12-3 are reserved for future expansion. These bits must be initialized to zeros. To ensure that these bits are not modified, any code other than initialization code must use read-modify-write techniques when writing to FPSCR.
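The read-modify-write requirement can be sketched in C as follows. The helper operates on plain values and leaves the actual register read and write to the caller, so the function name and calling convention are assumptions for illustration only. The example field used in the note after the code is the FZ bit, FPSCR[24], that Flush-to-zero mode describes.

```c
#include <stdint.h>

/* Hypothetical helper: returns the value to write back to FPSCR. Only the
 * bits selected by field_mask change. Every other bit, including the
 * reserved DNM bits, keeps the value that was read. */
uint32_t fpscr_modify(uint32_t fpscr_read, uint32_t field_mask,
                      uint32_t field_value)
{
    return (fpscr_read & ~field_mask) | (field_value & field_mask);
}
```

For example, fpscr_modify(old, 1u << 24, 1u << 24) sets FZ and fpscr_modify(old, 1u << 24, 0) clears it, in both cases without disturbing any other bit of the value that was read.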
FPU Programmer’s Model Table 12-4 FPSCR Register bit functions (continued) 12.3.
Table 12-5 shows how the bit values correspond with the FPEXC Register functions.

Table 12-5 Floating-Point Exception Register bit functions

Bits   Field     Function
[31]   Reserved  RAZ.
[30]   EN        VFP enable bit. Setting EN enables VFP functionality. Reset clears EN.
[29]   DEX       Set when an Undefined exception is taken because of a vector instruction that would have been executed if the processor supported vectors.

12.3.4
Figure 12-6 MVFR1 Register format
Bit fields: [31:20] Reserved, [19:16] SP, [15:12] I, [11:8] LS, [7:4] DN, [3:0] FZ.

Table 12-7 shows how the bit values correspond with the MVFR1 Register.
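As an illustration of the layout in Figure 12-6, the following C sketch splits an MVFR1 value into its five 4-bit fields. The structure and function names are hypothetical, and the meaning of each field value is given by Table 12-7, not by this code.

```c
#include <stdint.h>

/* Hypothetical container for the MVFR1 fields shown in Figure 12-6. */
typedef struct {
    uint32_t sp;   /* MVFR1[19:16] */
    uint32_t i;    /* MVFR1[15:12] */
    uint32_t ls;   /* MVFR1[11:8]  */
    uint32_t dn;   /* MVFR1[7:4]   */
    uint32_t fz;   /* MVFR1[3:0]   */
} mvfr1_fields;

/* Extracts each 4-bit field. Bits [31:20] are reserved and ignored. */
mvfr1_fields mvfr1_decode(uint32_t mvfr1)
{
    mvfr1_fields f;
    f.sp = (mvfr1 >> 16) & 0xFu;
    f.i  = (mvfr1 >> 12) & 0xFu;
    f.ls = (mvfr1 >> 8) & 0xFu;
    f.dn = (mvfr1 >> 4) & 0xFu;
    f.fz = mvfr1 & 0xFu;
    return f;
}
```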
FPU Programmer’s Model 12.4 Modes of operation The FPU provides three modes of operation to accommodate a variety of applications: • Full-compliance mode • Flush-to-zero mode • Default NaN mode 12.4.1 Full-compliance mode In full-compliance mode, the FPU processes all operations according to the IEEE 754 standard in hardware. 12.4.2 Flush-to-zero mode Setting the FZ bit, FPSCR[24], enables flush-to-zero mode.
FPU Programmer’s Model 12.5 Compliance with the IEEE 754 standard When Default NaN (DN) and Flush-to-Zero (FZ) modes are disabled, the VFP functionality is compliant with the IEEE 754 standard in hardware. No support code is required to achieve this compliance. See the ARM Architecture Reference Manual for information about VFP architecture compliance with the IEEE 754 standard. 12.5.
FPU Programmer’s Model • In default NaN mode, arithmetic CDP instructions involving NaN operands return the default NaN regardless of the fractions of any NaN operands. SNaNs in an arithmetic CDP operation set the IOC flag, FPSCR[0]. NaN handling by data transfer and non-arithmetic CDP instructions is the same as in full-compliance mode. Table 12-9 summarizes the effects of NaN operands on instruction execution.
FPU Programmer’s Model 12.5.3 Exceptions The FPU implements the VFPv3 architecture and sets the cumulative exception status flag in the FPSCR register as required for each instruction. The FPU does not support user-mode traps. The exception enable bits in the FPSCR read-as-zero, and cannot be written. The processor also has six output pins, FPIXC, FPUFC, FPOFC, FPDZC, FPIDC, and FPIOC, that each reflect the status of one of the cumulative exception flags.
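The relationship between the cumulative flags and the six output pins can be sketched as a simple decode. In the following C example only the IOC position, FPSCR[0], is stated in this chapter. The other bit positions are assumptions taken from the VFPv3 FPSCR layout in the ARM Architecture Reference Manual, and the structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the six exception status output pins. */
typedef struct {
    bool fpioc, fpdzc, fpofc, fpufc, fpixc, fpidc;
} fp_exception_pins;

/* Derives the pin levels from an FPSCR value. */
fp_exception_pins fp_pins_from_fpscr(uint32_t fpscr)
{
    fp_exception_pins p;
    p.fpioc = ((fpscr >> 0) & 1u) != 0;  /* IOC, FPSCR[0], stated in this chapter */
    p.fpdzc = ((fpscr >> 1) & 1u) != 0;  /* DZC, assumed FPSCR[1] */
    p.fpofc = ((fpscr >> 2) & 1u) != 0;  /* OFC, assumed FPSCR[2] */
    p.fpufc = ((fpscr >> 3) & 1u) != 0;  /* UFC, assumed FPSCR[3] */
    p.fpixc = ((fpscr >> 4) & 1u) != 0;  /* IXC, assumed FPSCR[4] */
    p.fpidc = ((fpscr >> 7) & 1u) != 0;  /* IDC, assumed FPSCR[7] */
    return p;
}
```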
Chapter 13 Integration Test Registers
This chapter describes how to use the Integration Test Registers in the processor. It contains the following sections:
• About Integration Test Registers on page 13-2
• Programming and reading Integration Test Registers on page 13-3
• Summary of the processor registers used for integration testing on page 13-4
• Processor integration testing on page 13-5.
Integration Test Registers 13.1 About Integration Test Registers The processor contains Integration Test Registers that enable you to verify integration of the design and enable topology detection of the design using debug tools. The Integration Mode Control Register (ITCTRL), which is also described in this chapter, controls the use of the Integration Test Registers. When programming the Integration Test Registers you must enable all the changes at the same time.
Integration Test Registers 13.2 Programming and reading Integration Test Registers The Integration Test Registers are programmed using the debug APB interface. For more information on using the debug APB interface see Chapter 11 Debug. 13.2.1 Software access using APB APB provides a direct method of programming: • a stand-alone macrocell • a macrocell in a CoreSight system. APB provides access to the programmable control registers of peripheral devices.
Integration Test Registers 13.3 Summary of the processor registers used for integration testing Table 13-1 lists the processor Integration Test Registers and the Integration Mode Control Register (ITCTRL).
Integration Test Registers 13.4 Processor integration testing This section describes the behavior and use of the Integration Test Registers that are in the processor. It also describes the Integration Mode Control Register that controls the use of the Integration Test Registers. For more information about the ITCTRL see the ARM Architecture Reference Manual. If you want to access these registers you must first set bit [0] of the Integration Mode Control Register to 1.
Table 13-3 Input signals that can be read by the Integration Test Registers

Signal           Register   Bit     Register description
DBGRESTART       ITMISCIN   [11]    See ITMISCIN Register (Miscellaneous Inputs) on page 13-8
ETMEXTOUT[1:0]   ITMISCIN   [9:8]
nETMWFIREADY     ITMISCIN   [5]
nIRQ             ITMISCIN   [2]
nFIQ             ITMISCIN   [1]
EDBGRQ           ITMISCIN   [0]

This section describes:
• Using the Integration Test Registers
• Performing integration testing
• ITETMIF Register (ETM interface) on page 13-7
• ITM
13.4.3 ITETMIF Register (ETM interface)
The ITETMIF Register at offset 0xED8 is write-only. Figure 13-1 shows the register bit assignments.

Figure 13-1 ITETMIF Register bit assignments
[Figure: bits [31:15] are Reserved. Bits [14:0] carry one bit each for EVNTBUS[46], EVNTBUS[28], EVNTBUS[0], ETMCID[31], ETMCID[0], ETMDD[63], ETMDD[0], ETMDA[31], ETMDA[0], ETMICTL[13], ETMICTL[0], ETMIA[31], ETMIA[1], ETMDCTL[11], and ETMDCTL[0].]

Table 13-4 shows the fields when writing the ITETMIF Register.
13.4.4 ITMISCOUT Register (Miscellaneous Outputs)
The ITMISCOUT Register at offset 0xEF8 is write-only. Figure 13-2 shows the register bit assignments.

Figure 13-2 ITMISCOUT Register bit assignments
[Figure: fields from the most significant bit down are Reserved, DBGRESTARTED, DBGTRIGGER, Reserved, ETMWFIPENDING, nPMUIRQ, Reserved, COMMTX, COMMRX, and DBGACK.]

Table 13-5 shows the fields when writing the ITMISCOUT Register. When this register is written the appropriate output pins take the value written.
Figure 13-3 ITMISCIN Register bit assignments
[Figure: fields from the most significant bit down are Reserved, DBGRESTART, Reserved, ETMEXTOUT[1:0], Reserved, nETMWFIREADY, Reserved, nFIQ, nIRQ, and EDBGRQ.]

Table 13-6 lists the register bit assignments for the ITMISCIN Register.

Table 13-6 ITMISCIN Register bit assignments

Bits      Name         Function
[31:12]   -            Reserved. Read Undefined.
[11]      DBGRESTART   Read value of the DBGRESTART input pin.
[10]      -            Reserved. Read Undefined.

13.4.6
Table 13-7 shows the fields of the ITCTRL Register.

Table 13-7 ITCTRL Register bit assignments

Bits     Access     Reset value   Name      Function
[31:1]   RAZ/SBZP   -             -         Reserved.
[0]      R/W        0             INTMODE   Controls whether the processor is in normal operating mode or integration mode:
                                            b0 = normal operation
                                            b1 = integration mode enabled.
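A debug tool that toggles integration mode only needs to change bit [0] of ITCTRL. The following C sketch computes the value to write from the value that was read, which keeps the reserved bits [31:1] as they were read. The function name is hypothetical, and the access to the register itself over the debug APB interface is left to the caller.

```c
#include <stdint.h>

#define ITCTRL_INTMODE (1u << 0)   /* ITCTRL[0], INTMODE */

/* Hypothetical helper: returns the ITCTRL value to write so that only the
 * INTMODE bit changes. enable != 0 selects integration mode, enable == 0
 * selects normal operation. */
uint32_t itctrl_with_intmode(uint32_t itctrl_read, int enable)
{
    if (enable) {
        return itctrl_read | ITCTRL_INTMODE;
    }
    return itctrl_read & ~ITCTRL_INTMODE;
}
```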
Chapter 14 Cycle Timings and Interlock Behavior This chapter describes the cycle timings and interlock behavior of instructions on the processor.
• Floating-point double-precision data processing instructions on page 14-33
• Dual issue on page 14-34.
Cycle Timings and Interlock Behavior 14.1 About cycle timings and interlock behavior Complex instruction dependencies and memory system interactions make it impossible to describe briefly the exact cycle timing behavior for all instructions in all circumstances. The timings described in this chapter are accurate in most cases. If precise timings are required, you must use a cycle-accurate model of the processor.
    ADD R3, R3, R1, LSL #6    ;plus one because Register R1 is Early

The following sequence where R1 is a Late Reg takes two cycles:

    LDR R1, [R2]    ;Result latency two minus one cycles
    STR R1, [R3]    ;no penalty because R1 is a Late register

The following sequence where R3 is a Very Early Reg takes four cycles:

    ADD R3, R1, R2
    LDR R4, [R3]

14.1.
Table 14-1 Definition of cycle timing terms (continued)

Term             Description
Early Reg        The specified registers are required at the start of the Ex1 stage. Add one cycle to the Result Latency of the instruction producing this register.
Very Early Reg   The specified registers are required at the start of the Iss stage.

14.1.5
Cycle Timings and Interlock Behavior 14.2 Register interlock examples Table 14-2 shows register interlock examples using LDR and ADD instructions. LDR instructions take one cycle, have a result latency of two, and require their base register as a Very Early Reg. ADD instructions take one cycle and have a result latency of one. Table 14-2 Register interlock examples Instruction sequence Behavior LDR R1, [R2] ADD R6, R5, R4 Takes two cycles because there are no register dependencies.
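The interlock examples follow a simple pattern that can be written down as arithmetic. The following C sketch is a simplified model, not a cycle-accurate one: it covers only a producer instruction followed immediately by one dependent consumer, and it ignores memory system effects and dual issue. The names are hypothetical. The adjustment of plus two cycles for a Very Early Reg is inferred from the four-cycle ADD and LDR sequence in section 14.1.

```c
/* How the consumer uses the register produced by the previous instruction.
 * The numeric value is the adjustment applied to the Result Latency. */
typedef enum {
    REG_LATE = -1,       /* subtract one cycle */
    REG_NORMAL = 0,      /* no adjustment */
    REG_EARLY = 1,       /* add one cycle */
    REG_VERY_EARLY = 2   /* add two cycles (inferred, see lead-in) */
} reg_timing;

/* Total cycles for a producer followed immediately by a dependent consumer. */
int pair_cycles(int producer_cycles, int result_latency,
                reg_timing consumer_use, int consumer_cycles)
{
    int effective_latency = result_latency + (int)consumer_use;
    /* The consumer cannot issue before the producer has issued, and not
     * before the result is available to it. */
    int issue_gap = (effective_latency > producer_cycles)
                        ? effective_latency : producer_cycles;
    return issue_gap + consumer_cycles;
}
```

With an LDR (one cycle, result latency two) feeding a store data register, which is a Late Reg, the model gives two cycles. With an ADD (one cycle, result latency one) feeding an LDR base register, which is a Very Early Reg, it gives four cycles.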
Cycle Timings and Interlock Behavior 14.3 Data processing instructions This section describes the cycle timing behavior for the ADC, ADD, ADDW, AND, ASR, BIC, CLZ, CMN, CMP, EOR, LSL, LSR, MOV, MOVT, MOVW, MVN, ORN, ORR, ROR, RRX, RSB, RSC, SBC, SUB, SUBW, TEQ, and TST instructions. This section describes: • Cycle counts if destination is not PC • Cycle counts if destination is the PC • Example interlocks on page 14-8 14.3.
Cycle Timings and Interlock Behavior 14.3.3 Example interlocks Most data processing instructions are single-cycle and can be executed back-to-back without interlock cycles, even if there are data dependencies between them. The exceptions to this are when shifts are used. Shifter The registers that the shifter requires are Early Regs and require an additional cycle of result availability before use.
Cycle Timings and Interlock Behavior 14.4 QADD, QDADD, QSUB, and QDSUB instructions This section describes the cycle timing behavior for the QADD, QDADD, QSUB, and QDSUB instructions. These instructions perform saturating arithmetic. They have a result latency of two. The QDADD and QDSUB instructions must double and saturate the register before the addition. This register is an Early Reg. Table 14-5 shows the cycle timing behavior for QADD, QDADD, QSUB, and QDSUB instructions.
Cycle Timings and Interlock Behavior 14.5 Media data-processing Table 14-6 shows media data-processing instructions and gives their cycle timing behavior. All media data-processing instructions are single-cycle issue instructions. These instructions have result latencies of one or two cycles. Some of the instructions require an input register to be shifted, or manipulated in some other way before use and therefore are marked as requiring an Early Reg.
Cycle Timings and Interlock Behavior 14.6 Sum of Absolute Differences (SAD) Table 14-7 shows SAD instructions and gives their cycle timing behavior. Table 14-7 Sum of absolute differences instruction timing behavior Instructions Cycles Early Reg Result latency USAD8 1 , 2a USADA8 1 , 2a a. Result latency is one fewer if the destination is the accumulate for a subsequent USADA8. 14.6.
Cycle Timings and Interlock Behavior 14.7 Multiplies Most multiply operations cannot forward their result early, except as the accumulate value for a subsequent multiply. For a subsequent multiply accumulate the result is available one cycle earlier than for all other uses of the result. Certain multiplies require: • more than one cycle to execute • more than one pipeline issue to produce a result.
Cycle Timings and Interlock Behavior Table 14-9 Example multiply instruction cycle timing behavior (continued) Example instruction Cycles Early Reg Late Reg Result latency SMLALD, SMLALDX 1 , - 2, 2 SMLSLD, SMLSLDX 1 , - 2, 2 UMAAL 2 , , 3, 3 Note Result Latency is one less if the result is used as the accumulate value for a subsequent multiply accumulate.
Cycle Timings and Interlock Behavior 14.8 Divide This section describes the cycle timing behavior of the UDIV and SDIV instructions. The divider unit is separate to the main execute pipeline so the UDIV and SDIV instructions require one cycle to issue. They execute out-of-order relative to the rest of the pipeline, and require an additional issue cycle at the end of the divide operation to write the result to the destination register.
Cycle Timings and Interlock Behavior 14.9 Branches This section describes the cycle timing behavior for the B, BL, BLX, BX, BXJ, CBNZ, CBZ, TBB, and TBH instructions. Branches are subject to dynamic and return stack predictions. Table 14-10 shows example branch instructions and their cycle timing behavior.
Cycle Timings and Interlock Behavior 14.10 Processor state updating instructions This section describes the cycle timing behavior for the MSR, MRS, CPS, and SETEND instructions. Table 14-11 shows processor state updating instructions and their cycle timing behavior.
Cycle Timings and Interlock Behavior 14.11 Single load and store instructions This section describes the cycle timing behavior for LDR, LDRHT, LDRSBT, LDRSHT, LDRT, LDRB, LDRBT, LDRSB, LDRH, LDRSH, STR, STRT, STRB, STRBT, STRH, and PLD instructions. Table 14-12 shows the cycle timing behavior for stores and loads, other than loads to the PC. You can replace LDR with any of these single load or store instructions. The following rules apply: • They are normally single-cycle issue.
Cycle Timings and Interlock Behavior Table 14-13 Cycle timing behavior for loads to the PC (continued) Example instruction Cycles Memory cycles Result latency LDR pc, [sp, #] (!) 8 1 - LDR pc, [sp], #cns 8 1 - LDR pc, a 9 1 - - LDR pc, a 11 1 - - Comments Conditional predicted incorrectly, but return stack predicted correctly a. See Table 14-14 for an explanation of and .
    LDR R6, [R2, #0x10]!
    LDR R7, [R2, #0x20]!
Cycle Timings and Interlock Behavior 14.12 Load and Store Double instructions This section describes the cycle timing behavior for the LDRD and STRD instructions. The LDRD and STRD instructions: • Are normally single-cycle issue. Both the base and any offset register are Very Early Regs. • Are 3-cycle issue if offset or pre-increment addressing with a negative register offset is used. Both the base and any offset register are Very Early Regs.
14.13 Load and Store Multiple instructions
This section describes the cycle timing behavior for the LDM, STM, PUSH, and POP instructions. These instructions take multiple cycles to issue, and then use multiple memory cycles to load and store all the registers. Because the memory datapath is 64 bits wide, two registers can be loaded or stored on each cycle.
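The effect of the 64-bit datapath on the number of memory cycles can be sketched as follows. The aligned case follows directly from transferring two registers per cycle. The unaligned case is an assumption made for illustration: it supposes that a first address that is word-aligned but not 64-bit aligned transfers a single register in the first cycle and pairs afterwards. The function name is hypothetical.

```c
/* Hypothetical estimate of the memory cycles needed to transfer nregs
 * 32-bit registers over a 64-bit datapath. */
unsigned ldm_memory_cycles(unsigned nregs, int first_address_64bit_aligned)
{
    if (nregs == 0u) {
        return 0u;
    }
    if (first_address_64bit_aligned) {
        return (nregs + 1u) / 2u;    /* two registers per cycle, rounded up */
    }
    /* Assumed: one register in the first cycle, then two per cycle. */
    return 1u + nregs / 2u;
}
```

For seven registers starting at a 64-bit aligned address, this gives four memory cycles.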
14.13.2 Load Multiples, where the PC is in the register list
The processor includes a 4-entry return stack that can predict procedure returns. Any LDM to the PC that does not restore the SPSR to the CPSR is predicted as a procedure return. In all cases the base register is a Very Early Reg. Table 14-18 shows the cycle timing behavior of Load Multiples, where the PC is in the register list.
    PUSH {R1-R7}
    ADD R10,R10,R7

Note
In the examples, R0 and sp are 64-bit aligned addresses. The instructions PUSH and POP always use the sp register for the base address.
14.14 RFE and SRS instructions
This section describes the cycle timing for the RFE and SRS instructions. These instructions:
• return from an exception and save exception return state respectively
• take one or two memory cycles depending on the doubleword alignment of the first address location.
In all cases the base register is a Very Early Reg. Table 14-19 shows the cycle timing behavior for RFE and SRS instructions.
14.15 Synchronization instructions
This section describes the cycle timing behavior for the CLREX, DMB, DSB, ISB, LDREX, LDREXB, LDREXD, LDREXH, STREX, STREXB, STREXD, STREXH, SWP, and SWPB instructions. In all cases the base register, Rn, is a Very Early Reg. Table 14-20 shows the synchronization instructions cycle timing behavior.
Cycle Timings and Interlock Behavior 14.16 Coprocessor instructions This section describes the cycle timing behavior for the MCR and MRC instructions to CP14, the debug coprocessor or CP15, the system control coprocessor. The precise timing of coprocessor instructions is tightly linked with the behavior of the relevant coprocessor. Table 14-21 shows the coprocessor instructions cycle timing behavior. Table 14-21 shows the best case numbers.
Cycle Timings and Interlock Behavior 14.17 SVC, BKPT, Undefined, and Prefetch Aborted instructions This section describes the cycle timing behavior for SVC, Undefined instruction, BKPT and Prefetch Abort. In all cases the exception is taken in the Wr stage of the pipeline. SVC and most Undefined instructions that fail their condition codes take one cycle. A small number of Undefined instructions that fail their condition codes take two cycles.
Cycle Timings and Interlock Behavior 14.18 Miscellaneous instructions Table 14-23 shows the cycle timing behavior for If-Then (IT) and No OPeration (NOP) instructions. Table 14-23 IT and NOP instructions cycle timing behavior Example instructions Cycles Early Reg Late Reg Result latency Comments IT{{{}}} 1 - - - - NOP 1 - - - - The DBG, PLI, SEV, WFE, and YIELD instructions are all treated the same as NOP, and so have the same cycle timing behavior.
14.19 Floating-point register transfer instructions
This section describes the cycle timing behavior for the VFP instructions that transfer data between the VFP register file and the integer register file, including the system registers. All source operands are Normal Regs, and the result latency for non-system register transfers is always 1 cycle.
Cycle Timings and Interlock Behavior 14.20 Floating-point load/store instructions This section describes the cycle timing behavior for all load and store instructions that operate on the VFP register file: • The base address register, and any offset register are Very Early Regs for both loads and stores. • For store instructions, the data register (Sd or Dd), or registers are always Late Regs. • The cycle timing of load and store instructions is affected by the starting address for the transfer.
Cycle Timings and Interlock Behavior Table 14-25 Floating-point load/store instructions cycle timing behavior (continued) Cycles with writeback (!) Result latency (load) Result latency (base register, ) Comments 1 1 1 1 - VLDM{mode}.32 {!}, {s1,s2} 2 2 1,2 2 - VLDM{mode}.32 {!}, {s1-s3} 2 3 1,2,2 3 - VLDM{mode}.32 {!}, {s1-s4} 3 3 1,2,2,3 3 - VLDM{mode}.64 {!}, {d1} 2 2 2 2 - VLDM{mode}.64 {!}, {d1,d2} 3 3 2,3 3 - VLDM{mode}.
Cycle Timings and Interlock Behavior 14.21 Floating-point single-precision data processing instructions This section describes the cycle timing behavior for all single-precision VFP CDP instructions. This includes arithmetic instructions such as VMUL.F32, data and immediate moving instructions such as “VMOV.F32 , #”, VABS.F32, VNEG.F32, and “VMOV , ”, and comparison instructions and conversion instructions.
14.22 Floating-point double-precision data processing instructions
This section describes the cycle timing behavior for all double-precision VFP CDP instructions. This includes arithmetic instructions such as VMUL.F64, data and immediate moving instructions such as “VMOV.F64 , #”, VABS.F64, VNEG.F64, and “VMOV , ”, and comparison instructions and conversion instructions.
Cycle Timings and Interlock Behavior 14.23 Dual issue To increase instruction throughput, the processor can issue certain pairs of instructions simultaneously. This is called dual issue. When this happens, the instruction with the smaller cycle count is assumed to execute in zero cycles. If a pair of instructions can be dual-issued, they are always dual-issued unless dual-issuing is disabled, see Auxiliary Control Registers on page 4-38. If one instruction of the pair is interlocked, both are interlocked.
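The cycle-counting rule for dual-issued pairs can be illustrated with a minimal sketch. The function name `pair_cycles` and its parameters are hypothetical and are used only for illustration. The sketch assumes that when both instructions have the same cycle count, either one can be treated as the zero-cycle instruction.

```python
def pair_cycles(first_cycles, second_cycles, dual_issued=True):
    """Return the cycles charged to two adjacent instructions.

    Hypothetical helper: when the pair is dual-issued, the instruction
    with the smaller cycle count is counted as zero cycles, so only the
    larger count contributes. Otherwise the counts simply add.
    """
    if dual_issued:
        return max(first_cycles, second_cycles)
    return first_cycles + second_cycles


# A 1-cycle instruction paired with a 2-cycle instruction.
print(pair_cycles(1, 2))                     # dual-issued
print(pair_cycles(1, 2, dual_issued=False))  # issued separately
```

This models the cycle count only. It does not model interlocks, which stall both instructions of a pair together.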
Cycle Timings and Interlock Behavior 14.23.2 Permitted combinations Table 14-28 lists the permitted instruction combinations. Any instruction can be conditional or flag-setting unless otherwise stated. Only the exact instruction combinations listed in Table 14-28 can be dual issued, provided you ensure the instruction combinations obey the rules specified in Dual issue rules on page 14-34.
Cycle Timings and Interlock Behavior Table 14-28 Permitted instruction combinations (continued) Dual issue case First instruction Second instruction Case F2_stb VSTR.F32n As for Case B1. Any single-precision CDPi, excluding multiply-accumulate instructionso. 32-bit transfers to and from the floating-point register filel. Case F2Db VLDR.F64n As for Case B1. Case F3b 32-bit transfers to and from the floating-point register filel "VMOV.F32 , , ", VABS.F32, and VNEG.F32.
Chapter 15 AC Characteristics This chapter gives the timing parameters for the processor. It contains the following sections: • Processor timing on page 15-2 • Processor timing parameters on page 15-3.
AC Characteristics 15.1 Processor timing The AXI bus interface of the processor conforms to the AMBA AXI Specification. For the relevant timing of the AXI write and read transfers, and the error response, see the AMBA AXI Protocol v1.0 Specification. The APB debug interface of the processor conforms to the AMBA 3 APB Protocol v1.0 Specification. For the relevant timing of the APB write and read transfers, and the error response, see the AMBA 3 APB Protocol v1.0 Specification.
AC Characteristics 15.2 Processor timing parameters This section describes the input and output port timing parameters for the processor. The maximum timing parameter or constraint delay for each processor signal applied to the SoC is given as a percentage in Table 15-1 to Table 15-17 on page 15-11. The input and output delay columns provide the maximum and minimum time as a percentage of the processor clock cycle given to the SoC for that signal.
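Because the delays are expressed as a percentage of the processor clock cycle, converting one to an absolute time needs only the clock period. The following is a minimal sketch of that arithmetic. The function name `delay_ns` and the 400MHz clock frequency are assumptions chosen for illustration and are not values from this manual.

```python
def delay_ns(percent_of_cycle, clock_mhz):
    """Convert a delay given as a percentage of the clock cycle to ns.

    Hypothetical helper: clock_mhz is the processor clock frequency of
    the implementation, which this manual does not specify.
    """
    period_ns = 1000.0 / clock_mhz
    return period_ns * percent_of_cycle / 100.0


# Example: a 60% maximum delay with an assumed 400MHz clock (2.5ns period).
print(delay_ns(60, 400))
```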
Table 15-2 Configuration input port timing parameters (continued)
Input delay minimum   Input delay maximum   Signal name
Clock uncertainty     20%                   PARLVRAM
Clock uncertainty     20%                   ENTCM1IF
Clock uncertainty     20%                   SLBTCMSB
Clock uncertainty     20%                   RMWENRAM[1:0]
Table 15-3 shows the timing parameters for the interrupt input ports.
Table 15-4 AXI master input port timing parameters (continued)
Input delay minimum   Input delay maximum   Signal name
Clock uncertainty     60%                   RVALIDM
Clock uncertainty     60%                   BPARITYM
Clock uncertainty     60%                   RPARITYM
Table 15-5 shows the input timing parameters for the AXI slave port.
Table 15-5 AXI slave input port timing parameters (continued)
Input delay minimum   Input delay maximum   Signal name
Clock uncertainty     60%                   AWPARITYS
Clock uncertainty     60%                   WPARITYS
Clock uncertainty     60%                   ARPARITYS
Table 15-6 shows the input timing parameters for the debug input ports.
Table 15-8 shows the timing parameters for the test input ports.
Table 15-9 TCM interface input ports timing parameters (continued)
Input delay minimum   Input delay maximum   Signal name
Clock uncertainty     50%                   B1TCWAIT
Clock uncertainty     40%                   B1TCLATEERROR
Clock uncertainty     50%                   B1TCRETRY
The timing parameters for the dual-redundant core compare logic input control buses, DCCMINP[7:0] and DCCMINP2[7:0], are implementation-defined. Contact the implementer of the macrocell you are working with.
15.2.
Table 15-12 AXI master output port timing parameters (continued)
Output delay minimum   Output delay maximum   Signal name
Clock uncertainty      60%                    AWPROTM[2:0]
Clock uncertainty      60%                    AWUSERM[4:0]
Clock uncertainty      60%                    AWVALIDM
Clock uncertainty      60%                    WIDM[3:0]
Clock uncertainty      60%                    WDATAM[63:0]
Clock uncertainty      60%                    WSTRBM[7:0]
Clock uncertainty      60%                    WLASTM
Clock uncertainty      60%                    WVALIDM
Clock uncertainty      60%                    BREADYM
Clock uncertainty      60%                    ARIDM[3:0]
Clock uncertaint
Table 15-13 AXI slave output ports timing parameters (continued)
Output delay minimum   Output delay maximum   Signal name
Clock uncertainty      60%                    BRESPS[1:0]
Clock uncertainty      60%                    BVALIDS
Clock uncertainty      60%                    ARREADYS
Clock uncertainty      60%                    RIDS[7:0]
Clock uncertainty      60%                    RDATAS[63:0]
Clock uncertainty      60%                    RRESPS[1:0]
Clock uncertainty      60%                    RLASTS
Clock uncertainty      60%                    RVALIDS
Clock uncertainty      60%                    BPARITYS
Clock uncertainty      60%                    RPARITYS
Clock uncertainty      50%
Table 15-15 shows the timing parameters for the ETM interface output ports.
Table 15-17 TCM interface output ports timing parameters (continued)
Output delay minimum   Output delay maximum   Signal name
Clock uncertainty      45%                    ATCADDRPTY
Clock uncertainty      45%                    B0TCEN0
Clock uncertainty      45%                    B0TCEN1
Clock uncertainty      45%                    B0TCADDR[22:3]
Clock uncertainty      45%                    B0TCBYTEWR[7:0]
Clock uncertainty      45%                    B0TCSEQ
Clock uncertainty      45%                    B0TCDATAOUT[63:0]
Clock uncertainty      45%                    B0TCPARITYOUT[13:0]
Clock uncertainty      45%                    B0TCACCTYPE[2:0]
Clock uncertainty      4
The timing parameters for the dual-redundant core compare logic output buses, DCCMOUT[7:0] and DCCMOUT2[7:0], are implementation-defined. Contact the implementer of the macrocell you are working with.
Appendix A Processor Signal Descriptions This appendix describes the processor signals.
Processor Signal Descriptions A.1 About the processor signal descriptions The tables in this appendix list the processor signals, along with their dimensions and direction, input or output, and a high-level description. Each table also has a clocking column, that indicates by which clock a signal is sampled or driven. All signals are sampled on or driven from the rising edge of the clock.
A.2 Global signals Table A-1 shows the processor global signals. The free clock is ungated, with minimal insertion delay, because it clocks the clock gating circuits. Therefore, you must ensure that incoming clocks are balanced with the free clock.
Table A-1 Global signals
Signal      Direction   Clocking   Description
FREECLKIN   Input       -          Free version of the core clock.
CLKIN       Input       -          Core clock.
Processor Signal Descriptions A.3 Configuration signals Table A-2 shows the processor configuration signals. Table A-2 Configuration signals Signal Direction Clocking Description VINITHI Input Tie-off, Reset Reset V-bit value. When HIGH indicates HIVECS mode at reset. See c1, System Control Register on page 4-35 for more information. CFGEE Input Tie-off, Reset Reset EE-bit value. When HIGH indicates the implementation uses BE-8 mode for exceptions at reset.
Processor Signal Descriptions Table A-2 Configuration signals (continued) Signal Direction Clocking Description CFGBTCMSZ[3:0] Input Tie-off Selects the BTCM size. The encodings for the TCM sizes are: b0000 = 0KB b0011 = 4KB b0100 = 8KB b0101 = 16KB b0110 = 32KB b0111 = 64KB b1000 = 128KB b1001 = 256KB b1010 = 512KB b1011 = 1MB b1100 = 2MB b1101 = 4MB b1110 = 8MB. CFGNMFI Input Tie-off, Reset When HIGH, enable non-maskable Fast Interrupts. Reflected in the NMFI bit.
Processor Signal Descriptions Table A-2 Configuration signals (continued) Signal Direction Clocking Description ERRENRAM[2:0] Input Tie-off, Reset TCMs external error enable. Tie each bit high to enable the external error signals for each TCM at reset. Use the following values: 2: B1TCM 1: B0TCM 0: ATCM See Auxiliary Control Registers on page 4-38 for more information. RMWENRAM[1:0]b Input Tie-off, Reset RMW enable bits reset values.
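The CFGBTCMSZ[3:0] size encodings and the ERRENRAM[2:0] bit assignments listed in Table A-2 can be captured as simple lookups. The following is a minimal sketch with hypothetical helper names. It treats any CFGBTCMSZ encoding that is not listed in the table as an error, which is an assumption about how a tool might handle such values rather than a statement about processor behavior.

```python
# CFGBTCMSZ[3:0] encodings from Table A-2, mapped to the TCM size in KB.
BTCM_SIZE_KB = {
    0b0000: 0, 0b0011: 4, 0b0100: 8, 0b0101: 16, 0b0110: 32,
    0b0111: 64, 0b1000: 128, 0b1001: 256, 0b1010: 512,
    0b1011: 1024, 0b1100: 2048, 0b1101: 4096, 0b1110: 8192,
}

# ERRENRAM[2:0] bit positions from Table A-2.
ERRENRAM_BITS = {0: "ATCM", 1: "B0TCM", 2: "B1TCM"}


def btcm_size_kb(cfgbtcmsz):
    """Return the BTCM size in KB for a listed CFGBTCMSZ encoding."""
    if cfgbtcmsz not in BTCM_SIZE_KB:
        raise ValueError(f"encoding {cfgbtcmsz:#06b} is not listed in Table A-2")
    return BTCM_SIZE_KB[cfgbtcmsz]


def errenram_enabled(errenram):
    """Return the TCMs whose external error enable bit is tied HIGH."""
    return [name for bit, name in sorted(ERRENRAM_BITS.items())
            if errenram & (1 << bit)]


print(btcm_size_kb(0b0111))
print(errenram_enabled(0b101))
```

For the listed non-zero encodings, the size in KB equals 2 raised to the power of the encoding minus one. The explicit table is used here so that only documented values are accepted.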
Processor Signal Descriptions A.4 Interrupt signals, including VIC interface signals Table A-3 shows the Interrupt signals including signals used on the VIC interface. Table A-3 Interrupt signals Signal Direction Clocking Description nFIQ Input CLKINa Anyb Fast interruptc. nIRQ Input CLKINa Anyb Normal interruptc. INTSYNCEN Input Tie-off Tie HIGH if the interrupt inputs are asynchronous to CLKIN. Tie LOW if the interrupt inputs are synchronous to CLKIN.
Processor Signal Descriptions A.5 L2 interface signals This section describes the processor L2 interface AXI signals. For more information on Advanced Microcontroller Bus Architecture (AMBA) AXI signals see the AMBA AXI Protocol Specification. Note All the outputs listed in this section have their reset values during standby. A.5.1 AXI master port Table A-4 shows the AXI master port signals for the L2 interface.
Processor Signal Descriptions Table A-4 AXI master port signals for the L2 interface (continued) Signal Direction Clocking Description Output CLKIN Indicates address and control are valid. WDATAM[63:0] Output CLKIN Write data. WIDM[3:0] Output CLKIN The identification tag for the write data group of signals. WLASTM Output CLKIN Indicates the last data transfer of a burst.
Processor Signal Descriptions Table A-4 AXI master port signals for the L2 interface (continued) Signal Direction Clocking Description ARUSERM[4:0] Output CLKIN Provides decode information for the read address channel. See Table 9-3 on page 9-5 for information about the encoding of this signal. ARVALIDM Output CLKIN Indicates address and control are valid. RDATAM[63:0] Input CLKIN Read Data. RIDM[3:0] Input CLKIN The identification tag for the read data group of signals.
Table A-6 AXI slave port signals for the L2 interface (continued)
Signal Direction Clocking Description
AWBURSTS[1:0] Input CLKIN Write burst type.
AWIDS[7:0] Input CLKIN The identification tag for the write address group of signals.
AWLENS[3:0] Input CLKIN Write transfer burst length. The burst length ranges from 1 to 16 transfers and is encoded as a four-bit binary value equal to the burst length minus one.
Table A-6 AXI slave port signals for the L2 interface (continued)
Signal Direction Clocking Description
ARUSERS[3:0] Input CLKIN Memory type select {data cache, instruction cache, BTCM or ATCM}, one hot. The ARUSERS[3:0] signal is not part of the standard AXI specification.
ARVALIDS Input CLKIN Indicates address and control are valid.
RDATAS[63:0] Output CLKIN Read data.
RIDS[7:0] Output CLKIN The identification tag for the read data group of signals.
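The burst length encoding used by AWLENS[3:0] in Table A-6 stores the number of transfers minus one. A minimal sketch of the conversion in both directions follows. The function names are hypothetical.

```python
def burst_length(axlen):
    """Decode a four-bit burst length field into a number of transfers."""
    if not 0 <= axlen <= 0xF:
        raise ValueError("the burst length field is four bits wide")
    return axlen + 1


def encode_burst_length(transfers):
    """Encode a number of transfers (1 to 16) into the four-bit field."""
    if not 1 <= transfers <= 16:
        raise ValueError("burst length must be from 1 to 16 transfers")
    return transfers - 1


print(burst_length(0b0000))    # shortest burst
print(burst_length(0b1111))    # longest burst
print(encode_burst_length(4))  # field value for a four-transfer burst
```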
A.6 TCM interface signals Table A-8 shows the ATCM port signals.
Table A-9 B0TCM port signals (continued)
Name Direction Clocking Description
B0TCLATEERROR Input CLKIN Late error from B0TCM [a]
B0TCRETRY Input CLKIN Access to B0TCM must be retried [a]
B0TCADDRPTY Output CLKIN Parity formed from B0TCM address output [b]
B0TCWE Output CLKIN Write enable for B0TCM
B0TCEN0 Output CLKIN Enable for B0TCM lower word, bit range [31:0]
B0TCEN1 Output CLKIN Enable for B0TCM upper word, bit range [63:32]
B0TCADDR [22:3] Output CLK
Processor Signal Descriptions Table A-10 B1TCM port signals (continued) Name Direction Clocking Description B1TCADDR [22:3] Output CLKIN Address for B1TCM data RAM B1TCBYTEWR [7:0] Output CLKIN Byte strobes for direct write B1TCSEQ Output CLKIN B1TCM RAM access is sequential B1TCDATAOUT [63:0] Output CLKIN Write data for B1TCM data RAM B1TCPARITYOUT [13:0] Output CLKIN Write parity or ECC code for B1TCM B1TCACCTYPE[2:0] Output CLKIN Determines access type: b001 = Load/Store b010
A.7 Dual core interface signals Table A-11 shows the dual redundant core interface signals.
Table A-11 Dual core interface signals
Signal Direction Clocking Description
DCCMINP[7:0] Input - [a] Dual core compare logic input control bus
DCCMOUT[7:0] Output - [a] Dual core compare logic output control bus
DCCMINP2[7:0] Input - [a] Dual core compare logic extra input control bus [b]
DCCMOUT2[7:0] Output - [a] Dual core compare logic extra output control bus [b]
a.
Processor Signal Descriptions A.8 Debug interface signals Table A-12 shows the debug interface signals. With the exception of PCLKDBG, PCLKENDBG and PRESETDBGn, all these signals are only sampled or driven on PCLKDBG edges when PCLKENDBG is asserted. Table A-12 Debug interface signals Signal Direction Clocking Description PCLKDBG Input - Debug clock. PCLKENDBG Input PCLKDBG Clock enable for PCLKDBG. PSELDBG Input PCLKDBG Selects the external debug interface.
Table A-13 Debug miscellaneous signals (continued)
Name Direction Clocking Description
DBGROMADDRV Input Tie-off Debug ROM physical address valid
DBGSELFADDR[31:12] Input Tie-off Debug self-address offset
DBGSELFADDRV Input Tie-off Debug self-address offset valid
a. Not available in r0px revisions of the processor.
A.9 ETM interface signals Table A-14 shows the ETM interface signals.
A.10 Test signals Table A-15 shows the test signals.
Table A-15 Test signals
Signal Direction Clocking Description
SE Input - [a] Scan Enable
RSTBYPASS Input - [a] Bypass pipelined reset
a. Design for test only.
A.11 MBIST signals Table A-16 shows the MBIST signals.
A.12 Validation signals Table A-17 shows the validation signals.
Table A-17 Validation signals
Signal Direction Clocking Description
VALEDBGRQ Output CLKIN Debug request
nVALIRQ Output CLKIN Request for an interrupt
nVALFIQ Output CLKIN Request for a Fast Interrupt
nVALRESET Output CLKIN Request for a reset
A.13 FPU signals Table A-18 shows the FPU signals. These signals are only driven if the processor is configured to include the floating-point logic.
Appendix B ECC Schemes This appendix describes some of the advantages and disadvantages of the different Error Checking and Correction (ECC) schemes for the TCMs. It contains the following section: • ECC scheme selection guidelines on page B-2.
B.1 ECC scheme selection guidelines When implementing a Cortex-R4 processor with an ECC scheme on one or both of the TCM interfaces, carefully consider whether to use 32-bit or 64-bit ECC. To calculate or check the ECC code for data, the processor must know the value of all bytes in the data chunk protected by the scheme. Therefore, when using these schemes, the processor must perform additional read accesses to calculate and check the ECC code stored with the data.
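The trade-off can be made concrete with a small model of when a write leaves part of an ECC-protected chunk unwritten, so that the remaining bytes must first be read. This is a simplified sketch under stated assumptions: the function name is hypothetical, chunks are assumed to be naturally aligned, and the model ignores any merging of stores that the processor might perform internally.

```python
def needs_additional_read(address, size_bytes, ecc_chunk_bytes):
    """Return True if a write does not cover whole ECC chunks.

    ecc_chunk_bytes is 4 for 32-bit ECC and 8 for 64-bit ECC. A write
    covers whole chunks only when both its start and its end fall on
    chunk boundaries (chunks are assumed to be naturally aligned).
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    end = address + size_bytes
    return address % ecc_chunk_bytes != 0 or end % ecc_chunk_bytes != 0


# An aligned word write under each scheme.
print(needs_additional_read(0x100, 4, ecc_chunk_bytes=4))
print(needs_additional_read(0x100, 4, ecc_chunk_bytes=8))
```

In this model, an aligned word write covers a complete chunk under 32-bit ECC but only half a chunk under 64-bit ECC, while a byte write leaves a chunk partly unwritten under either scheme.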
Appendix C Revisions This appendix describes the technical changes between released issues of this book.
Revisions Table C-1 Differences between issue B and issue C (continued) Change Location Updated reset value information for: • Cache Type Register • MPU Type Register • Instruction Set Attributes Register 1 • Instruction Set Attributes Register 4 • Current Cache Size Identification Register • Current Cache Level ID Register • MPU Region Base Address Registers • MPU Region Size and Enable Register • MPU Region Access Control Register • MPU Memory Region Number • ATCM Region Register • BTCM Region Register
Table C-1 Differences between issue B and issue C (continued)
Change                                                    Location
Added section                                             Dormant mode on page 10-3
Updated the permitted instruction combinations            Table 14-28 on page 14-35
Updated the descriptions for COMMRX and COMMTX signals    Table A-13 on page A-17
Table C-2 Differences between issue C and issue D
Change                                                    Location
No technical changes                                      -
Glossary This glossary describes some of the terms and abbreviations used in this manual. Where terms can have several meanings, the meaning presented here is intended. Abort A mechanism that indicates to a processor that the value associated with a memory access is invalid. An abort can be caused by the external or internal memory system as a result of attempting to access invalid instruction or data memory. An abort is classified as either a Prefetch or Data Abort, and an internal or External Abort.
Glossary Advanced High-performance Bus (AHB) The AMBA Advanced High-performance Bus system connects embedded processors such as an ARM core to high-performance peripherals, DMA controllers, on-chip memory, and interfaces. It is a high-speed, high-bandwidth bus that supports multi-master bus management to maximize system performance. See also Advanced Microcontroller Bus Architecture.
• the master and slave interface conventions for AXI components.
[Figure: an AXI master and an AXI slave connected through an AXI interconnect. Each connection between an AXI master interface and an AXI slave interface carries the write address (AW), write data (W), write response (B), read address (AR), and read data (R) channels.]
AXI terminology The following AXI terms are general.
Glossary Read ID width The number of bits in the ARID bus. Read issuing capability The maximum number of active read transactions that a master interface can generate. Write ID capability The maximum number of different AWID values that a master interface can generate for all active write transactions at any one time. Write ID width The number of bits in the AWID and WID buses.
Glossary Base register write-back Updating the contents of the base register used in an instruction target address calculation so that the modified address is changed to the next higher or lower sequential address in memory. This means that it is not necessary to fetch the target address for successive instruction transfers and enables faster burst accesses to sequential memory. Beat Alternative word for an individual transfer within a burst. For example, an INCR4 burst comprises four beats.
Glossary Byte invariant In a byte-invariant system, the address of each byte of memory remains unchanged when switching between little-endian and big-endian operation. When a data item larger than a byte is loaded from or stored to memory, the bytes making up that data item are arranged into the correct order depending on the endianness of the memory access. The ARM architecture supports byte-invariant systems in ARMv6 and later versions.
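The byte-invariant behavior can be demonstrated with a short sketch that is not specific to the processor. The memory contents and the helper name `load_word` are illustrative assumptions. Each byte keeps its address, and only the way a multi-byte item is assembled from those bytes depends on the endianness of the access.

```python
def load_word(memory, address, big_endian):
    """Load a 32-bit word from a byte-addressed memory image."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[address:address + 4], order)


# Four bytes at addresses 0 to 3. The byte at each address is the same
# whichever endianness is used for the word access.
memory = bytes([0x44, 0x33, 0x22, 0x11])

print(hex(load_word(memory, 0, big_endian=False)))
print(hex(load_word(memory, 0, big_endian=True)))
print(hex(memory[0]))
```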
Glossary Clean A cache line that has not been modified while it is in the cache is said to be clean. To clean a cache is to write dirty cache entries into main memory. If a cache line is clean, it is not written on a cache miss because the next level of memory contains the same data as the cache. See also Dirty. Clock gating Gating a clock signal for a macrocell with a control signal (such as PWRDOWN) and using the modified clock that results to control the operating state of the macrocell.
Cycles Per Instruction (CPI) Cycles per instruction (or clocks per instruction) is a measure of the average number of clock cycles needed to execute one instruction. This figure of merit can be used to compare the performance of different CPUs that implement the same instruction set. The lower the value, the better the performance. CoreSight The infrastructure for monitoring, tracing, and debugging a complete system on chip.
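As a worked illustration of the CPI entry, the figure is normally obtained by dividing a cycle count by the number of instructions executed over the same interval. The function name and the counter values below are hypothetical.

```python
def cycles_per_instruction(cycles, instructions):
    """Return the average number of clock cycles per instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


# Hypothetical counts: 1500 cycles to execute 1000 instructions.
print(cycles_per_instruction(1500, 1000))
```

Because a lower value indicates better performance, a result below 1 means that more than one instruction completes per cycle on average, which is possible when instructions are dual-issued.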
Enabled exception An exception is enabled when its exception enable bit in the FPSCR is set. When an enabled exception occurs, a trap to the user handler is taken. An operation that generates an exception condition might bounce to the support code to produce the result defined by the IEEE 754 standard. The exception is then reported to the user trap handler. Endianness Byte ordering. The scheme that determines the order in which successive bytes of a data word are stored in memory.
Glossary Illegal instruction An instruction that is architecturally Undefined. Implementation-defined Means that the behavior is not architecturally defined, but should be defined and documented by individual implementations. Implementation-specific Means that the behavior is not architecturally defined, and does not have to be documented by individual implementations. Used when there are a number of implementation options available and the option chosen does not affect software compatibility.
Glossary Load Store Unit (LSU) The part of a processor that handles load and store transfers. LSU See Load Store Unit. Macrocell A complex logic block with a defined interface and behavior. A typical VLSI system comprises several macrocells (such as a processor, an ETM, and a memory block) plus application-specific logic. Memory coherency A memory is coherent if the value read by a data read or instruction fetch is the value that was most recently written to that location.
Glossary Reserved A field in a control register or instruction format is reserved if the field is to be defined by the implementation, or produces Unpredictable results if the contents of the field are not zero. These fields are reserved for use in future extensions of the architecture or are implementation-specific. All reserved bits not used by the implementation must be written as 0 and are to be read as 0.
Stride The stride field, FPSCR[21:20], specifies the increment applied to register addresses in short vector operations. A stride of 00, specifying an increment of +1, causes a short vector operation to increment each vector register by +1 for each iteration, while a stride of 11 specifies an increment of +2. Subnormal value A value in the range (–2^Emin < x < 2^Emin), except for 0.
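The stride encoding in the Stride entry can be sketched as a field extraction followed by a lookup. The function name is hypothetical. Only the two encodings described in the entry are accepted, and rejecting the other two values is an assumption of this sketch rather than a description of hardware behavior.

```python
# Stride field encodings described in the Stride glossary entry.
STRIDE_INCREMENT = {0b00: 1, 0b11: 2}


def stride_increment(fpscr):
    """Return the register increment selected by FPSCR[21:20]."""
    field = (fpscr >> 20) & 0b11
    if field not in STRIDE_INCREMENT:
        raise ValueError(f"stride encoding {field:02b} is not described")
    return STRIDE_INCREMENT[field]


print(stride_increment(0x00000000))  # stride field is 00
print(stride_increment(0x00300000))  # stride field is 11
```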
Glossary Unsupported values Specific data values that are not processed by the hardware but bounced to the support code for completion. These data can include infinities, NaNs, subnormal values, and zeros. An implementation is free to select which of these values is supported in hardware fully or partially, or requires assistance from support code to complete the operation.
Glossary Write-through (WT) In a write-through cache, data is written to main memory at the same time as the cache is updated. WT See Write-through. Cache terminology diagram The figure below illustrates the following cache terminology: • block address • cache line • cache set • cache way • index • tag.