Nios II Classic Processor Reference Guide
NII5V1 2015.04.02
101 Innovation Drive, San Jose, CA 95134
www.altera.
Contents

Introduction 1-1
Nios II Processor System Basics 1-1
Getting Started with the Nios II Processor 1-2
Customizing Nios II Processor Designs

Supervisor Mode 3-1
User Mode 3-2
Memory Management Unit 3-2
Recommended Usage

No-Operation Instruction 3-65
Potential Unimplemented Instructions 3-65
Document Revision History 3-66
Instantiating the Nios II Processor

Exception Handling 5-12
ECC 5-13
JTAG Debug Module 5-15
Nios II/s Core

ABI for Linux Systems 7-12
Linux Toolchain Relocation Information 7-12
Linux Function Calls 7-16
Linux Operating System Call Interface

cmpgtu 8-27
cmpgtui 8-27
cmple 8-28
cmplei

rol 8-68
roli 8-68
ror 8-69
sll
1 Introduction 2015.04.02 NII51001

This handbook describes the Nios® II Classic processor from a high-level conceptual description to the low-level details of implementation. The chapters in this handbook describe the Nios II processor architecture, the programming model, and the instruction set. The Nios II Classic processor is available only in the Quartus II 14.0 release and earlier. This handbook assumes you have a basic familiarity with embedded processor concepts.
Figure 1-1: Example of a Nios II Processor System
[Block diagram. Labeled elements: Reset; Clock; JTAG connection to software debugger; JTAG Debug Module; Nios II Processor Core (Inst. and Data ports); System Interconnect Fabric; SDRAM Controller and SDRAM Memory; Flash Memory; On-Chip ROM; Tristate bridge to off-chip memory; SRAM Memory; UART (TXD, RXD); Timer1; Timer2; LCD Display Driver and LCD Screen; General-Purpose I/O (Buttons, LEDs, etc.)]
Because the pins and logic resources in Altera devices are programmable, many customizations are possible:
• You can rearrange the pins on the chip to simplify the board design. For example, you can move address and data pins for external SDRAM memory to any side of the chip to shorten board traces.
• You can use extra pins and logic resources on the chip for functions unrelated to the processor.
Custom Components

You can also create custom components and integrate them in Nios II processor systems. For performance-critical systems that spend most CPU cycles executing a specific section of code, it is a common technique to create a custom peripheral that implements the same function in hardware. This approach offers a double performance benefit:
• Hardware implementation is faster than software.
• Simulate the behavior of a Nios II processor within your system.
• Verify the functionality of your design, as well as evaluate its size and speed quickly and easily.
• Generate time-limited device programming files for designs that include Nios II processors.
• Program a device and verify your design in hardware.
Document Revision History

Date            Version  Changes
October 2005    5.1.0    Maintenance release.
May 2005        5.0.0    Maintenance release.
September 2004  1.1      Maintenance release.
May 2004        1.0      Initial release.
2 Processor Architecture 2015.04.02 NII51002

This chapter describes the hardware structure of the Nios II processor, including a discussion of all the functional units of the Nios II architecture and the fundamentals of the Nios II processor hardware implementation. The Nios II architecture describes an instruction set architecture (ISA). The ISA in turn necessitates a set of functional units that implement the instructions.
Figure 2-1: Nios II Processor Core Block Diagram
Block diagram. Labels recovered from the figure: Nios II Processor Core; reset; clock; cpu_resetrequest; cpu_resettaken; JTAG interface to software debugger; JTAG Debug Module; irq[31..0]; eic_port_data[44..
Implementation variables generally fit one of three trade-off patterns: more or less of a feature; inclusion or exclusion of a feature; hardware implementation or software emulation of a feature. An example of each trade-off follows:
• More or less of a feature: For example, to fine-tune performance, you can increase or decrease the amount of instruction cache memory.
Related Information
• Programming Model on page 3-1
• Instruction Set Reference on page 8-1

Arithmetic Logic Unit

The Nios II ALU operates on data stored in general-purpose registers. ALU operations take one or two inputs from registers, and store a result back in a register. The ALU supports the data operations described in the table below.
Custom Instructions

The Nios II architecture supports user-defined custom instructions. The Nios II ALU connects directly to custom instruction logic, enabling you to implement operations in hardware that are accessed and used exactly like native instructions. Refer to "Custom Instruction Tab" in the Instantiating the Nios II Processor chapter of the Nios II Processor Reference Handbook for additional information.
Feature: NaN
  Floating-Point Hardware Implementation with IEEE 754-1985:
    Quiet: Implemented
    Signaling: Not implemented
  Floating-Point Hardware 2 Implementation with IEEE 754-2008:
    No distinction is made between signaling and quiet NaNs as input operands. A result that produces a NaN may produce either a signaling or quiet NaN.(1)

Feature: Subnormal (denormalized) numbers
  Subnormal operands are treated as zero.
Related Information
Nios II Custom Instruction User Guide
For more information about using floating-point custom instructions in software, refer to the Nios II Custom Instruction User Guide.

Floating Point Custom Instruction 2 Component

You can add floating-point custom instructions to any Nios II processor design. The floating-point division hardware requires more resources than the other instructions.
In Qsys, the Floating Point Hardware component is under Embedded Processors on the Component Library tab. The Nios II floating-point custom instructions are based on the Altera floating-point megafunctions: ALTFP_MULT, ALTFP_ADD_SUB, and ALTFP_DIV.

The Nios II software development tools recognize C code that takes advantage of the floating-point instructions present in the processor core.
Signal Name  Type   Purpose
reset_req    Reset  This optional signal prevents memory corruption by performing a reset handshake before the processor resets.

For more information on adding reset signals to the Nios II processor, refer to “Advanced Features Tab” in the Instantiating the Nios II Processor chapter of the Nios II Processor Reference Handbook.
An EIC can be software-configurable.

Note: When the EIC interface and shadow register sets are implemented on the Nios II core, you must ensure that your software is built with the Nios II EDS version 9.0 or higher. Earlier versions have an implementation of the eret instruction that is incompatible with shadow register sets.

For a typical example of an EIC, refer to the Vectored Interrupt Controller chapter in the Embedded Peripherals IP User Guide.
Related Information
Avalon Interface Specifications
Refer to the Avalon Interface Specifications for details of the Avalon-MM interface.

Memory and Peripheral Access

The Nios II architecture provides memory-mapped I/O access. Both data memory and peripherals are mapped into the address space of the data master port. The Nios II architecture uses little-endian byte ordering.
operations can complete in a single clock cycle when the data master port is connected to zero-wait-state memory.

The Nios II architecture supports on-chip cache memory for improving average data transfer performance when accessing slower memory. Refer to the "Cache Memory" section of this chapter for details. The Nios II architecture supports tightly-coupled memory, which provides guaranteed low-latency access to on-chip memory.
Optimal cache configuration is application specific, although you can make decisions that are effective across a range of applications. For example, if a Nios II processor system includes only fast, on-chip memory (i.e., it never accesses slow, off-chip memory), an instruction or data cache is unlikely to offer any performance gain.
Accessing Tightly-Coupled Memory

Tightly-coupled memories occupy normal address space, the same as other memory devices connected via system interconnect fabric. The address ranges for tightly-coupled memories (if any) are determined at system generation time. Software accesses tightly-coupled memory using regular load and store instructions.
Note: The Nios II MPU is optional and mutually exclusive with the Nios II MMU. Nios II systems can include either an MPU or an MMU, but cannot include both on the same Nios II processor core.
Note: While the processor has no minimum clock frequency requirements, Altera recommends that your design’s system clock frequency be at least four times the JTAG clock frequency to ensure that the on-chip instrumentation (OCI) core functions properly.

Download and Execute Software

Downloading software refers to the ability to download executable code and data to the processor’s memory via the JTAG connection.
Table 2-6: Trigger Actions

Action            Description
Break             Halt execution and transfer control to the JTAG debug module.
External trigger  Assert a trigger signal output. This trigger output can be used, for example, to trigger an external logic analyzer.
Trace on          Turn on trace collection.
Trace off         Turn off trace collection.
Trace sample      Store one sample of the bus to trace buffer.
Arm               Enable an armed trigger.
Execution vs. Data Trace

The JTAG debug module supports tracing the instruction bus (execution trace), the data bus (data trace), or both simultaneously. Execution trace records only the addresses of the instructions executed, enabling you to analyze where in memory (that is, in which functions) code executed. Data trace records the data associated with each load and store operation on the data bus.
Document Revision History

Date            Version  Changes
December 2010   10.1.0   Added reference to tightly-coupled memory tutorial.
July 2010       10.0.0   Maintenance release.
November 2009   9.1.0    • Added external interrupt controller interface information.
                         • Added shadow register set information.
March 2009      9.0.0    Maintenance release.
November 2008   8.1.0    • Expanded floating-point instructions information.
                         • Updated description of optional cpu_resetrequest and cpu_resettaken signals.
3 Programming Model 2015.04.02 NII51003

This chapter describes the Nios II programming model, covering processor features at the assembly language level. Fully understanding the contents of this chapter requires prior knowledge of computer architecture, operating systems, virtual memory and memory management, software processes and process management, exception handling, and instruction sets.
tion’s access to memory and peripherals. In systems with an MPU, your system software controls the mode in which your application code runs. In Nios II systems without an MMU or MPU, all application and system code runs in supervisor mode.

Code that needs direct access to and control of the processor runs in supervisor mode. For example, the processor enters supervisor mode whenever a processor exception (including processor reset or break) occurs.
an MMU-based Nios II processor. Do not include an MMU in your Nios II system unless your operating system requires it.

Note: The Altera HAL and HAL-based real-time operating systems do not support the MMU. If your system needs memory protection, but not virtual memory management, refer to the Memory Protection Unit section.
Whenever an instruction attempts to access a page that either has no TLB mapping, or lacks the appropriate permissions, the MMU generates an exception. The Nios II processor’s precise exceptions enable the system software to update the TLB, and then re-execute the instruction if desired.

Memory Protection

The Nios II MMU maintains read, write, and execute permissions for each page. The TLB provides the permission information when translating a VPN.
Physical Memory Address Space

The 4-GB physical memory is divided into low memory and high memory. The lowest ½ GB of physical address space is low memory. The upper 3½ GB of physical address space is high memory.

Figure 3-1: Division of Physical Memory

Address range             Region
0x20000000 to 0xFFFFFFFF  3.5 GByte high memory, accessed only via TLB
0x00000000 to 0x1FFFFFFF  0.5 GByte low memory
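As an illustration only, the split in Figure 3-1 can be expressed as a one-line address check. This is a minimal host-side sketch; the helper name is_low_memory and the macro name are hypothetical and are not part of any Altera software.

```c
#include <stdbool.h>
#include <stdint.h>

/* Last byte address of low memory, taken from Figure 3-1. */
#define LOW_MEMORY_LAST_ADDR 0x1FFFFFFFu

/* Hypothetical helper: returns true when a physical address lies in
 * the lowest 0.5 GB (low memory), false when it lies in high memory. */
static bool is_low_memory(uint32_t paddr)
{
    return paddr <= LOW_MEMORY_LAST_ADDR;
}
```

For example, 0x1FFFFFFF is classified as low memory and 0x20000000 as high memory.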
Note: You can configure the number of TLB entries and the number of ways (set associativity) of the TLB with the Nios II Processor parameter editor in Qsys. By default, the TLB is a 16-way cache.
Related Information
• Instantiating the Nios II Processor on page 4-1
• Nios II Core Implementation Details on page 5-1

TLB Lookups

A TLB lookup attempts to convert a virtual address (VADDR) to a physical address (PADDR).
handle the exception as appropriate. The precise exception effectively prevents the illegal access to memory.

The MPU extends the Nios II processor to support user mode and supervisor mode. Typically, system software runs in supervisor mode and end-user applications run in user mode, although all software can run in supervisor mode if desired. System software defines which MPU regions belong to supervisor mode and which belong to user mode.
The region limit uses a less-than comparison instead of a less-than-or-equal-to comparison because less-than provides a more efficient implementation. The limit is one bit larger than the address so that the full address range may be included in a range. Defining the region by limit results in slower and larger address range match logic than defining by size, but allows finer granularity in region sizes.
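The less-than comparison and the extra limit bit can be sketched in C as follows. This is a host-side model under stated assumptions, not the hardware match logic: the function name region_matches and the base parameter are hypothetical, and a 64-bit integer stands in for a limit that is one bit wider than a 32-bit address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a limit-defined region match.
 * 'limit' is exclusive (less-than comparison) and is held in 64 bits
 * so that it can exceed the largest 32-bit address by one. */
static bool region_matches(uint32_t addr, uint32_t base, uint64_t limit)
{
    return addr >= base && (uint64_t)addr < limit;
}
```

With an exclusive limit of 0x100000000, a region can still cover address 0xFFFFFFFF, which a 32-bit exclusive limit could not express.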
General-Purpose Registers

The Nios II architecture provides thirty-two 32-bit general-purpose registers, r0 through r31. Some registers have names recognized by the assembler. For example, the zero register (r0) always returns the value zero, and writing to zero has no effect. The ra register (r31) holds the return address used by procedure calls and is implicitly accessed by the call, callr, and ret instructions.
Control Registers

Control registers report the status and change the behavior of the processor. Control registers are accessed differently than the general-purpose registers. The special instructions rdctl and wrctl provide the only means to read and write to the control registers and are only available in supervisor mode.

Note: When writing to control registers, all undefined bits must be written as zero.
Register  Name     Register Contents
13        config   Refer to The config Register on page 3-23. Available only when the MPU or ECC is present. Otherwise reserved.
14        mpubase  Refer to The mpubase Register. Available only when the MPU is present. Otherwise reserved.
15        mpuacc   Refer to The mpuacc Register for MASK variations table. Available only when the MPU is present. Otherwise reserved.
Register  Name     Register Contents
8         pteaddr  Refer to The pteaddr Register. Available only when the MMU is present. Otherwise reserved.
9         tlbacc   Refer to The tlbacc Register. Available only when the MMU is present. Otherwise reserved.
10        tlbmisc  Refer to The tlbmisc Register. Available only when the MMU is present. Otherwise reserved.
11        eccinj   Refer to The eccinj Register. Available only when ECC is present.
Table 3-9: status Control Register Field Descriptions

Bit: RSIE
  Description: RSIE is the register set interrupt-enable bit. When set to
  Access: Read/Write
  Reset: 1
  Available: EIC interface and shadow register sets only(8)

Bit: NMI
  Description: NMI is the nonmaskable interrupt mode bit. The processor sets NMI to 1 when it takes a nonmaskable interrupt.
  Access: Read
  Available: EIC interface only(7)

Bit: PRS
  Description: PRS is the previous register set field.
Bit: IL
  Description: IL is the interrupt level field. The IL field controls what
  Access: Read/Write
  Reset: 0
  Available: EIC interface only(7)

Bit: IH
  Description: IH is the interrupt handler mode bit. The processor sets IH to one when it takes an external interrupt.
  Access: Read/Write
  Reset: 0
  Available: EIC interface only(7)

Bit: EH(6)
  Description: EH is the exception handler mode bit. The processor sets EH to one when an exception occurs (including breaks). Software clears EH to zero when ready to handle exceptions again.
All fields in the estatus register have read/write access. All fields reset to 0.

When the Nios II processor takes an interrupt, if status.EH is zero (that is, the MMU is in nonexception mode), the processor copies the contents of the status register to estatus.

Note: If shadow register sets are implemented, and the interrupt requests a shadow register set, the Nios II processor copies status to sstatus, not to estatus.
Related Information
Exception Processing on page 3-36

The ipending Register

The value of the ipending register indicates the value of the enabled interrupt signals driven into the processor. A value of one in bit n means that the corresponding irq n input is asserted and enabled in the ienable register. Writing a value to the ipending register has no effect.

Note: The ipending register is present only when the internal interrupt controller is implemented.
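The relationship described above amounts to a bitwise AND of the interrupt inputs with the ienable register. The following is a minimal host-side sketch of that relationship; the function name compute_ipending is hypothetical, and the code models the value only, not how software reads the register.

```c
#include <stdint.h>

/* Hypothetical model: bit n of the result is one only when irq n is
 * asserted and bit n of ienable is set. */
static uint32_t compute_ipending(uint32_t irq_inputs, uint32_t ienable)
{
    return irq_inputs & ienable;
}
```

For instance, if irq0 and irq2 are asserted but only bit 2 of ienable is set, the modeled ipending value has only bit 2 set.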
Related Information
• Instantiating the Nios II Processor on page 4-1

The pteaddr Register

The pteaddr register contains the virtual address of the operating system’s page table and is only available in systems with an MMU. The pteaddr register layout accelerates fast TLB miss exception handling.
Issuing a wrctl instruction to the tlbacc register writes the tlbacc register with the specified value. If tlbmisc.WE = 1, the wrctl instruction also initiates a TLB write operation, which writes a TLB entry. The TLB entry written is specified by the line portion of pteaddr.VPN and the tlbmisc.WAY field. The value written is specified by the value written into tlbacc along with the values of pteaddr.VPN and tlbmisc.PID.
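The TLB write behavior above can be sketched as a small host-side model. Everything here is illustrative: the struct and function names are hypothetical, the number of lines per way (TLB_LINES) is an arbitrary assumption, and treating the low-order VPN bits as the "line portion" is also an assumption that the text does not confirm.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_WAYS  16u /* default associativity stated for the TLB */
#define TLB_LINES 8u  /* assumed number of lines per way */

struct tlb_entry {
    uint32_t vpn;    /* copied from pteaddr.VPN */
    uint32_t pid;    /* copied from tlbmisc.PID */
    uint32_t tlbacc; /* value written to tlbacc */
};

struct tlb_model {
    uint32_t pteaddr_vpn;
    uint32_t tlbmisc_pid;
    uint32_t tlbmisc_way;
    bool     tlbmisc_we;
    uint32_t tlbacc;
    struct tlb_entry entry[TLB_WAYS][TLB_LINES];
};

/* Models wrctl to tlbacc. Returns true if a TLB entry was written. */
static bool model_wrctl_tlbacc(struct tlb_model *t, uint32_t value)
{
    t->tlbacc = value; /* the register itself is always written */
    if (!t->tlbmisc_we || t->tlbmisc_way >= TLB_WAYS) {
        return false;  /* no TLB write operation is initiated */
    }
    /* Assumption: the line is selected by the low-order VPN bits. */
    uint32_t line = t->pteaddr_vpn % TLB_LINES;
    struct tlb_entry *e = &t->entry[t->tlbmisc_way][line];
    e->vpn = t->pteaddr_vpn;
    e->pid = t->tlbmisc_pid;
    e->tlbacc = value;
    return true;
}
```

The model makes the ordering visible: system software sets pteaddr.VPN, tlbmisc.PID, tlbmisc.WAY, and tlbmisc.WE first, and the write to tlbacc is the step that commits the entry.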
tlbmisc bit fields, listed from most significant to least significant: Reserved, EE, WAY, RD, WE, PID, DBL (bit 3), BAD (bit 2), PERM (bit 1), D (bit 0).

Table 3-19: tlbmisc Control Register Field Descriptions

Field: EE
  Description: If this field is a 1, a software-triggered ECC error (1, 2, or 3 bit error) occurred because software initiated a TLB read operation. Only set this field to 1 if CONFIG.ECCEN is 1.
  Access: Read/Write

Field: WAY
When system software changes the fields that specify the TLB entry, there is no immediate effect on pteaddr.VPN, tlbmisc.PID, or the tlbacc register. The registers retain their previous values until the next TLB read operation is initiated. For example, when the operating system sets pteaddr.VPN to a new value, the contents of tlbacc continue to reflect the previous TLB entry. tlbacc does not contain the new TLB entry until after an explicit TLB read.
Refer to the Nios II Exceptions (In Decreasing Priority Order) table in the "Exception Overview" section for more information on these exceptions.

Related Information
Exception Overview on page 3-36

The PERM Flag

During a general exception, the processor sets PERM to one for a TLB permission violation exception, and clears PERM to zero otherwise.

The D Flag

The D flag indicates whether the exception is an instruction access exception or a data access exception.
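As a hedged illustration of how an exception handler might inspect these flags, the sketch below decodes the four low-order bits of a tlbmisc value. It assumes that DBL, BAD, PERM, and D occupy bits 3 through 0, as the tlbmisc bit-field layout indicates. The macro, struct, and function names are hypothetical, and the interpretation that D = 1 denotes a data access is an assumption not stated in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, following the tlbmisc bit-field layout. */
#define TLBMISC_D    (1u << 0)
#define TLBMISC_PERM (1u << 1)
#define TLBMISC_BAD  (1u << 2)
#define TLBMISC_DBL  (1u << 3)

struct tlbmisc_flags {
    bool d;    /* assumed: true for a data access exception */
    bool perm; /* TLB permission violation */
    bool bad;
    bool dbl;
};

/* Hypothetical helper: splits a raw tlbmisc value into its flags. */
static struct tlbmisc_flags decode_tlbmisc_flags(uint32_t tlbmisc)
{
    struct tlbmisc_flags f;
    f.d    = (tlbmisc & TLBMISC_D) != 0;
    f.perm = (tlbmisc & TLBMISC_PERM) != 0;
    f.bad  = (tlbmisc & TLBMISC_BAD) != 0;
    f.dbl  = (tlbmisc & TLBMISC_DBL) != 0;
    return f;
}
```

On a Nios II target the raw value would come from a rdctl instruction in supervisor mode; here it is simply passed in as an argument.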
Table 3-21: badaddr Control Register Field Descriptions

Field: BADDR
  Description: BADDR contains the byte instruction address or data address associated with an exception when certain exceptions occur. The Address column of the Nios II Exceptions Table lists which exceptions write the BADDR field.
  Access: Read
  Reset: 0
  Available: Only with extra exception information

The BADDR field allows up to a 32-bit instruction address or data address.
Field: ECCEXE
  Description: ECCEXE is the ECC error exception enable bit. When ECCEXE = 1, the Nios II processor generates ECC error
  Access: Read/Write
  Reset: 0
  Available: Only with ECC

Field: ECCEN
  Description: ECCEN is the ECC enable bit. When ECCEN = 0, the Nios II processor ignores all ECC errors. When ECCEN = 1, the
  Access: Read/Write
  Reset: 0
  Available: Only with ECC

Field: PE
  Description: PE is the memory protection enable bit. When PE = 1, the MPU is enabled. When PE = 0, the MPU is disabled.
The INDEX and D fields specify the region information to access when an MPU region read or write operation is performed. The D field specifies whether the region is a data region or an instruction region. The INDEX field specifies which of the 32 data or instruction regions to access. If there are fewer than 32 instruction or 32 data regions, unused high-order bits must be written as zero and are read as zero.
Field: PERM
  Description: PERM specifies the access permissions for the region.
  Access: Read/Write
  Reset: 0
  Available: Only with MPU

Field: RD
  Description: RD is the read region flag. When RD = 1, wrctl instructions to the mpuacc register perform a read operation.
  Access: Write
  Reset: 0
  Available: Only with MPU

Field: WR
  Description: WR is the write region flag. When WR = 1, wrctl instructions to the mpuacc register perform a write operation.
  Access: Write
  Reset: 0
  Available: Only with MPU

The MASK and LIMIT fields are mutually exclusive.
The MT Flag

The MT flag determines the default memory type of an MPU data region. The MT flag only applies to data regions. For instruction regions, the MT bit must be written with 0 and is always read as 0. When data cacheability is enabled on a data region, a data access to that region can be cached, if a data cache is present in the system.
The WR Flag

Setting the WR flag signifies that an MPU region write operation should be performed when a wrctl instruction is issued to the mpuacc register. Refer to the MPU Region Read and Write Operations section for more information. The WR flag always returns 0 when read by a rdctl instruction.

Note: Setting both the RD and WR flags to one results in undefined behavior.
When shadow register sets are implemented, status.CRS indicates the register set currently in use. A Nios II core can have up to 63 shadow register sets. If n is the configured number of shadow register sets, the shadow register sets are numbered from 1 to n. Register set 0 is the normal register set. A shadow register set behaves precisely the same as the normal register set. The register set currently in use can only be determined by examining status.CRS.
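The numbering rule above can be captured in a small validity check. This is a sketch only; the macro and function names are hypothetical, and the check models the numbering described in the text rather than any hardware behavior.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SHADOW_SETS 63u /* upper bound stated in the text */

/* Hypothetical helper: 'n_shadow' is the configured number of shadow
 * register sets. Set 0 is the normal register set; sets 1..n_shadow
 * are shadow register sets. */
static bool is_valid_register_set(uint32_t set, uint32_t n_shadow)
{
    return n_shadow <= MAX_SHADOW_SETS && set <= n_shadow;
}
```

With four shadow register sets configured, for example, set numbers 0 through 4 are valid and 5 is not.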
Bit: RSIE
  Description: RSIE is the register set interrupt-
  Access: Read/Write
  Reset: Undefined
  Available: (13)

Bit: NMI
  Description: NMI is the nonmaskable interrupt mode bit.
• If the processor is currently running in the normal register set, insert the new register set number in estatus.CRS, and execute eret.
• If the processor is currently running in a shadow register set, insert the new register set number in sstatus.CRS, and execute eret.
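The two cases above reduce to a choice based on the current register set number, given that register set 0 is the normal register set. A minimal sketch follows; the enum and function names are hypothetical, and the code only selects which register's CRS field to update before executing eret.

```c
#include <stdint.h>

/* Hypothetical labels for the register whose CRS field is updated. */
enum crs_target {
    WRITE_ESTATUS_CRS, /* running in the normal register set */
    WRITE_SSTATUS_CRS  /* running in a shadow register set */
};

/* 'current_set' is the value of status.CRS; 0 is the normal set. */
static enum crs_target crs_target_for(uint32_t current_set)
{
    return current_set == 0u ? WRITE_ESTATUS_CRS : WRITE_SSTATUS_CRS;
}
```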
MPU region write operations set new values for the attributes of a region. Each MPU region write operation consists of the following actions:
• Execute a wrctl instruction to the mpubase register with the mpubase.INDEX and mpubase.D fields set to identify the MPU region.
• Execute a wrctl instruction to the mpuacc register with the mpuacc.WR field set to one and the mpuacc.RD field cleared to zero.

The MPU region write operation sets the values for mpubase.
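The two-step sequence above can be sketched as a host-side model. This is not the hardware behavior in full: the structs use decoded fields rather than real bit positions, the set of attributes committed (base, mask or limit, and permissions) is an assumption, the convention that D = 1 selects a data region is an assumption, and all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MPU_REGIONS 32u /* up to 32 data and 32 instruction regions */

/* Hypothetical decoded view of mpubase. */
struct mpubase_fields {
    uint32_t base;  /* assumed region base address field */
    uint32_t index; /* mpubase.INDEX */
    bool     d;     /* mpubase.D; assumed true = data region */
};

/* Hypothetical decoded view of mpuacc. */
struct mpuacc_fields {
    uint32_t mask_or_limit; /* MASK or LIMIT (mutually exclusive) */
    uint32_t perm;          /* mpuacc.PERM */
    bool     rd;            /* mpuacc.RD */
    bool     wr;            /* mpuacc.WR */
};

struct mpu_region {
    uint32_t base;
    uint32_t mask_or_limit;
    uint32_t perm;
};

struct mpu_model {
    struct mpubase_fields mpubase;
    struct mpu_region     instr[MPU_REGIONS];
    struct mpu_region     data[MPU_REGIONS];
};

/* Step 1: wrctl to mpubase identifies the region. */
static void model_wrctl_mpubase(struct mpu_model *m, struct mpubase_fields v)
{
    m->mpubase = v;
}

/* Step 2: wrctl to mpuacc with WR = 1 and RD = 0 commits the region.
 * Returns true if a region write took place in the model. */
static bool model_wrctl_mpuacc(struct mpu_model *m, struct mpuacc_fields v)
{
    if (!v.wr || v.rd || m->mpubase.index >= MPU_REGIONS) {
        return false;
    }
    struct mpu_region *r = m->mpubase.d ? &m->data[m->mpubase.index]
                                        : &m->instr[m->mpubase.index];
    r->base = m->mpubase.base;
    r->mask_or_limit = v.mask_or_limit;
    r->perm = v.perm;
    return true;
}
```

The model refuses the case where both RD and WR are set, since the text describes that combination as undefined behavior.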
Working with ECC

Enabling ECC

The ECC is disabled on system reset. Before enabling the ECC, initialize the Nios II RAM blocks to avoid spurious ECC errors. The Nios II processor executes the INITI instruction on each cache line, which initializes the instruction cache RAM. The RAM does not require special initialization because any detected ECC errors are ignored if the line is invalid; the line is invalid after INITI instructions initialize the tag RAM.
Instruction Cache Tag RAM

1. Ensure all code up to the JMP instruction is in the same instruction cache line or is located in an ITCM.
2. Use a FLUSHI instruction to flush an instruction cache line other than the line containing the executing code.
3. Use a FLUSHP instruction to flush the pipeline.
4. Use a WRCTL instruction to set ECCINJ.ICTAG to INJS or INJD. This setting causes an ECC error to occur on the start of the next line fill.
5.
Exception Processing

Exception processing is the act of responding to an exception, and then returning, if possible, to the preexception execution state. All Nios II exceptions are precise. Precise exceptions enable the system software to re-execute the instruction, if desired, after handling the exception.
The following table columns specify information for the exceptions:
• Exception: Gives the name of the exception.
• Type: Specifies the exception type.
• Available: Specifies when support for that exception is present.
• Cause: Specifies the value of the CAUSE field of the exception register, for exceptions that write the exception.CAUSE field.
• Address: Specifies the instruction or data address associated with the exception.
Exception                    Type                 Available  Cause  Address                 Vector
MPU region violation (data)  Instruction-related  MPU        17     badaddr (data address)  General exception

Related Information
• Requested Handler Address on page 3-42
• General-Purpose Registers on page 3-10

Exception Latency

Exception latency specifies how quickly the system can respond to an exception.
The reset state is undefined for all other system components, including but not limited to:
• General-purpose registers, except for zero (r0) in the normal register set, which is permanently zero.
• Control registers, except for status. status.RSIE is reset to 1, and the remaining fields are reset to 0.
• Instruction and data memory.
• Cache memory, except for the instruction cache line associated with the reset vector.
• Peripherals.
Understanding Register Usage

The bstatus control register and general-purpose registers bt (r25) and ba (r30) in the normal register set are reserved for debugging. Code is not prevented from writing to these registers, but debug code might overwrite the values. The break handler can use bt (r25) to help save additional registers.
• Requested Register Set on page 3-42
• Requested Interrupt Level on page 3-42

Requested Handler Address

The RHA specifies the address of the handler associated with the interrupt. The availability of an RHA for each interrupt allows the Nios II processor to jump directly to the interrupt handler, reducing interrupt latency. The RHA for each interrupt is typically software-configurable.
For the best interrupt performance, assign a dedicated register set to each of the most time-critical interrupts. Less-critical interrupts can share register sets, provided the ISRs are protected from register corruption as noted in the Requested Register Set section of this chapter. The method for mapping interrupts to register sets is specific to the particular EIC implementation.
Figure 3-2: Relationship Between ienable, ipending, PIE and Hardware Interrupts
[Figure: the external hardware interrupt request inputs irq[31..0] (irq0 through irq31), bits IENABLE0 through IENABLE31 of the ienable register (bits 31..0), and bits IPENDING0 through IPENDING31 of the ipending register (bits 31..0).]
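The relationship that Figure 3-2 depicts can be sketched as a small software model. This is only an illustration, assuming that each ipending bit is the logical AND of the matching irq input and ienable bit, and that the processor sees an interrupt request only while PIE is 1. The function names are hypothetical and are not part of any Altera API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the internal interrupt controller.
 * irq:     state of the hardware interrupt inputs irq[31..0]
 * ienable: value of the ienable control register
 * Assumption: IPENDINGn = irqn AND IENABLEn. */
static uint32_t model_ipending(uint32_t irq, uint32_t ienable)
{
    return irq & ienable;
}

/* Assumption: an interrupt is requested when PIE is 1 and at least
 * one ipending bit is set. */
static bool model_interrupt_requested(uint32_t irq, uint32_t ienable, bool pie)
{
    return pie && model_ipending(irq, ienable) != 0;
}
```

For example, with irq0 and irq2 asserted and only IENABLE2 set, the model reports IPENDING2 alone, and no request is raised while PIE is 0.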
• Fast TLB miss
• Double TLB miss
• TLB permission violation
• MPU region violation

Note: All noninterrupt exception handlers must run in the normal register set.

Related Information
Exception Processing Flow on page 3-49

Trap Instruction
When a program issues the trap instruction, the processor generates a software trap exception. A program typically issues a software trap when the program requires servicing by the operating system.
Note: All undefined opcodes are reserved. The processor does occasionally use some undefined encodings internally. Executing one of these undefined opcodes does not trigger an illegal instruction exception. Refer to the Nios II Core Implementation Details chapter of the Nios II Processor Reference Handbook for information about each specific Nios II core.
A data address is considered misaligned if the byte address is not a multiple of the data width of the load or store instruction (four bytes for word, two bytes for half-word). Byte load and store instructions are always aligned, so they never take a misaligned address exception.
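The alignment rule above amounts to a remainder check. The following is a minimal sketch; the function name and the width_bytes parameter are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* width_bytes is the data width of the load or store instruction:
 * 1 for byte, 2 for half-word, 4 for word. */
static bool is_misaligned_data_address(uint32_t addr, uint32_t width_bytes)
{
    return (addr % width_bytes) != 0;
}
```

A word access to 0x1002 is misaligned, a half-word access to the same address is not, and a byte access can never be misaligned because every address is a multiple of 1.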
There are two kinds of fast TLB miss exceptions:
• Fast TLB miss (instruction)—Any instruction fetch can cause this exception.
• Fast TLB miss (data)—Load, store, initda, and flushda instructions can cause this exception.
The fast TLB miss exception handler can inspect the tlbmisc.D field to determine which kind of fast TLB miss exception occurred.

Double TLB Miss
Double TLB miss exceptions are implemented only in Nios II processors that include the MMU.
There are two kinds of MPU region violation exceptions:
• MPU region violation (instruction)—Any instruction fetch can cause this exception.
• MPU region violation (data)—Load, store, initda, and flushda instructions can cause this exception.
The general exception handler can inspect the exception.CAUSE field to determine which kind of MPU region violation exception occurred.
• RHA—The requested handler address for the interrupt handler assigned to the requested interrupt.
• RRS—The requested register set to be used when the interrupt handler executes. If shadow register sets are not implemented, RRS must always be 0.
• RIL—The requested interrupt level specifies the priority of the interrupt.
• RNMI—The requested NMI flag specifies whether to treat the interrupt as nonmaskable.
Exception Flow with the Internal Interrupt Controller
A general exception handler determines which of the pending interrupts has the highest priority, and then transfers control to the appropriate ISR. The ISR stops the interrupt from being visible (either by clearing it at the source or masking it using ienable) before returning and/or before re-enabling PIE. The ISR also saves estatus and ea (r29) before re-enabling PIE.
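The priority decision made by the general exception handler can be sketched as a scan of the ipending value. The priority order is a software policy; this sketch assumes, purely for illustration, that lower interrupt numbers have higher priority. The function name is hypothetical.

```c
#include <stdint.h>

/* Returns the number of the pending interrupt to service first,
 * or -1 if nothing is pending.
 * Assumption: interrupt 0 has the highest priority, 31 the lowest. */
static int select_pending_irq(uint32_t ipending)
{
    for (int n = 0; n < 32; n++) {
        if (ipending & (UINT32_C(1) << n)) {
            return n;
        }
    }
    return -1;
}
```

A handler built this way would use the returned number to index a table of ISRs, and each ISR would then clear or mask its own interrupt as described above.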
Exceptions and Processor Status
System Status Before Taking Exception External Interrupt Asserted (18) Processor Status Register status.EH==1 (19) or Field Internal Interrupt Asserted or Noninterrupt Exception status.EH==0 status.EH==0 No TLB Miss RRS==0 (20) RRS!=0 RRS==0 RRS!=0 status.
Determining the Cause of Interrupt and Instruction-Related Exceptions
The general exception handler must determine the cause of each exception and then transfer control to an appropriate exception routine.
Nested Exceptions with an External Interrupt Controller
With an EIC, handling of nested interrupts is more sophisticated than with the internal interrupt controller. Handling of noninterrupt exceptions, however, is the same. When individual external interrupts have dedicated shadow register sets, the Nios II processor supports fast interrupt handling with no overhead for saving register contents.
Multiple interrupts can share a register set, with some loss of performance. There are two techniques for sharing register sets:
• Set status.RSIE to 0. When an ISR is running in a given register set, the processor does not take any maskable interrupt assigned to the same register set. Such interrupts must wait for the running ISR to complete, regardless of their interrupt level.
  Note: This technique can result in a priority inversion.
The status.IL field controls what level of external maskable interrupts can be serviced. The processor services a maskable interrupt only if its requested interrupt level is greater than status.IL. An ISR can make run-time adjustments to interrupt nesting by manipulating status.IL. For example, if an ISR is running at level 5, to temporarily allow pre-emption by another level 5 interrupt, it can set status.IL to 4.
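The level comparison can be written as a one-line predicate. This sketch models only the status.IL check described above; other conditions that also gate maskable interrupts are deliberately left out, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models only the interrupt-level check: a maskable external interrupt
 * passes when its requested interrupt level (RIL) is greater than
 * the current status.IL value. */
static bool maskable_level_allows(uint32_t ril, uint32_t status_il)
{
    return ril > status_il;
}
```

With status.IL at 5, a level 5 request is held off; after the ISR lowers status.IL to 4, the same request passes the check.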
Note: When the EIC interface and shadow register sets are implemented on the Nios II core, you must ensure that your software, including ISRs, is built with the version of the GCC compiler included in Nios II EDS version 9.0 or later. Earlier versions have an implementation of the eret instruction that is incompatible with shadow register sets.
The Nios II architecture provides the following mechanisms to bypass the cache:
• When no MMU is present, bit 31 of the address is reserved for bit-31 cache bypass. With bit-31 cache bypass, the address space of processor cores is 2 GB, and the high bit of the address controls the caching of data memory accesses.
• When the MMU is present, cacheability is controlled by the MMU, and bit 31 functions as a normal address bit.
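Bit-31 cache bypass can be illustrated with two address helpers. This is a sketch for systems without an MMU, assuming that an address with bit 31 set bypasses the data cache and that the same address with bit 31 clear is the cached alias. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: bit 31 set selects the uncached view of a 2 GB address space. */
#define BIT31_CACHE_BYPASS UINT32_C(0x80000000)

/* Returns the alias of addr that bypasses the data cache. */
static uint32_t uncached_alias(uint32_t addr)
{
    return addr | BIT31_CACHE_BYPASS;
}

/* Returns the alias of addr that goes through the data cache. */
static uint32_t cached_alias(uint32_t addr)
{
    return addr & ~BIT31_CACHE_BYPASS;
}
```

Both aliases refer to the same physical location; only the caching behavior of the access differs.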
For example, in a 64-KB direct-mapped cache with a 16-byte line, bits 15:4 are used to select the line. Assume that virtual address 0x1000 is mapped to physical address 0xF000 and virtual address 0x2000 is also mapped to physical address 0xF000. This is an illegal virtual address alias because accesses to virtual address 0x1000 use line 0x100 and accesses to virtual address 0x2000 use line 0x200 even though they map to the same physical address.
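The line selection in this example can be checked with a short calculation. The sketch below hard-codes the 64-KB, 16-byte-line geometry from the example; the function name is illustrative.

```c
#include <stdint.h>

/* 64-KB direct-mapped cache with 16-byte lines:
 * bits 3:0 are the byte offset within the line,
 * bits 15:4 select one of 4096 lines. */
static uint32_t cache_line_index(uint32_t vaddr)
{
    return (vaddr >> 4) & 0xFFFu;
}
```

Evaluating bits 15:4 gives index 0x100 for virtual address 0x1000 and index 0x200 for 0x2000, so the two aliases of physical address 0xF000 occupy different cache lines. Two virtual addresses can alias safely only when they produce the same index.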
Instruction | Description
ldbio, ldbuio, stbio, ldhio, ldhuio, sthio | These operations load/store byte and half-word data from/to peripherals without caching or buffering.

Arithmetic and Logical Instructions
Logical instructions support and, or, xor, and nor operations. Arithmetic instructions support addition, subtraction, multiplication, and division operations.
Table 3-44: Move Instructions
Instruction | Description
mov, movhi, movi, movui, movia | mov copies the value of one register to another register. movi moves a 16-bit signed immediate value to a register, and sign-extends the value to 32 bits. movui and movhi move a 16-bit immediate value into the lower or upper 16 bits of a register, inserting zeros in the remaining bit positions. Use movia to load a register with an address.
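The effect of the immediate move instructions on a 32-bit register can be modeled directly. This is only a sketch of the arithmetic described in the table; the function names are hypothetical.

```c
#include <stdint.h>

/* movi: 16-bit signed immediate, sign-extended to 32 bits. */
static uint32_t model_movi(int16_t imm)
{
    return (uint32_t)(int32_t)imm;
}

/* movui: 16-bit immediate in the lower half, zeros in the upper half. */
static uint32_t model_movui(uint16_t imm)
{
    return (uint32_t)imm;
}

/* movhi: 16-bit immediate in the upper half, zeros in the lower half. */
static uint32_t model_movhi(uint16_t imm)
{
    return (uint32_t)imm << 16;
}
```

The difference between movi and movui shows up only for immediates with bit 15 set: movi of -1 yields 0xFFFFFFFF, while movui of 0xFFFF yields 0x0000FFFF.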
Table 3-46: Shift and Rotate Instructions
Instruction | Description
rol, ror, roli, sll, slli, sra, srl, srai, srli | The rol and roli instructions provide left bit-rotation. roli uses an immediate value to specify the number of bits to rotate. The ror instruction provides right bit-rotation. There is no immediate version of ror, because roli can be used to implement the equivalent operation.
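The equivalence that makes an immediate form of ror unnecessary is that rotating right by n bits equals rotating left by (32 - n) mod 32 bits. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is reduced modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Rotate right by n bits, expressed as a left rotation. */
static uint32_t rotr32_via_rotl(uint32_t x, unsigned n)
{
    return rotl32(x, (32u - (n & 31u)) & 31u);
}
```

In assembly terms, a right rotation by a constant n would be written as roli with the immediate 32 - n.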
Table 3-48: Conditional Branch Instructions
Instruction | Description
bge, bgeu, bgt, bgtu, ble, bleu, blt, bltu, beq, bne | These instructions provide relative branches that compare two register values and branch if the expression is true. Refer to the "Comparison Instructions" section of this chapter for a description of the relational operations implemented.
Instruction | Description
rdprs, wrprs | These instructions read and write general-purpose registers between the current register set and another register set. wrprs can set r0 to 0 in a shadow register set. System software must use wrprs to initialize r0 to 0 in each shadow register set before using that register set.

Custom Instructions
The custom instruction provides low-level access to custom instruction logic.
Related Information
Unimplemented Instruction on page 3-45

Document Revision History
Table 3-50: Document Revision History
Date | Version | Changes
April 2015 | 2015.04.02 | • Removed obsolete devices: Cyclone II and Stratix II • Removed config.ANI flag from chapter
February 2014 | 13.1.0 | • Added information on ECC support. • Removed HardCopy information. • Removed references to SOPC Builder.
May 2011 | 11.0.
Date | Version | Changes
September 2004 | 1.1 | • Added details for new control register ctl5. • Updated details of debug and break processing to reflect new behavior of the break instruction.
May 2004 | 1.0 | Initial release.
4 Instantiating the Nios II Processor
2015.04.02 NII51004

This chapter describes the Nios II Processor parameter editor in Qsys. The Nios II Processor parameter editor allows you to specify the processor features for a particular Nios II hardware system. This chapter covers the features of the Nios II processor that you can configure with the Nios II Processor parameter editor; it is not a user guide for creating complete Nios II processor systems.
Name | Description
Exception Vector
Exception vector memory, Exception vector offset, Exception vector | Refer to the "General Exception Vectors" section.
MMU and MPU
Include MMU, Fast TLB Miss Exception vector memory, Fast TLB Miss Exception vector offset, Fast TLB Miss Exception vector | Refer to the "Memory Management Unit Settings" section.
Include MPU | Refer to the "Memory Protection Unit Settings" section.
Multiply and Divide Settings
The Nios II/s and Nios II/f cores offer hardware multiply and divide options. You can choose the best option to balance embedded multiplier usage, logic element (LE) usage, and performance. The Hardware multiplication type parameter for each core provides the following list:
• DSP Block—Include DSP block multipliers in the arithmetic logic unit (ALU).
General Exception Vector
Parameters in this section select the memory module where the general exception vector (exception address) resides, and the location of the general exception vector. The general exception vector cannot be configured until your system memory components are in place. The Exception vector memory list, which includes all memory modules mastered by the Nios II processor, selects the exception vector memory module.
Note: The Nios II MMU is optional and mutually exclusive with the Nios II MPU. A Nios II system can include either an MMU or an MPU, but not both in the same design. For information about the Nios II MMU, refer to the Programming Model chapter of the Nios II Processor Reference Handbook.
Name | Description
Omit data master port, Data cache, Data cache line size, Burst transfers, Data cache victim buffer implementation, Number of tightly coupled instruction master port(s) | Refer to the "Data Master" Settings.

The following sections describe the configuration settings available.
Data Master Settings
The Data Master parameters provide the following options for the Nios II/f core:
• Omit data master port—Removes the Avalon-MM data master port from the Nios II processor. The port is only successfully removed when Data cache is set to None and Number of tightly coupled data master port(s) is greater than zero.
Name | Description
Illegal instruction, Division error, Misaligned memory access, Extra exception information | Refer to the "Exception Checking" section.
HardCopy Compatibility
HardCopy compatible | Refer to the "HardCopy Compatible" section.
ECC
ECC present | Refer to the "ECC" section.
Related Information
SOPC Builder to Qsys Migration Guidelines
For information about upgrading IDs that were manually-assigned values in Qsys, refer to the SOPC Builder to Qsys Migration Guidelines.
Related Information
• Programming Model on page 3-1

Interrupt Controller Interfaces
The Interrupt controller setting determines which of the following configurations is implemented:
• Internal interrupt controller
• External interrupt controller (EIC) interface
The EIC interface is available only on the Nios II/f core.
Related Information
Altera ASICs

ECC
ECC is only available for the Nios II/f core and provides ECC support for Nios II internal RAM blocks, such as instruction cache, MMU TLB, and register file. The SECDED ECC algorithm is based on Hamming codes, which detect 1-bit or 2-bit errors and correct 1-bit errors. If the Nios II processor does not attempt to correct any errors and only detects them, the ECC algorithm can detect 3-bit errors.
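The single-error-correct, double-error-detect (SECDED) behavior can be demonstrated on a toy code. The sketch below protects a 4-bit value with a Hamming(7,4) code plus one overall parity bit. It only illustrates the principle: the word widths, bit layout, and function names are assumptions for this example and do not describe the Nios II hardware implementation.

```c
#include <stdint.h>

enum secded_status { SECDED_OK, SECDED_CORRECTED, SECDED_UNCORRECTABLE };

static unsigned get_bit(unsigned v, unsigned i) { return (v >> i) & 1u; }

/* Encode a 4-bit value into an 8-bit codeword.
 * Bit positions 1..7 form a Hamming(7,4) code with parity bits at
 * positions 1, 2, and 4. Bit 0 holds an overall parity bit, which
 * adds double-error detection. */
static uint8_t secded_encode(uint8_t nibble)
{
    unsigned d1 = get_bit(nibble, 0), d2 = get_bit(nibble, 1);
    unsigned d3 = get_bit(nibble, 2), d4 = get_bit(nibble, 3);
    unsigned cw = ((d1 ^ d2 ^ d4) << 1) | ((d1 ^ d3 ^ d4) << 2) | (d1 << 3) |
                  ((d2 ^ d3 ^ d4) << 4) | (d2 << 5) | (d3 << 6) | (d4 << 7);
    unsigned overall = 0;
    for (unsigned i = 1; i < 8; i++) {
        overall ^= get_bit(cw, i);
    }
    return (uint8_t)(cw | overall);
}

/* Decode a codeword. The syndrome is the XOR of the positions of all
 * set bits; it is zero for a valid codeword and equals the position
 * of the flipped bit after a single-bit error. */
static enum secded_status secded_decode(uint8_t cw, uint8_t *nibble_out)
{
    unsigned syndrome = 0, overall = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (get_bit(cw, i)) {
            syndrome ^= i;
            overall ^= 1u;
        }
    }
    enum secded_status status = SECDED_OK;
    if (overall) {
        /* Odd overall parity: single-bit error at position `syndrome`
         * (position 0 is the overall parity bit itself). */
        cw ^= (uint8_t)(1u << syndrome);
        status = SECDED_CORRECTED;
    } else if (syndrome != 0) {
        /* Even overall parity with a nonzero syndrome: two bits flipped. */
        return SECDED_UNCORRECTABLE;
    }
    *nibble_out = (uint8_t)(get_bit(cw, 3) | (get_bit(cw, 5) << 1) |
                            (get_bit(cw, 6) << 2) | (get_bit(cw, 7) << 3));
    return status;
}
```

Flipping any one bit of an encoded value still decodes to the original data, while flipping any two bits is reported as uncorrectable rather than silently miscorrected.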
MMU
When Include MMU on the Core Nios II tab is on, the MMU settings on the MMU and MPU Settings tab provide the following options for the MMU in the Nios II/f core. Typically, you should not need to change any of these settings from their default values.
• Process ID (PID) bits—Specifies the number of bits to use to represent the process identifier.
Related Information
• Programming Model on page 3-1
• Nios II Core Implementation Details on page 5-1

JTAG Debug Module Tab
The JTAG Debug Module tab presents settings for configuring the JTAG debug module on the Nios II processor. You can select the debug features appropriate for your target application.
Feature | Description
Hardware Breakpoints | Sets a breakpoint on instructions residing in nonvolatile memory, such as flash memory.
Data Triggers | Triggers based on address value, data value, or read or write cycle. You can use a trigger to halt the processor on specific events or conditions, or to activate other events, such as starting execution trace, or sending a trigger signal to an external logic analyzer.
Related Information
General Exception Vector on page 4-4

Advanced Debug Settings
Debug levels 3 and 4 support trace data collection into an on-chip memory buffer. You can set the on-chip trace buffer size from 128 to 64K trace frames, using OCI Onchip Trace. Larger buffer sizes consume more on-chip M4K RAM blocks. Every M4K RAM block can store up to 128 trace frames.
Note: The Nios II MMU does not support the JTAG debug module trace.
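The memory cost of a trace buffer follows from the 128-frames-per-block figure. The sketch below computes a lower bound on the number of M4K blocks for a given buffer size; the macro and function names are hypothetical, and the actual block count depends on how the design is fitted.

```c
#include <stdint.h>

/* From the text: every M4K RAM block stores up to 128 trace frames. */
#define TRACE_FRAMES_PER_M4K 128u

/* Lower bound on the M4K blocks needed for a trace buffer,
 * rounding up to whole blocks. */
static uint32_t min_m4k_blocks_for_trace(uint32_t trace_frames)
{
    return (trace_frames + TRACE_FRAMES_PER_M4K - 1u) / TRACE_FRAMES_PER_M4K;
}
```

Under this estimate the smallest buffer (128 frames) needs one block and the largest (64K frames) needs at least 512 blocks.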
For information about converting SOPC Builder designs to Qsys, refer to the SOPC Builder to Qsys Migration Guidelines.

Related Information
SOPC Builder to Qsys Migration Guidelines

Floating Point Hardware 2 Custom Instruction
The Nios II processor offers a set of optional predefined custom instructions that implement floating-point arithmetic operations.
To add the floating-point custom instructions to the Nios II processor in Qsys, select Floating Point Hardware under Custom Instruction Modules on the Component Library tab, and click Add. By default, Qsys includes floating-point addition, subtraction, and multiplication, but omits the more resource-intensive floating-point division.
Document Revision History
Table 4-9: Document Revision History
Date | Version | Changes
April 2015 | 2015.04.02 | Maintenance release.
February 2014 | 13.1.0 | • Added information about the Floating Point Custom Instruction 2 Component • Added information about ECC support. • Removed references to SOPC Builder.
May 2011 | 11.0.0 | • Revised the entire chapter for the new Qsys system integration tool. • Replaced GUI screen shots with parameter tables.
Date | Version | Changes
October 2005 | 5.1.0 | Maintenance release.
May 2005 | 5.0.0 | • Updates to reflect new GUI options in Nios II processor version 5.0. • New details in “Caches and Tightly-Coupled Memory” section.
September 2004 | 1.1 | • Updates to reflect new GUI options in Nios II processor version 1.1. • New details in section “Multiply and Divide Settings.”
May 2004 | 1.0 | Initial release.
5 Nios II Core Implementation Details
2015.04.02 NII51015

This document describes all of the Nios II processor core implementations available at the time of publishing. This document describes only implementation-specific features of each processor core. All cores support the Nios II instruction set architecture. For more information regarding the Nios II instruction set architecture, refer to the Instruction Set Reference chapter of the Nios II Processor Reference Handbook.
Related Information
• Instruction Set Reference on page 8-1

Device Family Support
All Nios II cores provide the same support for target Altera device families.
The Nios II/f fast core is designed for high execution performance. Performance is gained at the expense of core size. The base Nios II/f core, without the memory management unit (MMU) or memory protection unit (MPU), is approximately 25% larger than the Nios II/s core.
The Nios II/f core also provides a hardware divide option that includes LE-based divide circuitry in the ALU. Including an ALU option improves the performance of one or more arithmetic instructions.
Note: The performance of the embedded multipliers differs, depending on the target FPGA family.
Shift and Rotate Performance
The performance of shift operations depends on the hardware multiply option. When a hardware multiplier is present, the ALU achieves shift and rotate operations in three or four clock cycles. Otherwise, the ALU includes dedicated shift circuitry that achieves one-bit-per-cycle shift and rotate performance.
Data Cache
The data cache memory has the following characteristics:
• Direct-mapped cache implementation
• Configurable line size of 4, 16, or 32 bytes
• The data master port reads an entire cache line at a time from memory, and issues one read per clock cycle.
• Write-back
• Write-allocate (i.e.
Related Information
• Instruction Set Reference on page 8-1
• Processor Architecture on page 2-1

Bursting
When the data cache is enabled, you can enable bursting on the data master port. Consult the documentation for memory devices connected to the data master port to determine whether bursting can improve performance.
The μTLBs are not visible to software. They act as an inclusive cache of the main TLB. The processor first looks for a hit in the μTLB. If it misses, it then looks for a hit in the main TLB. If the main TLB misses, the processor takes an exception. If the main TLB hits, the TLB entry is copied into the μTLB for future accesses.
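The two-level lookup order can be sketched as a small software model. Everything below is illustrative: the table sizes, the linear search, and the round-robin μTLB replacement are assumptions made for the example and do not describe how the hardware organizes its TLBs.

```c
#include <stdbool.h>
#include <stdint.h>

#define UTLB_ENTRIES 4   /* hypothetical size */
#define TLB_ENTRIES  16  /* hypothetical size */

struct tlb_entry { bool valid; uint32_t vpn; uint32_t pfn; };

struct mmu_model {
    struct tlb_entry utlb[UTLB_ENTRIES];
    struct tlb_entry tlb[TLB_ENTRIES];
    unsigned next_utlb;  /* round-robin replacement (assumption) */
};

enum lookup_result { HIT_UTLB, HIT_MAIN_TLB, TLB_MISS_EXCEPTION };

/* Look up a virtual page number: micro-TLB first, then the main TLB. */
static enum lookup_result translate(struct mmu_model *m, uint32_t vpn,
                                    uint32_t *pfn)
{
    for (unsigned i = 0; i < UTLB_ENTRIES; i++) {
        if (m->utlb[i].valid && m->utlb[i].vpn == vpn) {
            *pfn = m->utlb[i].pfn;
            return HIT_UTLB;
        }
    }
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (m->tlb[i].valid && m->tlb[i].vpn == vpn) {
            /* Main TLB hit: copy the entry into the micro-TLB. */
            m->utlb[m->next_utlb] = m->tlb[i];
            m->next_utlb = (m->next_utlb + 1u) % UTLB_ENTRIES;
            *pfn = m->tlb[i].pfn;
            return HIT_MAIN_TLB;
        }
    }
    /* Main TLB miss: the processor would take an exception here. */
    return TLB_MISS_EXCEPTION;
}
```

The first access to a page that is present only in the main TLB returns HIT_MAIN_TLB, and a repeated access to the same page is then served from the μTLB.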
The A-stage stall occurs if any of the following conditions occurs:
• An A-stage memory instruction is waiting for Avalon-MM data master requests to complete. Typically this happens when a load or store misses in the data cache, or a flushd instruction needs to write back a dirty line.
• An A-stage shift/rotate instruction is still performing its operation. This only occurs with the multicycle shift circuitry (i.e.
• Division error
• Fast translation lookaside buffer (TLB) miss (MMU only)
• Double TLB miss (MMU only)
• TLB permission violation (MMU only)
• MPU region violation (MPU only)

External Interrupt Controller Interface
The EIC interface enables you to speed up interrupt handling in a complex system by adding a custom interrupt controller.
• Instruction cache
  • ECC errors (1, 2, or 3 bits) that occur in the instruction cache are recoverable; the Nios II processor flushes the cache line and reads from external memory instead of correcting the ECC error.
ALU Option | Hardware Details | Cycles per instruction | Supported Instructions
Embedded multiplier on Cyclone III families | ALU includes 32 x 16-bit multiplier | 5 | mul, muli
Hardware divide | ALU includes multicycle divide circuit | 4 – 66 | div, divu

Shift and Rotate Performance
The performance of shift operations depends on the hardware multiply option.
• Direct-mapped cache implementation
• The instruction master port reads an entire cache line at a time from memory, and issues one read per clock cycle.
Stage Letter | Stage Name
M | Memory
W | Writeback

Up to one instruction is dispatched and/or retired per cycle. Instructions are dispatched and retired in order. Static branch prediction is implemented using the branch offset direction; a negative offset (backward branch) is predicted as taken, and a positive offset (forward branch) is predicted as not taken. The pipeline stalls for the following conditions:
• Multicycle instructions (e.g.
at the expense of execution performance. The Nios II/e core is roughly half the size of the Nios II/s core, but the execution performance is substantially lower. The resulting core is optimal for cost-sensitive applications as well as applications that require simple control logic.
Instruction Performance
The Nios II/e core dispatches a single instruction at a time, and the processor waits for an instruction to complete before fetching and dispatching the next instruction. Because each instruction completes before the next instruction is dispatched, branch prediction is not necessary. This greatly simplifies the consideration of processor stalls. Maximum performance is one instruction per six clock cycles.
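The six-cycle figure gives a simple best-case bound on Nios II/e execution time. The sketch below turns it into arithmetic; the names are hypothetical, and real programs run slower than this bound whenever instructions take more than six cycles.

```c
#include <stdint.h>

/* From the text: at most one instruction per six clock cycles. */
#define NIOS2E_MIN_CYCLES_PER_INSTRUCTION 6u

/* Best-case cycle count for a given number of instructions. */
static uint64_t nios2e_min_cycles(uint64_t instruction_count)
{
    return instruction_count * NIOS2E_MIN_CYCLES_PER_INSTRUCTION;
}

/* Upper bound on instructions per second at a given clock frequency. */
static uint64_t nios2e_peak_instructions_per_second(uint64_t clock_hz)
{
    return clock_hz / NIOS2E_MIN_CYCLES_PER_INSTRUCTION;
}
```

For instance, a hypothetical 60 MHz clock bounds throughput at 10 million instructions per second.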
Document Revision History
Table 5-17: Document Revision History
Date | Version | Changes
April 2015 | 2015.04.02 | Obsolete devices removed (Stratix II, Cyclone II).
February 2014 | 13.1.0 | • Added information on ECC support • Removed HardCopy support information • Removed references to SOPC Builder
May 2011 | 11.0.0 | Maintenance release.
December 2010 | 10.1.0 | Maintenance release.
July 2010 | 10.0.
6 Nios II Processor Revision History
2015.04.02 NII51018

Each release of the Nios II Embedded Design Suite (EDS) introduces improvements to the Nios II processor, the software development tools, or both. This chapter catalogs the history of revisions to the Nios II processor; it does not track revisions to development tools, such as the Nios II Software Build Tools (SBT).
Version | Release Date | Notes
9.1 | November 2009 | • Added optional external interrupt controller interface. • Added optional shadow register sets.
9.0 | March 2009 | No changes.
8.1 | November 2008 | No changes.
8.0 | May 2008 | • Added an optional memory management unit (MMU). • Added an optional memory protection unit (MPU). • Added advanced exception checking. • Added the initda instruction.
7.2 | October 2007 | Added the jmpi instruction.
7.
Architecture Revisions
Architecture revisions augment the fundamental capabilities of the Nios II architecture, and affect all Nios II cores. A change in the architecture mandates a revision to all Nios II cores to accommodate the new architectural enhancement. For example, when Altera adds a new instruction to the instruction set, Altera consequently must update all Nios II cores to recognize the new instruction.
Version | Release Date | Notes
1.01 | September 2004 | No changes.
1.0 | May 2004 | Initial release of the Nios II processor architecture.

Core Revisions
Core revisions introduce changes to an existing Nios II core. Core revisions most commonly fix identified bugs, or add support for an architecture revision. Not every Nios II core is revised with every release of the Nios II architecture.
Version | Release Date | Notes
5.1 SP1 | January 2006 | Bug Fix: Back-to-back store instructions can cause memory corruption to the stored data. If the first store is not to the last word of a cache line and the second store is to the last word of the line, memory corruption occurs.
5.1 | October 2005 | No changes.
5.0 | May 2005 | • Added optional tightly-coupled memory ports.
Nios II/s Core
Table 6-4: Nios II/s Core Revisions
Version | Release Date | Notes
13.1 | November 2013 | • Added support for enhanced floating-point custom instructions
11.0 | May 2011 | No changes.
10.1 | December 2010 | No changes.
10.0 | July 2010 | No changes.
9.1 | November 2009 | No changes.
9.0 | March 2009 | No changes.
8.1 | November 2008 | No changes.
8.0 | May 2008 | Implemented the illegal instruction exception.
7.2 | October 2007 | Implemented the jmpi instruction.
Version | Release Date | Notes
1.1 | December 2004 | • Added user-configurable options affecting multiply and shift operations. Now designers can choose one of three options: (1) Use embedded multiplier resources available in the target device family (previously available). (2) Use logic elements to implement multiply and shift hardware (new option). (3) Omit multiply hardware.
Version | Release Date | Notes
6.1 | November 2006 | No changes.
6.0 | May 2006 | No changes.
5.1 | October 2005 | No changes.
5.0 | May 2005 | Support for HardCopy devices (previous versions required a workaround to support HardCopy devices).
1.1 | December 2004 | Added cpuid control register.
1.01 | September 2004 | Bug fix: 1.
Version | Release Date | Notes
5.0 | May 2005 | Support for HardCopy devices (previous versions of the JTAG debug module did not support HardCopy devices).
1.1 | December 2004 | Bug fix: When using the Nios II/s and Nios II/f cores, hardware breakpoints may have falsely triggered when placed on the instruction sequentially following a jmp, trap, or any branch instruction.
1.01 | September 2004 | • Feature enhancements:
Date | Version | Changes
March 2009 | 9.0.0 | Maintenance release.
November 2008 | 8.1.0 | Maintenance release.
May 2008 | 8.0.0 |
October 2007 | 7.2.0 | • Added jmpi instruction information. • Added exception handling information.
May 2007 | 7.1.0 | • Updated tables to reflect no changes to cores. • Added table of contents to Introduction section. • Added Referenced Documents section.
March 2007 | 7.0.0 | Updated tables to reflect no changes to cores.
7 Application Binary Interface
2015.04.02 NII51016

This chapter describes the Application Binary Interface (ABI) for the Nios II processor.
Memory Alignment
Contents in memory are aligned as follows:
• A function must be aligned to a minimum of a 32-bit boundary.
• The minimum alignment of a data element is its natural size. A data element larger than 32 bits need only be aligned to a 32-bit boundary.
• Structures, unions, and strings must be aligned to a minimum of 32 bits.
• Bit fields inside structures are always 32-bit aligned.
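The rule for scalar data elements reduces to capping the natural size at 4 bytes. The sketch below expresses that; the function name is hypothetical and the size is assumed to be a power of two.

```c
#include <stddef.h>

/* Minimum alignment, in bytes, of a data element under the rule above.
 * size_bytes is the element's natural size (assumed a power of two).
 * Elements larger than 32 bits need only 32-bit (4-byte) alignment. */
static size_t abi_min_alignment(size_t size_bytes)
{
    return size_bytes > 4 ? 4 : size_bytes;
}
```

Under this rule a 2-byte element is aligned to 2 bytes, while an 8-byte element needs only 4-byte alignment.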
Stacks
The stack grows downward (i.e. towards lower addresses). The stack pointer points to the last used slot. The frame pointer points to the saved frame pointer near the top of the stack frame. The figure below shows an example of the structure of a current frame. In this case, function a() calls function b(), and the stack is shown before the call and after the prologue in the called function has completed.
Further Examples of Stacks
There are a number of special cases for stack layout, which are described in this section.

Stack Frame for a Function With alloca()
The Nios II stack frame implementation provides support for the alloca() function, defined in the Berkeley Software Distribution (BSD) extension to C, and implemented by the gcc compiler.
Figure 7-3: Stack Frame Using Variable Arguments
[Figure: two panels with higher addresses at the top. Left panel: "In function a(), just prior to calling b()", showing the stack pointer and the outgoing stack arguments. Right panel: "In function b(), just after executing prologue", showing the incoming stack arguments, which are allocated and freed by a().]
Note: An even better way to find out what the prologue has done is to use information stored in the DWARF-2 debugging fields of the executable and linkable format (.elf) file.
The equivalent structure representing the arguments is:

    struct {
        int a;
        int b;
    };

The first 16 bytes of the struct are assigned to r4 through r7. Therefore r4 is assigned the value of a and r5 the value of b. The first 16 bytes of arguments to a function taking variable arguments are passed the same way as to a function not taking variable arguments. The called function must clean up the stack as necessary to support the variable arguments.
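The mapping from the argument structure to registers can be sketched as a function of byte offset. The function name is hypothetical, and the sketch assumes that argument words beyond the first 16 bytes are passed on the stack rather than in registers.

```c
/* Returns the register number (4 through 7) that carries the argument
 * word at the given byte offset within the argument structure, or -1
 * if the word lies beyond the first 16 bytes (assumed to be passed on
 * the stack). */
static int arg_register_for_offset(unsigned offset)
{
    if (offset < 16u) {
        return 4 + (int)(offset / 4u);
    }
    return -1;
}
```

For the two-int structure above, a sits at offset 0 and maps to r4, and b sits at offset 4 and maps to r5.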
        b(&value, i, j);
    }

DWARF-2 Definition
Registers r0 through r31 are assigned numbers 0 through 31 in all DWARF-2 debugging sections.

Object Files
Table 7-3: Nios II-Specific ELF Header Values
Member | Value
e_ident[EI_CLASS] | ELFCLASS32
e_ident[EI_DATA] | ELFDATA2LSB
e_machine | EM_ALTERA_NIOS2 == 113

Relocation
In a Nios II object file, each relocatable address reference possesses a relocation type.
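The header values in Table 7-3 can be checked against the first bytes of an object file. The sketch below avoids any system ELF header and defines its own constants; the N2_ prefixed names and the function name are hypothetical, while the numeric values and field offsets follow the standard 32-bit ELF header layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    N2_EI_CLASS = 4,          /* index of the class byte in e_ident */
    N2_EI_DATA = 5,           /* index of the data-encoding byte in e_ident */
    N2_ELFCLASS32 = 1,        /* 32-bit objects */
    N2_ELFDATA2LSB = 1,       /* little-endian, two's complement */
    N2_EM_ALTERA_NIOS2 = 113, /* e_machine value from Table 7-3 */
    N2_E_MACHINE_OFFSET = 18  /* e_ident (16 bytes) + e_type (2 bytes) */
};

/* Returns true if hdr looks like the start of a Nios II ELF file. */
static bool is_nios2_elf_header(const uint8_t *hdr, size_t len)
{
    if (len < 20) {
        return false;
    }
    if (hdr[0] != 0x7F || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F') {
        return false;
    }
    if (hdr[N2_EI_CLASS] != N2_ELFCLASS32 || hdr[N2_EI_DATA] != N2_ELFDATA2LSB) {
        return false;
    }
    /* e_machine is a 16-bit little-endian field. */
    uint16_t e_machine = (uint16_t)(hdr[N2_E_MACHINE_OFFSET] |
                                    (hdr[N2_E_MACHINE_OFFSET + 1] << 8));
    return e_machine == N2_EM_ALTERA_NIOS2;
}
```

A caller would read at least the first 20 bytes of the file into a buffer and pass that buffer and its length to the function.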
ABI for Linux Systems
R_NIOS2_GLOB_DAT
R_NIOS2_JUMP_SLOT
R_NIOS2_RELATIVE

A global offset table (GOT) entry referenced using R_NIOS2_GOT16, R_NIOS2_GOT_LO and/or R_NIOS2_GOT_HA must be resolved at load time. A GOT entry referenced only using R_NIOS2_CALL16, R_NIOS2_CALL_LO and/or R_NIOS2_CALL_HA can initially refer to a procedure linkage table (PLT) entry and then be resolved lazily.
Copy Relocation
The R_NIOS2_COPY relocation is used to mark variables allocated in the executable that are defined in a shared library. The variable’s initial value is copied from the shared library to the relocated location.

Jump Slot Relocation
Jump slot relocations are used for the PLT. For information about the PLT, refer to the "Procedure Linkage Table" section.
ldw r6, %tls_ldo(x2)(r2)    # Value of x2 in r6
                            # R_NIOS2_TLS_LDO16 x2

One 2-word GOT slot is allocated for all R_NIOS2_TLS_LDM16 operations in the linked object. Any thread-local symbol in this object can be used, as shown in the "GOT Slot with Thread-Local Storage" example.
Linux Function Calls

Register r23 is reserved for the thread pointer on GNU Linux systems. It is initialized by the C library and it may be used directly for TLS access, but not modified. On non-Linux systems r23 is a general-purpose, callee-saved register. The global pointer, r26 or gp, is globally fixed. It is initialized in startup code and always valid on entry to a function.
There are no floating-point exceptions. The optional floating point unit (FPU) does not support exceptions and any process wanting exact IEEE conformance needs to use a soft-float library (possibly accelerated by use of the attached FPU). The break instruction in a user process might generate a SIGTRAP signal for that process, but is not required to. Userspace programs should not use the break instruction and userspace debuggers should not insert one.
The GOT pointer is loaded using a PC-relative offset to the _gp_got symbol, as shown below.

Example 7-12: Loading the GOT Pointer

    nextpc r22
1:  orhi   r1, %hiadj(_gp_got - 1b)     # R_NIOS2_PCREL_HA _gp_got
    addi   r1, r1, %lo(_gp_got - 1b)    # R_NIOS2_PCREL_LO _gp_got - 4
    add    r22, r22, r1                 # GOT pointer in r22

Data may be accessed by loading its location from the GOT. A single word GOT entry is generated for each referenced symbol.
The call and jmpi instructions are not available in position-independent code. Instead, all calls are made through the GOT. Function addresses may be loaded with %call, which allows lazy binding. To initialize a function pointer, load the address of the function with %got instead. If no input object requires the address of the function, its GOT entry is placed in the PLT GOT for lazy binding, as shown in the example below.
Ltable:
    .word %gotoff(Label1)
    .word %gotoff(Label2)
    .word %gotoff(Label3)

Related Information
Procedure Linkage Table

Linux Program Loading and Dynamic Linking

Global Offset Table

Because shared libraries are position-independent, they cannot contain absolute addresses for symbols. Instead, addresses are loaded from the GOT.
The example below shows the PLT entry when the PLT GOT is close enough to the small data area for a relative jump.

Example 7-22: PLT Entry Near Small Data Area

.PLTn:
    ldw r15, %gprel(plt_got_slot_address)(gp)
    jmp r15

Example 7-23: Initial PLT Entry

res_0:
    br .PLTresolve
    ... .
Example 7-25: Initial PLT Entry

.PLTresolve:
    nextpc r14
    orhi   r13, r0, %hiadj(_GLOBAL_OFFSET_TABLE_)
    add    r13, r13, r14
    ldw    r14, %lo(_GLOBAL_OFFSET_TABLE_+4)(r13)
    ldw    r13, %lo(_GLOBAL_OFFSET_TABLE_+8)(r13)
    jmp    r13

If the initial PLT entry is out of range, the resolver can be inline, because it is only one instruction longer than a long branch, as shown below.

Example 7-26: Initial PLT Entry Out of Range .
provides intrinsic functions which perform the system call. Applications must use those functions rather than the system call directly. Atomic operations may be added in a future processor extension.

Processor Requirements

Linux requires that a hardware multiplier be present. The full 64-bit multiplier (mulx instructions) is not required.
Document Revision History

Date            Version   Changes
May 2008        8.0.0     • Frame pointer description updated.
                          • Relocation table added.
October 2007    7.2.0     Maintenance release.
May 2007        7.1.0     • Added table of contents to Introduction section.
                          • Added Referenced Documents section.
March 2007      7.0.0     Maintenance release.
November 2006   6.1.0     Maintenance release.
May 2006        6.0.0     Maintenance release.
October 2005    5.1.0     Maintenance release.
May 2005        5.0.
8 Instruction Set Reference
2015.04.02 NII51017

This section introduces the Nios II instruction word format and provides a detailed reference of the Nios II instruction set.

Word Formats

There are three types of Nios II instruction word format: I-type, R-type, and J-type.

I-Type

The defining characteristic of the I-type instruction word format is that it contains an immediate value embedded within the instruction word.
• A 6-bit opcode field OP
• Three 5-bit register fields A, B, and C
• An 11-bit opcode-extension field OPX

In most cases, fields A and B specify the source operands, and field C specifies the destination register. Some R-type instructions embed a small immediate value in the five low-order bits of OPX. Unused bits in OPX are always 0.
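The field list above can be illustrated with a small packing and unpacking sketch. It assumes the fields are laid out from the most-significant bit downward in the order A, B, C, OPX, OP, which is the order the bit-field diagrams in this chapter use. The type and function names (rtype_fields, rtype_encode, rtype_decode) are hypothetical helpers for illustration, not part of any Nios II toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed R-type layout, most- to least-significant:
 * A[31:27] B[26:22] C[21:17] OPX[16:6] OP[5:0]. */
typedef struct {
    uint8_t  a, b, c;  /* 5-bit register indexes */
    uint16_t opx;      /* 11-bit opcode-extension field */
    uint8_t  op;       /* 6-bit opcode */
} rtype_fields;        /* hypothetical helper type */

/* Pack the five fields into one 32-bit instruction word. */
static uint32_t rtype_encode(rtype_fields f)
{
    assert(f.a < 32 && f.b < 32 && f.c < 32 && f.opx < 2048 && f.op < 64);
    return ((uint32_t)f.a << 27) | ((uint32_t)f.b << 22) |
           ((uint32_t)f.c << 17) | ((uint32_t)f.opx << 6) | f.op;
}

/* Split a 32-bit instruction word back into its fields. */
static rtype_fields rtype_decode(uint32_t word)
{
    rtype_fields f;
    f.a   = (uint8_t)((word >> 27) & 0x1f);
    f.b   = (uint8_t)((word >> 22) & 0x1f);
    f.c   = (uint8_t)((word >> 17) & 0x1f);
    f.opx = (uint16_t)((word >> 6) & 0x7ff);
    f.op  = (uint8_t)(word & 0x3f);
    return f;
}
```

Because an embedded 5-bit immediate occupies the five low-order bits of OPX, a 6-bit extension code sits in the upper six bits of that field; under this layout a code of 0x05 with a zero immediate is passed as opx = 0x05 << 5.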
Assembler Pseudo-Instructions

Pseudo-instructions are used in assembly source code like regular assembly instructions. Each pseudo-instruction is implemented at the machine level using an equivalent instruction. The movia pseudo-instruction is the only exception, being implemented with two instructions. Most pseudo-instructions do not appear in disassembly views of machine code.
Assembler Macros

The Nios II assembler provides macros to extract halfwords from labels and from 32-bit immediate values. These macros return 16-bit signed values or 16-bit unsigned values depending on where they are used. When used with an instruction that requires a 16-bit signed immediate value, these macros return a value ranging from –32768 to 32767.
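The halfword extraction can be modeled in C as a rough sketch. It assumes the usual definitions of the three macros: %lo takes bits 15..0, %hi takes bits 31..16, and %hiadj takes bits 31..16 plus bit 15, so that a following sign-extending addi lands on the original value. The function names are hypothetical and only mirror the macro names.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t hi16(uint32_t v)    { return (uint16_t)(v >> 16); }      /* %hi */
static uint16_t lo16(uint32_t v)    { return (uint16_t)(v & 0xffff); }   /* %lo */
static uint16_t hiadj16(uint32_t v)                                      /* %hiadj */
{
    return (uint16_t)((v >> 16) + ((v >> 15) & 1));
}

/* Sign-extend a 16-bit immediate to 32 bits, as addi does. */
static uint32_t sext16(uint16_t imm) { return (uint32_t)(int32_t)(int16_t)imm; }

/* Model of: orhi rB, r0, %hiadj(v) ; addi rB, rB, %lo(v) */
static uint32_t load_via_hiadj(uint32_t v)
{
    uint32_t rb = (uint32_t)hiadj16(v) << 16;
    return rb + sext16(lo16(v));
}

/* Model of: movhi rB, %hi(v) ; ori rB, rB, %lo(v) */
static uint32_t load_via_hi(uint32_t v)
{
    return ((uint32_t)hi16(v) << 16) | lo16(v);
}

/* Both sequences should reproduce the original 32-bit value. */
static void check_roundtrip(uint32_t v)
{
    assert(load_via_hiadj(v) == v);
    assert(load_via_hi(v) == v);
}
```

The adjustment matters only when bit 15 of the value is set: the sign-extended low half then subtracts 0x10000, and the extra 1 added to the upper half compensates for it.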
Example add r6, r7, r8
Description Calculates the sum of rA and rB. Stores the result in rC. Used for both signed and unsigned addition.
Usage Carry Detection (unsigned operands): Following an add operation, a carry out of the MSB can be detected by checking whether the unsigned sum is less than one of the unsigned operands. The carry bit can be written to a register, or a conditional branch can be taken based on the carry condition.
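The carry rule described above can be checked with a short C model. This is only a sketch of the arithmetic, not generated Nios II code; nios2_add and add_carry are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of add rC, rA, rB: a 32-bit sum that wraps on overflow. */
static uint32_t nios2_add(uint32_t ra, uint32_t rb)
{
    return (uint32_t)(ra + rb);
}

/* A carry out of the MSB occurred if the unsigned sum is less than an operand. */
static bool add_carry(uint32_t ra, uint32_t rb)
{
    uint32_t sum = nios2_add(ra, rb);
    bool carry = sum < ra;
    assert(carry == (sum < rb));  /* comparing against either operand agrees */
    return carry;
}
```

The comparison works because a wrapped sum equals the true sum minus 2^32, which is necessarily smaller than both operands.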
Usage Carry Detection (unsigned operands): Following an addi operation, a carry out of the MSB can be detected by checking whether the unsigned sum is less than one of the unsigned operands. The carry bit can be written to a register, or a conditional branch can be taken based on the carry condition.
Bit Fields (bit 31 down to bit 0): A, B, IMM16, 0x04

and

Instruction bitwise logical and
Operation rC ← rA & rB
Assembler Syntax and rC, rA, rB
Example and r6, r7, r8
Description Calculates the bitwise logical AND of rA and rB and stores the result in rC.
Description Calculates the bitwise logical AND of rA and (IMM16 : 0x0000) and stores the result in rB.
beq

Instruction branch if equal
Operation if (rA == rB) then PC ← PC + 4 + σ(IMM16) else PC ← PC + 4
Assembler Syntax beq rA, rB, label
Example beq r6, r7, label
Description If rA == rB, then beq transfers program control to the instruction at label. In the instruction encoding, the offset given by IMM16 is treated as a signed number of bytes relative to the instruction immediately following beq.
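The branch arithmetic in the Operation row can be sketched in C as follows. The helper names branch_target and branch_offset are hypothetical; the second function shows how an assembler might derive IMM16 from a label address under the rule that the offset is relative to the instruction following the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Next PC for a conditional branch: PC + 4 + sign-extended IMM16 when taken,
 * PC + 4 otherwise. */
static uint32_t branch_target(uint32_t pc, uint16_t imm16, bool taken)
{
    uint32_t next = pc + 4;
    if (!taken)
        return next;
    return next + (uint32_t)(int32_t)(int16_t)imm16;
}

/* IMM16 for a branch at address pc that targets address label. */
static uint16_t branch_offset(uint32_t pc, uint32_t label)
{
    int32_t off = (int32_t)(label - (pc + 4));
    /* The offset must fit in 16 signed bits and keep word alignment. */
    assert(off >= -32768 && off <= 32767 && (off & 3) == 0);
    return (uint16_t)off;
}
```

For a branch at 0x1000 that targets 0x0ff0, the offset is measured from 0x1004, so the encoded value is -20.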
Description If (signed) rA >= (signed) rB, then bge transfers program control to the instruction at label. In the instruction encoding, the offset given by IMM16 is treated as a signed number of bytes relative to the instruction immediately following bge. The two least-significant bits of IMM16 are always zero, because instruction addresses must be word-aligned.
Pseudo-instruction bgtu is implemented with the bltu instruction by swapping the register operands.

ble

Instruction branch if less than or equal signed
Operation if ((signed) rA <= (signed) rB) then PC ← label else PC ← PC + 4
Assembler Syntax ble rA, rB, label
Example ble r6, r7, top_of_loop
Description If (signed) rA <= (signed) rB, then ble transfers program control to the instruction at label.
Assembler Syntax blt rA, rB, label
Example blt r6, r7, top_of_loop
Description If (signed) rA < (signed) rB, then blt transfers program control to the instruction at label. In the instruction encoding, the offset given by IMM16 is treated as a signed number of bytes relative to the instruction immediately following blt. The two least-significant bits of IMM16 are always zero, because instruction addresses must be word-aligned.
Bit Fields IMM16 0x1e

br

Instruction unconditional branch
Operation PC ← PC + 4 + σ(IMM16)
Assembler Syntax br label
Example br top_of_loop
Description Transfers program control to the instruction at label. In the instruction encoding, the offset given by IMM16 is treated as a signed number of bytes relative to the instruction immediately following br. The two least-significant bits of IMM16 are always zero, because instruction addresses must be word-aligned.
Example break
Description Breaks program execution and transfers control to the debugger break-processing routine. Saves the address of the next instruction in register ba and saves the contents of the status register in bstatus. Disables interrupts, then transfers execution to the break handler. The 5-bit immediate field imm5 is ignored by the processor, but it can be used by the debugger. break with no argument is the same as break 0.
8-20 NII51017 2015.04.02 call Usage bret is used by debuggers exclusively and should not appear in user programs, operating systems, or exception handlers. Exceptions Misaligned destination address Supervisor-only instruction Instruction Type R Instruction Fields None Bit Fields 31 30 29 28 27 26 25 24 0x1e 15 14 13 23 22 21 20 0 12 11 10 9 8 0x09 19 18 17 0x1e 7 6 5 4 3 0 16 0x09 2 1 0 0x3a call Instruction call subroutine Operation ra ← PC + 4 PC ← (PC31.
Bit Fields (bit 31 down to bit 0): IMM26, 0

callr

Instruction call subroutine in register
Operation ra ← PC + 4
PC ← rA
Assembler Syntax callr rA
Example callr r6
Description Saves the address of the next instruction in the return address register, and transfers execution to the address contained in register rA.
Usage callr is used to dereference C-language function pointers.
Description If rA == rB, then stores 1 to rC; otherwise, stores 0 to rC.
Usage cmpeq performs the == operation of the C programming language. Also, cmpeq can be used to implement the C logical negation operator “!”.
cmpgei

Instruction compare greater than or equal signed immediate
Operation if ((signed) rA >= (signed) σ(IMM16)) then rB ← 1 else rB ← 0
Assembler Syntax cmpgei rB, rA, IMM16
Example cmpgei r6, r7, 100
Description Sign-extends the 16-bit immediate value IMM16 to 32 bits and compares it to the value of rA. If rA >= σ(IMM16), then cmpgei stores 1 to rB; otherwise stores 0 to rB.
Usage cmpgei performs the signed >= operation of the C programming language.
Description If rA >= rB, then stores 1 to rC; otherwise stores 0 to rC.
Usage cmpgeu performs the unsigned >= operation of the C programming language.
Description Sign-extends the immediate value IMMED to 32 bits and compares it to the value of rA. If rA > σ(IMMED), then cmpgti stores 1 to rB; otherwise stores 0 to rB.
Usage cmpgti performs the signed > operation of the C programming language. The maximum allowed value of IMMED is 32766. The minimum allowed value is –32769.
Pseudo-instruction cmpgti is implemented using a cmpgei instruction with an IMM16 immediate value of IMMED + 1.
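The IMMED limits follow from the expansion: IMMED + 1 has to fit in the signed 16-bit IMM16 field of cmpgei. A minimal C model of that expansion, with hypothetical function names that mirror the mnemonics, looks like this:

```c
#include <assert.h>
#include <stdint.h>

/* Model of cmpgei rB, rA, IMM16: signed >= against a sign-extended immediate. */
static uint32_t cmpgei(int32_t ra, int16_t imm16)
{
    return ra >= (int32_t)imm16;
}

/* Model of the cmpgti pseudo-instruction, expanded as cmpgei with IMMED + 1. */
static uint32_t cmpgti(int32_t ra, int32_t immed)
{
    /* IMMED + 1 must be representable as a signed 16-bit immediate. */
    assert(immed >= -32769 && immed <= 32766);
    return cmpgei(ra, (int16_t)(immed + 1));
}
```

For integers, rA > IMMED and rA >= IMMED + 1 are the same test, which is why a single cmpgei suffices.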
Usage cmpgtui performs the unsigned > operation of the C programming language. The maximum allowed value of IMMED is 65534. The minimum allowed value is 0.
Pseudo-instruction cmpgtui is implemented using a cmpgeui instruction with an IMM16 immediate value of IMMED + 1.
Pseudo-instruction cmplei is implemented using a cmplti instruction with an IMM16 immediate value of IMMED + 1.

cmpleu

Instruction compare less than or equal unsigned
Operation if ((unsigned) rA <= (unsigned) rB) then rC ← 1 else rC ← 0
Assembler Syntax cmpleu rC, rA, rB
Example cmpleu r6, r7, r8
Description If rA <= rB, then stores 1 to rC; otherwise stores 0 to rC.
Usage cmpleu performs the unsigned <= operation of the C programming language.
cmplt

Instruction compare less than signed
Operation if ((signed) rA < (signed) rB) then rC ← 1 else rC ← 0
Assembler Syntax cmplt rC, rA, rB
Example cmplt r6, r7, r8
Description If rA < rB, then stores 1 to rC; otherwise stores 0 to rC.
Usage cmplt performs the signed < operation of the C programming language.
Description Sign-extends the 16-bit immediate value IMM16 to 32 bits and compares it to the value of rA. If rA < σ(IMM16), then cmplti stores 1 to rB; otherwise stores 0 to rB.
Usage cmplti performs the signed < operation of the C programming language.
cmpne

Instruction compare not equal
Operation if (rA != rB) then rC ← 1 else rC ← 0
Assembler Syntax cmpne rC, rA, rB
Example cmpne r6, r7, r8
Description If rA != rB, then stores 1 to rC; otherwise stores 0 to rC.
Usage cmpne performs the != operation of the C programming language.
Description Sign-extends the 16-bit immediate value IMM16 to 32 bits and compares it to the value of rA. If rA != σ(IMM16), then cmpnei stores 1 to rB; otherwise stores 0 to rB.
Usage cmpnei performs the != operation of the C programming language.
Usage To access a custom register inside the custom instruction logic, clear the bit readra, readrb, or writerc that corresponds to the register field. In assembler syntax, the notation cN refers to register N in the custom register file and causes the assembler to clear the c bit of the opcode. For example, custom 0, c3, r5, r0 performs custom instruction 0, operating on general-purpose registers r5 and r0, and stores the result in custom register 3.
Description Treating rA and rB as signed integers, this instruction divides rA by rB and then stores the integer portion of the resulting quotient to rC. After attempted division by zero, the value of rC is undefined. There is no divide-by-zero exception. After dividing –2147483648 by –1, the value of rC is undefined (the number +2147483648 is not representable in 32 bits). There is no overflow exception.
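Because neither undefined case raises an exception, software that cannot rule them out has to test for them before dividing. The following C sketch shows one way to do that; checked_div is a hypothetical helper, not a library routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the quotient to *rc and returns true only when a signed 32-bit
 * division has a defined result; returns false for the two undefined cases. */
static bool checked_div(int32_t ra, int32_t rb, int32_t *rc)
{
    assert(rc != NULL);
    if (rb == 0)
        return false;                    /* division by zero */
    if (ra == INT32_MIN && rb == -1)
        return false;                    /* +2147483648 is not representable */
    *rc = ra / rb;                       /* C truncates toward zero */
    return true;
}
```

The same two operand combinations are also undefined behavior for the C / operator, so the guard is needed in portable C regardless of the target.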
Description Treating rA and rB as unsigned integers, this instruction divides rA by rB and then stores the integer portion of the resulting quotient to rC. After attempted division by zero, the value of rC is undefined. There is no divide-by-zero exception. Nios II processors that do not implement the divu instruction cause an unimplemented instruction exception.
Description Copies the value of estatus into the status register, and transfers execution to the address in ea.
Usage Use eret to return from traps, external interrupts, and other exception handling routines. Note that before returning from hardware interrupt exceptions, the exception handler must adjust the ea register.
Description If the Nios II processor implements a direct mapped data cache, flushd writes the data cache line that is mapped to the specified address back to memory if the line is dirty, and then clears the data cache line. Unlike flushda, flushd writes the dirty data back to memory even when the addressed data is not currently in the cache.
Usage Use flushda to write dirty lines back to memory only if the addressed memory location is currently in the cache, and then flush the cache line. By contrast, refer to “flushd flush data cache line”, “initd initialize data cache line”, and “initda initialize data cache address” for other cache-clearing options. For more information on the Nios II data cache, refer to the Cache and Tightly Coupled Memory chapter of the Nios II Software Developer’s Handbook.
Description Ignoring the tag, flushi identifies the instruction cache line associated with the byte address in rA, and invalidates that line. If the Nios II processor core does not have an instruction cache, the flushi instruction performs no operation. For more information about the instruction cache, refer to the Cache and Tightly Coupled Memory chapter of the Nios II Software Developer’s Handbook.
NII51017 2015.04.02 initd 8-43 17 16 Bit Fields 31 30 29 28 27 26 25 A 15 14 13 0x04 24 23 22 21 20 0 12 11 10 9 8 19 18 0 7 6 0 5 4 3 0x04 2 1 0 0x3a initd Instruction initialize data cache line Operation Initializes the data cache line associated with address rA + σ(IMM16).
Usage Use initd after processor reset and before accessing data memory to initialize the processor’s data cache. Use initd with caution because it does not write back dirty data. By contrast, refer to “flushd flush data cache line”, “flushda flush data cache address”, and “initda initialize data cache address” for other cache-clearing options. Altera recommends using initd only when the processor comes out of reset.
Description If the Nios II processor implements a direct mapped data cache, initda clears the data cache line without checking for (or writing) a dirty data cache line that is mapped to the specified address back to memory. Unlike initd, initda clears the cache line only when the addressed data is currently cached. This process comprises the following steps:
• Compute the effective address specified by the sum of rA and the signed 16-bit immediate value.
8-46 NII51017 2015.04.02 initi Bit Fields 31 30 29 28 27 26 25 24 A 15 14 13 23 22 21 20 19 0 12 11 10 9 18 17 16 1 0 IMM16 8 7 6 5 4 3 IMM16 2 0x13 Related Information Cache and Tightly-Coupled Memory flushda on page 8-40 initd on page 8-43 flushd on page 8-38 • • • • initi Instruction initialize instruction cache line Operation Initializes the instruction cache line associated with address rA.
Bit Fields 0x29 0 0x3a

Related Information
Cache and Tightly-Coupled Memory

jmp

Instruction computed jump
Operation PC ← rA
Assembler Syntax jmp rA
Example jmp r12
Description Transfers execution to the address contained in register rA.
Usage It is illegal to jump to the address contained in register r31. To return from subroutines called by call or callr, use ret instead of jmp.
Usage jmpi is a low-overhead local jump. jmpi can transfer execution anywhere within the 256-MB range determined by PC31..28. The Nios II GNU linker does not automatically handle cases in which the address is out of this range.
Description Computes the effective byte address specified by the sum of rA and the instruction's signed 16-bit immediate value. Loads register rB with the desired memory byte, zero extending the 8-bit value to 32 bits.
Usage In processors with a data cache, this instruction may retrieve the desired data from the cache instead of from memory. Use the ldbuio instruction for peripheral I/O.
Bit Fields IMM16 0x23

Related Information
Cache and Tightly-Coupled Memory

ldh / ldhio

Instruction load halfword from memory or I/O peripheral
Operation rB ← σ(Mem16[rA + σ(IMM16)])
Assembler Syntax ldh rB, byte_offset(rA)
ldhio rB, byte_offset(rA)
Example ldh r6, 100(r5)
Description Computes the effective byte address specified by the sum of rA and the instruction's signed 16-bit immediate value.
Description Computes the effective byte address specified by the sum of rA and the instruction's signed 16-bit immediate value. Loads register rB with the memory word located at the effective byte address. The effective byte address must be word aligned. If the byte address is not a multiple of 4, the operation is undefined.
Usage In processors with a data cache, this instruction may retrieve the desired data from the cache instead of from memory.
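The address calculation and the alignment rule can be modeled in C as a rough sketch. The memory here is an ordinary byte array standing in for the data address space, bytes are combined in little-endian order (matching the ELFDATA2LSB setting in the ABI chapter), and the model simply refuses misaligned or out-of-range accesses, whereas the guide leaves the misaligned case undefined. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Effective byte address: rA plus the sign-extended 16-bit immediate. */
static uint32_t effective_address(uint32_t ra, uint16_t imm16)
{
    return ra + (uint32_t)(int32_t)(int16_t)imm16;
}

static bool is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}

/* Model of ldw rB, byte_offset(rA) against a small byte-addressed buffer. */
static bool ldw_model(const uint8_t *mem, uint32_t size,
                      uint32_t ra, uint16_t imm16, uint32_t *rb)
{
    assert(mem != NULL && rb != NULL);
    uint32_t ea = effective_address(ra, imm16);
    if (!is_word_aligned(ea) || size < 4 || ea > size - 4)
        return false;  /* modeling choice: reject instead of undefined behavior */
    *rb = (uint32_t)mem[ea] | ((uint32_t)mem[ea + 1] << 8) |
          ((uint32_t)mem[ea + 2] << 16) | ((uint32_t)mem[ea + 3] << 24);
    return true;
}
```

A negative offset is passed as its 16-bit two's-complement pattern; for example, 0xFFFC stands for a byte_offset of -4.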
Bit Fields IMM16 0x37

Related Information
Cache and Tightly-Coupled Memory

mov

Instruction move register to register
Operation rC ← rA
Assembler Syntax mov rC, rA
Example mov r6, r7
Description Moves the contents of rA to rC.
Pseudo-instruction mov is implemented as add rC, rA, r0.
Usage The maximum allowed value of IMMED is 65535. The minimum allowed value is 0. To load a 32-bit constant into a register, first load the upper 16 bits using a movhi pseudo-instruction. The %hi() macro can be used to extract the upper 16 bits of a constant or a label. Then, load the lower 16 bits with an ori instruction.
Example movia r6, function_address
Description Writes the address of label to rB.
Pseudo-instruction movia is implemented as:

    orhi rB, r0, %hiadj(label)
    addi rB, rB, %lo(label)

movui

Instruction move unsigned immediate into word
Operation rB ← (0x0000 : IMMED)
Assembler Syntax movui rB, IMMED
Example movui r6, 100
Description Zero-extends the immediate value IMMED to 32 bits and writes it to rB.
Usage The maximum allowed value of IMMED is 65535.
Usage Carry Detection (unsigned operands): Before or after the multiply operation, the carry out of the MSB of rC can be detected using the following instruction sequence:

    mul    rC, rA, rB    # The mul operation (optional)
    mulxuu rD, rA, rB    # rD is nonzero if carry occurred
    cmpne  rD, rD, r0    # rD is 1 if carry occurred, 0 if not

The mulxuu instruction writes a nonzero value into rD if the multiplication of unsigned numbers generates a carry (unsigned overflow).
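The same check can be expressed in C by computing the full 64-bit product and splitting it into halves. This is a sketch of the arithmetic only; the function names are hypothetical and merely mirror the mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* Model of mul: the 32 low-order bits of the product. */
static uint32_t nios2_mul(uint32_t ra, uint32_t rb)
{
    return (uint32_t)((uint64_t)ra * rb);
}

/* Model of mulxuu: the 32 high-order bits of the unsigned product. */
static uint32_t nios2_mulxuu(uint32_t ra, uint32_t rb)
{
    return (uint32_t)(((uint64_t)ra * rb) >> 32);
}

/* Model of the mulxuu / cmpne pair: 1 if the unsigned product overflows 32 bits. */
static uint32_t mul_carry(uint32_t ra, uint32_t rb)
{
    uint32_t rd = nios2_mulxuu(ra, rb);
    /* The two halves together must rebuild the full product. */
    assert((((uint64_t)rd << 32) | nios2_mul(ra, rb)) == (uint64_t)ra * rb);
    return rd != 0;  /* cmpne rD, rD, r0 */
}
```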
NII51017 2015.04.02 muli 8-59 Bit Fields A 15 14 13 B 12 11 10 9 8 0x27 C 7 6 5 4 3 0 0x27 2 1 0 0x3a muli Instruction multiply immediate Operation rB ← (rA x σ(IMM16)) 31..0 Assembler Syntax muli rB, rA, IMM16 Example muli r6, r7, -100 Description Sign-extends the 16-bit immediate value IMM16 to 32 bits and multiplies it by the value of rA. Stores the 32 low-order bits of the product to rB. The result is independent of whether rA is treated as a signed or unsigned number.
Assembler Syntax mulxss rC, rA, rB
Example mulxss r6, r7, r8
Description Treating rA and rB as signed integers, mulxss multiplies rA times rB, and stores the 32 high-order bits of the product to rC. Nios II processors that do not implement the mulxss instruction cause an unimplemented instruction exception.
Usage Use mulxss and mul to compute the full 64-bit product of two 32-bit signed integers.
Description Treating rA as a signed integer and rB as an unsigned integer, mulxsu multiplies rA times rB, and stores the 32 high-order bits of the product to rC. Nios II processors that do not implement the mulxsu instruction cause an unimplemented instruction exception.
Usage mulxsu can be used as part of the calculation of a 128-bit product of two 64-bit signed integers.
Usage Use mulxuu and mul to compute the 64-bit product of two 32-bit unsigned integers. Furthermore, mulxuu can be used as part of the calculation of a 128-bit product of two 64-bit signed integers. Given two 64-bit signed integers, each contained in a pair of 32-bit registers, (S1 : U1) and (S2 : U2), their 128-bit product is (U1 x U2) + ((S1 x U2) << 32) + ((U1 x S2) << 32) + ((S1 x S2) << 64).
NII51017 2015.04.02 Exceptions None Instruction Type R Instruction Fields C = Register index of operand rC nop 8-63 17 16 Bit Fields 31 30 29 28 27 26 25 24 0 15 14 13 23 22 21 20 0 12 11 10 9 8 0x1c 19 18 C 7 6 5 4 3 0 0x1c 2 1 0 0x3a nop Instruction no operation Operation None Assembler Syntax nop Example nop Description nop does nothing. Pseudo-instruction nop is implemented as add r0, r0, r0.
8-64 NII51017 2015.04.02 or Instruction Fields A = Register index of operand rA B = Register index of operand rB C = Register index of operand rC 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 A B C 0x06 8 7 6 5 0 4 3 2 1 0 0x3a or Instruction bitwise logical or Operation rC ← rA | rB Assembler Syntax or rC, rA, rB Example or r6, r7, r8 Description Calculates the bitwise logical OR of rA and rB and stores the result in rC.
Description Calculates the bitwise logical OR of rA and (IMM16 : 0x0000) and stores the result in rB.
rdctl

Instruction read from control register
Operation rC ← ctlN
Assembler Syntax rdctl rC, ctlN
Example rdctl r3, ctl31
Description Reads the value contained in control register ctlN and writes it to register rC.
Usage The previous register set is specified by status.PRS. By default, status.PRS indicates the register set in use before an exception, such as an external interrupt, caused a register set change. To read from an arbitrary register set, software can insert the desired register set number in status.PRS prior to executing rdprs. If shadow register sets are not implemented on the Nios II core, rdprs is an illegal instruction.
8-68 NII51017 2015.04.02 rol Bit Fields 31 30 29 28 27 26 25 24 0x1f 15 14 13 23 22 21 20 0 12 11 10 9 18 17 0 8 0x05 19 7 6 5 4 3 0 16 0x05 2 1 0 0x3a rol Instruction rotate left Operation rC ← rA rotated left rB4..0 bit positions Assembler Syntax rol rC, rA, rB Example rol r6, r7, r8 Description Rotates rA left by the number of bits specified in rB4..0 and stores the result in rC.
Description Rotates rA left by the number of bits specified in IMM5 and stores the result in rC. The bits that shift out of the register rotate into the least-significant bit positions.
Usage In addition to the rotate-left operation, roli can be used to implement a rotate-right operation. Rotating left by (32 – IMM5) bits is the equivalent of rotating right by IMM5 bits.
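The rotate-right equivalence can be confirmed with a small C model. The names rol32 and ror32_via_rol are hypothetical; the rotate count is reduced modulo 32 so that a rotate-right by 0 maps onto a rotate-left by 0 and still fits the 5-bit immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Model of roli rC, rA, IMM5: rotate left by 0..31 bit positions. */
static uint32_t rol32(uint32_t ra, unsigned imm5)
{
    assert(imm5 < 32);
    if (imm5 == 0)
        return ra;  /* a 32-bit shift by 32 would be undefined in C */
    return (ra << imm5) | (ra >> (32 - imm5));
}

/* Rotate right by imm5, expressed as a rotate left by (32 - imm5). */
static uint32_t ror32_via_rol(uint32_t ra, unsigned imm5)
{
    assert(imm5 < 32);
    return rol32(ra, (32 - imm5) & 31);
}
```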
8-70 NII51017 2015.04.02 sll Bit Fields 31 30 29 28 27 26 25 24 A 15 14 13 23 22 21 20 B 12 11 10 9 8 0x0b 19 18 17 C 7 6 5 4 3 0 16 0x0b 2 1 0 0x3a sll Instruction shift left logical Operation rC ← rA << (rB4..0) Assembler Syntax sll rC, rA, rB Example sll r6, r7, r8 Description Shifts rA left by the number of bits specified in rB4..0 (inserting zeroes), and then stores the result in rC. sll performs the << operation of the C programming language.
Description Shifts rA left by the number of bits specified in IMM5 (inserting zeroes), and then stores the result in rC.
Usage slli performs the << operation of the C programming language.
8-72 NII51017 2015.04.02 srai Bit Fields 31 30 29 28 27 26 25 24 A 15 14 13 23 22 21 20 B 12 11 10 9 18 17 C 8 0x3b 19 7 6 5 4 3 0 16 0x3b 2 1 0 0x3a srai Instruction shift right arithmetic immediate Operation rC ← (signed) rA >> ((unsigned) IMM5) Assembler Syntax srai rC, rA, IMM5 Example srai r6, r7, 3 Description Shifts rA right by the number of bits specified in IMM5 (duplicating the sign bit), and then stores the result in rC.
Example srl r6, r7, r8
Description Shifts rA right by the number of bits specified in rB4..0 (inserting zeroes), and then stores the result in rC. Bits 31–5 are ignored.
Usage srl performs the unsigned >> operation of the C programming language.
Instruction Fields A = Register index of operand rA
C = Register index of operand rC
IMM5 = 5-bit unsigned immediate value
Bit Fields (bit 31 down to bit 0): A, B, C, 0x1a, IMM5, 0x3a

stb / stbio

Instruction store byte to memory or I/O peripheral
Operation Mem8[rA + σ(IMM16)] ← rB7..
Description Computes the effective byte address specified by the sum of rA and the instruction's signed 16-bit immediate value. Stores rB to the memory location specified by the effective byte address. The effective byte address must be word aligned. If the byte address is not a multiple of 4, the operation is undefined.
Usage In processors with a data cache, this instruction may not generate an Avalon-MM data transfer immediately.
sub

Instruction subtract
Operation rC ← rA – rB
Assembler Syntax sub rC, rA, rB
Example sub r6, r7, r8
Description Subtracts rB from rA and stores the result in rC.
Usage Carry Detection (unsigned operands): The carry bit indicates an unsigned overflow. Before or after a sub operation, a carry out of the MSB can be detected by checking whether the first operand is less than the second operand. The carry bit can be written to a register, or a conditional branch can be taken based on the carry condition.
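A short C model of this rule is given below as a sketch; nios2_sub and sub_carry are hypothetical names. Unlike the add case, the test uses only the operands, which is why it can be made before or after the subtraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of sub rC, rA, rB: a 32-bit difference that wraps on underflow. */
static uint32_t nios2_sub(uint32_t ra, uint32_t rb)
{
    return (uint32_t)(ra - rb);
}

/* The carry (borrow) is set when the first operand is less than the second. */
static bool sub_carry(uint32_t ra, uint32_t rb)
{
    bool carry = ra < rb;
    /* Cross-check: a wrapped difference is always greater than rA. */
    assert(carry == (nios2_sub(ra, rb) > ra));
    return carry;
}
```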
8-82 NII51017 2015.04.02 wrctl Bit Fields 31 30 29 28 27 26 25 24 0 15 14 13 23 22 21 20 0 12 11 10 9 18 17 0x1d 8 0x2d 19 7 6 5 4 3 IMM5 16 0x2d 2 1 0 0x3a wrctl Instruction write to control register Operation ctlN ← rA Assembler Syntax wrctl ctlN, rA Example wrctl ctl6, r3 Description Writes the value contained in register rA to the control register ctlN.
Usage The previous register set is specified by status.PRS. By default, status.PRS indicates the register set in use before an exception, such as an external interrupt, caused a register set change. To write to an arbitrary register set, software can insert the desired register set number in status.PRS prior to executing wrprs. System software must use wrprs to initialize r0 to 0 in each shadow register set before using that register set.
Assembler Syntax xori rB, rA, IMM16
Example xori r6, r7, 100
Description Calculates the bitwise logical exclusive OR of rA and (0x0000 : IMM16) and stores the result in rB.
Document Revision History

Date            Version   Changes
October 2007    7.2.0     Added jmpi instruction.
May 2007        7.1.0     • Added table of contents to Introduction section.
                          • Added Referenced Documents section.
March 2007      7.0.0     Maintenance release.
November 2006   6.1.0     Maintenance release.
May 2006        6.0.0     Maintenance release.
October 2005    5.1.0     • Correction to the blt instruction.
                          • Added U bit operation for break and trap instructions.
July 2005       5.0.