Intel® IXP2800 Network Processor Hardware Reference Manual August 2004 Order Number: 278882-010
Revision History

Date            Revision  Description
March 2002      001       First release for IXP2800 Customer Information Book V 0.4
May 2002        002       Update for the IXA SDK 3.0 release.
August 2002     003       Update for the IXA SDK 3.0 Pre-Release 4.
November 2002   004       Update for the IXA SDK 3.0 Pre-Release 5.
May 2003        005       Update for the IXA SDK 3.1 Alpha Release
September 2003  006       Update for the IXA SDK 3.
1 Introduction

1.1 About This Document

This document is the hardware reference manual for the Intel® IXP2800 Network Processor. This information is intended for use by developers and is organized as follows: Section 2, “Technical Description” contains a hardware overview. Section 3, “Intel XScale® Core” describes the embedded core. Section 4, “Microengines” describes Microengine operation. Section 5, “DRAM” describes the DRAM Unit.
1.3 Terminology

Table 1 and Table 2 list the terminology used in this manual.

Table 1. Data Terminology
Table 2. Longword Formats
Intel® IXP2800 Network Processor Technical Description Technical Description 2.1 2 Overview This section provides a brief overview of the IXP2800 Network Processor internal hardware. This section is intended as an overall hardware introduction to the network processor. The major blocks are: • Intel XScale®core — General purpose 32-bit RISC processor (ARM* Version 5 Architecture compliant) used to initialize and manage the network processor, and can be used for higher layer network processing tasks.
Intel® IXP2800 Network Processor Technical Description Figure 1.
Intel® IXP2800 Network Processor Technical Description Figure 2.
Intel® IXP2800 Network Processor Technical Description 2.2 Intel XScale® Core Microarchitecture The Intel XScale® microarchitecture consists of a 32-bit general purpose RISC processor that incorporates an extensive list of architecture features that allows it to achieve high performance. 2.2.1 ARM* Compatibility The Intel XScale® microarchitecture is ARM* Version 5 (V5) Architecture compliant.
Intel® IXP2800 Network Processor Technical Description 2.2.2.4 Branch Target Buffer The Intel XScale® microarchitecture provides a Branch Target Buffer (BTB) to predict the outcome of branch type instructions. It provides storage for the target address of branch type instructions and predicts the next address to present to the instruction cache when the current instruction address is that of a branch. The BTB holds 128 entries. 2.2.2.
Intel® IXP2800 Network Processor Technical Description 2.2.2.7 Address Map Figure 3 shows the partitioning of the Intel XScale® core microarchitecture 4-Gbyte address space. Figure 3. Intel XScale® Core 4-GB (32-Bit) Address Space 0XFFFF FFF PCI MEM (1/2 Gb) 3.
2.3 Microengines

The Microengines do most of the programmable per-packet processing in the IXP2800 Network Processor. There are 16 Microengines, connected as shown in Figure 1. The Microengines have access to all shared resources (SRAM, DRAM, MSF, etc.) as well as private connections between adjacent Microengines (referred to as “next neighbors”). The block diagram in Figure 4 is used in the Microengine description.
Intel® IXP2800 Network Processor Technical Description Figure 4.
2.3.1 Microengine Bus Arrangement

The IXP2800 Network Processor supports a single D_Push/D_Pull bus, and both Microengine clusters interface to the same bus. Also, it supports two command buses, and two sets of S_Push/S_Pull buses connected as shown in Table 3, which also shows the next neighbor relationship between the Microengines.

Table 3. IXP2800 Network Processor Microengine Bus Arrangement
Intel® IXP2800 Network Processor Technical Description Each of the eight Contexts is in one of four states. 1. Inactive — Some applications may not require all eight contexts. A Context is in the Inactive state when its CTX_ENABLE CSR enable bit is a 0. 2. Executing — A Context is in Executing state when its context number is in ACTIVE_CTX_STS CSR. The executing Context’s PC is used to fetch instructions from the Control Store.
Intel® IXP2800 Network Processor Technical Description The Microengine provides the following functionality during the Idle state: 1. The Microengine continuously checks if a Context is in Ready state. If so, a new Context begins to execute. If no Context is Ready, the Microengine remains in the Idle state. 2. Only the ALU instructions are supported. They are used for debug via special hardware defined in number 3 below. 3.
Intel® IXP2800 Network Processor Technical Description methods to write TRANSFER_IN registers, for example a read instruction executed by one Microengine may cause the data to be returned to a different Microengine. Details are covered in the instruction set descriptions). TRANSFER_OUT registers, when used as a destination in an instruction, are written with the result from the execution datapath. The specific register selected is encoded in the instruction, or selected indirectly via T_INDEX.
Intel® IXP2800 Network Processor Technical Description 2.3.4.4 Local Memory Local Memory is addressable storage within the Microengine. Local Memory is read and written exclusively under program control. Local Memory supplies operands to the execution datapath as a source, and receives results as a destination. The specific Local Memory location selected is based on the value in one of the LM_ADDR registers, which are written by local_csr_wr instructions.
Intel® IXP2800 Network Processor Technical Description As shown in Example 1, there is a latency in loading LM_ADDR. Until the new value is loaded, the old value is still usable. Example 5 shows the maximum pipelined usage of LM_ADDR. Example 5.
Intel® IXP2800 Network Processor Technical Description In Example 8, the second instruction will access the Local Memory location one past the source/ destination of the first. Example 8. LM_ADDR Post-Increment alu[*l$index0++, --, ~B, gpr_n] alu[gpr_m, --, ~B, *l$index0] 2.3.5 Addressing Modes GPRs can be accessed in either a context-relative or an absolute addressing mode. Some instructions can specify either mode; other instructions can specify only Context-Relative mode.
Intel® IXP2800 Network Processor Technical Description 2.3.5.2 Absolute Addressing Mode With Absolute addressing, any GPR can be read or written by any of the eight Contexts in a Microengine. Absolute addressing enables register data to be shared among all of the Contexts, e.g., for global variables or for parameter passing. All 256 GPRs can be read by Absolute address. 2.3.5.
Intel® IXP2800 Network Processor Technical Description 2.3.6 Local CSRs Local Control and Status registers (CSRs) are external to the Execution Datapath, and hold specific data. They can be read and written by special instructions (local_csr_rd and local_csr_wr) and are accessed less frequently than datapath registers. Because Local CSRs are not built in the datapath, there is a write-to-use delay of three instructions, and a read-to-consume penalty of two instructions. 2.3.
Intel® IXP2800 Network Processor Technical Description Figure 6. Byte-Align Block Diagram Prev_B Prev_A . . . . . . A_Operand B_Operand Shift Byte_Index Result A9353-01 Example 10 shows a big-endian align sequence of instructions and the value of the various operands. Table 7 shows the data in the registers for this example. The value in BYTE_INDEX[1:0] CSR (which controls the shift amount) for this example is 2. Table 7.
Example 11 shows a little-endian sequence of instructions and the value of the various operands. Table 8 shows the data in the registers for this example. The value in BYTE_INDEX[1:0] CSR (which controls the shift amount) for this example is 2.

Table 8. Register Contents for Example 11

Register   Byte 3 [31:24]   Byte 2 [23:16]   Byte 1 [15:8]   Byte 0 [7:0]
0          3                2                1               0
1          7                6                5               4
2          B                A                9               8
3          F                E                D               C

Example 11.
Intel® IXP2800 Network Processor Technical Description Figure 7. CAM Block Diagram Lookup Value (from A port) Tag State Tag State Tag State Tag State Match Match Match Status and LRU Logic Match Lookup Status (to Dest Req) State Status Entry Number 0000 Miss 0 LRU Entry State Hit 1 Hit Entry A9354-01 Note: The State bits are data associated with the entry. The use is only by software. There is no implication of ownership of the entry by any Context.
Intel® IXP2800 Network Processor Technical Description The value in the State bits for an entry can be written, without modifying the Tag, by instruction: CAM_Write_State[entry_reg, state_value] Note: CAM_Write_State does not modify the LRU list. One possible way to use the result of a lookup is to dispatch to the proper code using instruction: jump[register, label#],defer [3] where the register holds the result of the lookup.
Intel® IXP2800 Network Processor Technical Description An algorithm for debug software to find out the contents of the CAM is shown in Example 12. Example 12. Algorithm for Debug Software to Find out the Contents of the CAM ; First read each of the tag entries. Note that these reads ; don’t modify the LRU list or any other CAM state. tag[0] = CAM_Read_Tag(entry_0); ...... tag[15] = CAM_Read_Tag(entry_15); ; Now read each of the state bits state[0] = CAM_Read_State(entry_0); ...
Intel® IXP2800 Network Processor Technical Description 2.3.9 Event Signals Event Signals are used to coordinate a program with completion of external events. For example, when a Microengine executes an instruction to an external unit to read data (which will be written into a Transfer_In register), the program must insure that it does not try to use the data until the external unit has written it.
Intel® IXP2800 Network Processor Technical Description 2.4 DRAM The IXP2800 Network Processor has controllers for three Rambus* DRAM (RDRAM) channels. Each of the controllers independently accesses its own RDRAMs, and can operate concurrently with the other controllers (i.e., they are not operating as a single, wider memory). DRAM provides high-density, high-bandwidth storage and is typically used for data buffers.
Intel® IXP2800 Network Processor Technical Description 2.4.2 Read and Write Access The minimum DRAM physical access length is 16 bytes. Software (and PCI) can read or write as little as a single byte, however the time (and bandwidth) taken at the DRAMs is the same as for an access of 16 bytes. Therefore, the best utilization of DRAM bandwidth will be for accesses that are multiples of 16 bytes.
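Because every reference costs at least one 16-byte access, buffer lengths and starting addresses are normally padded to 16-byte multiples. The following C fragment is an illustrative sketch only (it is not taken from this manual); it shows the rounding a driver or buffer allocator might apply, and the constant name is hypothetical.

#include <stdint.h>

#define DRAM_ACCESS_UNIT 16u   /* minimum physical DRAM access, in bytes */

/* Round a transfer length up to a whole number of 16-byte DRAM accesses. */
static uint32_t dram_padded_len(uint32_t len)
{
    return (len + (DRAM_ACCESS_UNIT - 1u)) & ~(DRAM_ACCESS_UNIT - 1u);
}

/* For example, a 41-byte transfer consumes dram_padded_len(41) == 48 bytes
 * of DRAM bandwidth, i.e., three 16-byte accesses. */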
2.5.1 QDR Clocking Scheme

The controller drives out two pairs of K clock (K and K#). It also drives out two pairs of C clock (C and C#). Both C/C# clocks externally return to the controller for reading data. Figure 8 shows the clock diagram of the clocking scheme for the QDR interface driving four SRAM chips.

Figure 8.
Table 10. SRAM Controller Configurations (Sheet 2 of 2)

SRAM Configuration   SRAM Size   Addresses Needed to Index SRAM   Addresses Used as Port Enables   Total Number of Port Select Pairs Available
8M x 18              16 MB       21:0                             23:22                            3
16M x 18             32 MB       22:0                             None                             2
32M x 18             64 MB       23:0                             None                             2

Each channel can be expanded by depth according to the number of port enables available.
Intel® IXP2800 Network Processor Technical Description 2.5.4 Queue Data Structure Commands The ability to enqueue and dequeue data buffers at a fast rate is key to meeting line-rate performance. This is a difficult problem as it involves dependent memory references that must be turned around very quickly. The SRAM controller includes a data structure (called the Q_array) and associated control logic to perform efficient enqueue and dequeue operations.
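Conceptually, each Q_array entry caches a queue descriptor (head pointer, tail pointer, and count) while the buffers themselves form a linked list in SRAM. The C sketch below is only a software model written to clarify the data structure; the type and field names are assumptions and do not reproduce the hardware's actual descriptor layout.

#include <stddef.h>
#include <stdint.h>

/* Each buffer descriptor begins with a link to the next buffer on the queue. */
struct buf_descriptor {
    struct buf_descriptor *next;
    /* ... buffer metadata ... */
};

/* Software model of one queue descriptor as cached in a Q_array entry. */
struct q_descriptor {
    struct buf_descriptor *head;   /* first buffer on the queue */
    struct buf_descriptor *tail;   /* last buffer on the queue  */
    uint32_t count;                /* number of buffers queued  */
};

/* Enqueue: link the new buffer behind the current tail. */
static void model_enqueue(struct q_descriptor *q, struct buf_descriptor *buf)
{
    buf->next = NULL;
    if (q->count == 0)
        q->head = buf;
    else
        q->tail->next = buf;
    q->tail = buf;
    q->count++;
}

/* Dequeue: unlink and return the buffer at the head, or NULL if the queue is empty. */
static struct buf_descriptor *model_dequeue(struct q_descriptor *q)
{
    struct buf_descriptor *buf = q->head;
    if (buf != NULL) {
        q->head = buf->next;
        q->count--;
    }
    return buf;
}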
Intel® IXP2800 Network Processor Technical Description Verification is required to test only the order rules shown in Table 12 and Table 13). Note: A blank entry in Table 12 means that no order is enforced. Table 12. Address Reference Order 1st ref 2nd ref Memory Read CSR Read Memory Read Memory Write CSR Write Memory RMW Queue / Ring / Q_Descr Commands Order CSR Read Order Memory Write Order CSR Write Order Memory RMW Order Queue / Ring / Q_ Descr Commands See Table 13.
Intel® IXP2800 Network Processor Technical Description 2.5.5.2 Microengine Software Restrictions to Maintain Ordering It is the Microengine programmer’s job to ensure order where the program flow finds order to be necessary and where the architecture does not guarantee that order. The signaling mechanism can be used to do this. For example, say that microcode needs to update several locations in a table. A location in SRAM is used to “lock” access to the table. Example 13 is the code for the table update.
Intel® IXP2800 Network Processor Technical Description 2.6.1 Scratchpad Atomic Operations In addition to normal reads and writes, the Scratchpad Memory supports the following atomic operations. Microengines have specific instructions to do each atomic operation; the Intel XScale® microarchitecture uses aliased address regions to do atomic operations.
Intel® IXP2800 Network Processor Technical Description Head, Tail, and Size are registers in the Scratchpad Unit. Head and Tail point to the actual ring data, which is stored in the Scratchpad RAM. The count of how many entries are on the Ring is determined by hardware using the Head and Tail. For each Ring in use, a region of Scratchpad RAM must be reserved for the ring data. Note: The reservation is by software convention.
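The head/tail arithmetic performed by the ring hardware can be modeled in a few lines of C. This sketch is purely illustrative: it assumes a ring of size longword entries reserved in Scratchpad RAM, and the hardware maintains the equivalent of head, tail, and the fullness test in its own registers.

#include <stdint.h>

struct scratch_ring_model {
    uint32_t *base;   /* reserved region of Scratchpad RAM      */
    uint32_t  size;   /* number of longword entries in the ring */
    uint32_t  head;   /* index of the next entry to remove      */
    uint32_t  tail;   /* index of the next entry to insert      */
};

/* Entry count is derived from head and tail, as the hardware does. */
static uint32_t ring_entries(const struct scratch_ring_model *r)
{
    return (r->tail + r->size - r->head) % r->size;
}

/* Put one longword on the ring; returns 0 on success, -1 if the ring is full. */
static int ring_put(struct scratch_ring_model *r, uint32_t data)
{
    if (ring_entries(r) == r->size - 1u)
        return -1;
    r->base[r->tail] = data;
    r->tail = (r->tail + 1u) % r->size;
    return 0;
}

/* Get one longword from the ring; returns 0 on success, -1 if the ring is empty. */
static int ring_get(struct scratch_ring_model *r, uint32_t *data)
{
    if (ring_entries(r) == 0u)
        return -1;
    *data = r->base[r->head];
    r->head = (r->head + 1u) % r->size;
    return 0;
}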
2.7 Media and Switch Fabric Interface

The Media and Switch Fabric (MSF) Interface is used to connect the IXP2800 Network Processor to a physical layer device (PHY) and/or to a Switch Fabric. The MSF consists of separate receive and transmit interfaces. Each of the receive and transmit interfaces can be separately configured for either the SPI-4 Phase 2 (System Packet Interface) protocol for PHY devices or the CSIX-L1 protocol for Switch Fabric interfaces.
Intel® IXP2800 Network Processor Technical Description An alternate system configuration is shown in the block diagram in Figure 11. In this case, a single IXP2800 Network Processor is used for both Ingress and Egress. The bit rate supported would be less than in Figure 10. A hypothetical Bus Converter chip, external to the IXP2800 Network Processor is used. The block diagram in Figure 11 is only an illustrative example. Figure 11.
Intel® IXP2800 Network Processor Technical Description 2.7.2 CSIX CSIX-L1 (Common Switch Interface) defines an interface between a Traffic Manager (TM) and a Switch Fabric (SF) for ATM, IP, MPLS, Ethernet, and similar data communications applications. The Network Processor Forum (NPF) www.npforum.org, controls the CSIX-L1 specification. The basic unit of information transferred between Traffic Managers and Switch Fabrics is called a CFrame.
2.7.3.1 RBUF

RBUF is a RAM that holds received data. It stores received data in sub-blocks (referred to as elements), and is accessed by Microengines or the Intel XScale® core reading the received information. How RBUF elements are allocated and filled depends on the receive data protocol. When data is received, the associated status is put into the FULL_ELEMENT_LIST FIFO and subsequently sent to Microengines for processing.
Intel® IXP2800 Network Processor Technical Description 2.7.3.1.2 CSIX and RBUF CSIX CFrames are placed into either RBUF with each CFrame allocating an element. Unlike SPI-4, a single CFrame must not spill over into another element. Since CSIX spec specifies a maximum CFrame size of 256 bytes, this can be done by programming the element size to 256 bytes. However, if the Switch Fabric uses a smaller CFrame size, then a smaller RBUF element size can be used.
Intel® IXP2800 Network Processor Technical Description Each RX_THREAD_FREELIST has an associated countdown timer. If the timer expires and no new receive data is available yet, the receive logic will autopush a Null Receive Status Word to the next thread on the RX_THREAD_FREELIST. A Null Receive Status Word has the “Null” bit set, and does not have any data or RBUF entry associated with it. The RX_THREAD_FREELIST timer is useful for certain applications.
2.7.4 Transmit

Figure 13 is a simplified block diagram of the MSF transmit section.

Figure 13. Simplified Transmit Section Block Diagram (TBUF, SPI-4 and CSIX protocol logic, byte align, and valid element logic; external signals TDAT, TCTL, TPAR, TCLK, TCLKREF, and RXCFC (FCIFIFO full); Microengine reads via the S_Push_Bus)

2.7.4.1
All elements within a TBUF partition are transmitted in order. Control information associated with the element defines which bytes are valid. The data from the TBUF will be shifted and byte-aligned as required before being transmitted.

2.7.4.1.1 SPI-4 and TBUF

For SPI-4, data is put into the data portion of the element, and information for the SPI-4 Control Word that will precede the data is put into the Element Control Word.
Intel® IXP2800 Network Processor Technical Description 2.7.4.1.2 CSIX and TBUF For CSIX, payload information is put into the data area of the element, and Base and Extension Header information is put into the Element Control Word.
Intel® IXP2800 Network Processor Technical Description There is a Transmit Valid bit per element, that marks the element as ready to be transmitted. Microengines move all data into the element, by either or both of msf[write] and dram[tbuf_wr] instructions to the TBUF. Microengines also write the element Transmit Control Word with information about the element. When all of the data movement is complete, the Microengine sets the element valid bit. 1.
Intel® IXP2800 Network Processor Technical Description 2.8 Hash Unit The IXP2800 Network Processor contains a Hash Unit that can take 48-, 64-, or 128-bit data and produce a 48-, 64-, or a 128-bit hash index, respectively. The Hash Unit is accessible by the Microengines and the Intel XScale® core, and is useful in doing table searches with large keys, for example L2 addresses. Figure 14 is a block diagram of the Hash Unit. Up to three hash indexes can be created using a single Microengine instruction.
Intel® IXP2800 Network Processor Technical Description Figure 14.
2.9 PCI Controller

The PCI Controller provides a 64-bit, 66 MHz capable PCI Local Bus Revision 2.2 interface, and is compatible with 32-bit or 33 MHz PCI devices.
Intel® IXP2800 Network Processor Technical Description For PCI to DRAM transfers, the PCI command is Memory Read, Memory Read line, or Memory Read Multiple. For DRAM to PCI transfers, the PCI command is Memory Write. Memory Write Invalidate is not supported. Up to two DMA channels are running at a time with three descriptors outstanding. Effectively, the active channels interleave bursts to or from the PCI Bus. Interrupts are generated at the end of DMA operation for the Intel XScale® core.
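A DMA descriptor in SRAM can be pictured as a small record telling the channel how many bytes to move, where the data lives on each bus, and where the next descriptor is. The C sketch below is an illustrative model only; the field names and ordering are assumptions and do not reproduce the exact hardware descriptor format.

#include <stdint.h>

/* Illustrative model of a DMA descriptor kept in SRAM (field layout assumed). */
struct dma_descriptor_model {
    uint32_t byte_count;   /* number of bytes to transfer                  */
    uint32_t pci_addr;     /* PCI bus address of the transfer              */
    uint32_t dram_addr;    /* DRAM address of the transfer                 */
    uint32_t next_desc;    /* SRAM offset of the next descriptor, 0 = last */
};

/* Walk a descriptor chain in a local copy of SRAM and total the bytes moved. */
static uint32_t dma_chain_bytes(const uint8_t *sram_base, uint32_t first_desc_offset)
{
    uint32_t total = 0;
    uint32_t offset = first_desc_offset;

    while (offset != 0) {
        const struct dma_descriptor_model *d =
            (const struct dma_descriptor_model *)(sram_base + offset);
        total += d->byte_count;
        offset = d->next_desc;
    }
    return total;
}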
Intel® IXP2800 Network Processor Technical Description 2.9.3.2 DMA Channel Operation The DMA channel can be set up to read the first descriptor in SRAM, or with the first descriptor written directly to the DMA channel registers. When descriptors and the descriptor list are in SRAM, the procedure is as follows: 1. The DMA channel owner writes the address of the first descriptor into the DMA Channel Descriptor Pointer register (DESC_PTR). 2.
Intel® IXP2800 Network Processor Technical Description 2.9.3.3 DMA Channel End Operation 1. Channel owned by PCI: If not masked via the PCI Outbound Interrupt Mask register, the DMA channel interrupts the PCI host after the setting of the DMA done bit in the CHAN_X_CONTROL register, which is readable in the PCI Outbound Interrupt Status register. 2.
Intel® IXP2800 Network Processor Technical Description (either a PCI interrupt or an Intel XScale® core interrupt). When an interrupt is received, the DOORBELL registers can be read and the bit mask can be interpreted. If a larger bit mask is required than that is provided by the DOORBELL register, the MAILBOX registers can be used to pass up to 16 bytes of data. The doorbell interrupts are controlled through the registers shown in Table 18. Table 18. Doorbell Interrupt Registers Register Name 2.9.
2.10 Control and Status Register Access Proxy

The Control and Status Register Access Proxy (CAP) contains a number of chip-wide control and status registers. Some provide miscellaneous control and status, while others are used for inter-Microengine or Microengine-to-Intel XScale® core communication (note that rings in Scratchpad Memory and SRAM can also be used for inter-process communication).
Intel® IXP2800 Network Processor Technical Description 2.11.2 Timers The IXP2800 Network Processor contains four programmable 32-bit timers, which can be used for software support. Each timer can be clocked by the internal clock, by a divided version of the clock, or by a signal on an external GPIO pin. Each timer can be programmed to generate a periodic interrupt after a programmed number of clocks. The range is from several ns to several minutes depending on the clock frequency.
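The reload value for such a timer is simply the desired period expressed in timer clocks, and the longest period is set by the 32-bit counter width. The fragment below only illustrates that arithmetic; it is not driver code for the actual timer registers.

#include <stdint.h>

/* Timer clocks corresponding to a period given in microseconds.
 * For example, a 50 MHz timer clock and a 1000 us period give 50,000 counts. */
static uint32_t timer_counts_for_period_us(uint32_t clock_hz, uint32_t period_us)
{
    return (uint32_t)(((uint64_t)clock_hz * period_us) / 1000000u);
}

/* Longest period a 32-bit down-counter can express, in whole seconds.
 * At 50 MHz this is about 85 seconds; a divided clock extends it to minutes. */
static uint32_t timer_max_period_s(uint32_t clock_hz)
{
    return (uint32_t)(0xFFFFFFFFull / clock_hz);
}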
Intel® IXP2800 Network Processor Technical Description The access is asynchronous. Insertion of delay cycles for both data setup and hold time is programmable via internal Control registers. The transfer can also wait for a handshake acknowledge signal from the external device. 2.12 I/O Latency Table 19 shows the latencies for transferring data between the Microengine and the other subsystem components. The latency is measured in 1.4 GHz cycles. Table 19.
Intel® IXP2800 Network Processor Intel XScale® Core Intel XScale® Core 3 This section contains information describing the Intel XScale® core, Intel XScale® core gasket, and Intel XScale® core Peripherals (XPI). For additional information about the Intel XScale® architecture refer to the Intel XScale® Core Developers Manual available on Intel’s Developers web site (http://www.developer.intel.com). 3.1 Introduction The Intel XScale® core is an ARM* V5TE compliant microprocessor.
Intel® IXP2800 Network Processor Intel XScale® Core 3.2 Features Figure 16 shows the major functional blocks of the Intel XScale® core. Figure 16.
Intel® IXP2800 Network Processor Intel XScale® Core 3.2.3 Instruction Cache The Intel XScale® core implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to external memory. A mechanism to lock critical code within the cache is also provided. 3.2.4 Branch Target Buffer (BTB) The Intel XScale® core provides a Branch Target Buffer to predict the outcome of branch type instructions.
Intel® IXP2800 Network Processor Intel XScale® Core 3.3 Memory Management The Intel XScale® core implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. To accelerate virtual to physical address translation, the Intel XScale® core uses both an instruction Translation Look-aside Buffer (TLB) and a data TLB to cache the latest translations. Each TLB holds 32 entries and is fully-associative.
Intel® IXP2800 Network Processor Intel XScale® Core 3.3.1.2.2 Instruction Cache When examining these bits in a descriptor, the Instruction Cache only utilizes the C bit. If the C bit is clear, the Instruction Cache considers a code fetch from that memory to be non-cacheable, and will not fill a cache entry. If the C bit is set, then fetches from the associated memory region will be cached. 3.3.1.2.
Intel® IXP2800 Network Processor Intel XScale® Core If the Line Allocation Policy is read-allocate, all load operations that miss the cache request a 32-byte cache line from external memory and allocate it into either the data cache or mini-data cache (this is assuming the cache is enabled). Store operations that miss the cache will not cause a line to be allocated.
Intel® IXP2800 Network Processor Intel XScale® Core 3.3.3 Interaction of the MMU, Instruction Cache, and Data Cache The MMU, instruction cache, and data/mini-data cache may be enabled/disabled independently. The instruction cache can be enabled with the MMU enabled or disabled. However, the data cache can only be enabled when the MMU is enabled. Therefore only three of the four combinations of the MMU and data/mini-data cache enables are valid (see Table 23).
Intel® IXP2800 Network Processor Intel XScale® Core 3.3.4.3 Locking Entries Individual entries can be locked into the instruction and data TLBs. If a lock operation finds the virtual address translation already resident in the TLB, the results are unpredictable. An invalidate by entry command before the lock command will ensure proper operation. Software can also accomplish this by invalidating all entries, as shown in Example 15.
Intel® IXP2800 Network Processor Intel XScale® Core The proper procedure for locking entries into the data TLB is shown in Example 16. Example 16.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 17 illustrates locked entries in TLB. entry 0 entry 1 Locked Figure 17. Example of Locked Entries in TLB entry 7 entry 8 entry 22 entry 23 entry 30 entry 31 Note: 8 entries locked, 24 entries available for round robin replacement A9684-01 3.4 Instruction Cache The Intel XScale® core instruction cache enhances performance by reducing the number of instruction fetches from external memory. The cache provides fast execution of cached code.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 18.
Intel® IXP2800 Network Processor Intel XScale® Core 3.4.1.2 Operation when Instruction Cache is Disabled Disabling the cache prevents any lines from being written into the instruction cache. Although the cache is disabled, it is still accessed and may generate a “hit” if the data is already in the cache. Disabling the instruction cache does not disable instruction buffering that may occur within the instruction fetch buffers.
Intel® IXP2800 Network Processor Intel XScale® Core 3.4.1.5 Parity Protection The instruction cache is protected by parity to ensure data integrity. Each instruction cache word has 1 parity bit. (The instruction cache tag is not parity protected.) When a parity error is detected on an instruction cache access, a prefetch abort exception occurs if the Intel XScale® core attempts to execute the instruction.
Intel® IXP2800 Network Processor Intel XScale® Core 3.4.2 Instruction Cache Control 3.4.2.1 Instruction Cache State at Reset After reset, the instruction cache is always disabled, unlocked, and invalidated (flushed). 3.4.2.2 Enabling/Disabling The instruction cache is enabled by setting bit 12 in coprocessor 15, register 1 (Control register). This process is illustrated in Example 18. Example 18.
Intel® IXP2800 Network Processor Intel XScale® Core There are several requirements for locking down code: 1. The routine used to lock lines down in the cache must be placed in non-cacheable memory, which means the MMU is enabled. As a corollary: no fetches of cacheable code should occur while locking instructions into the cache. 2. The code being locked into the cache must be cacheable. 3. The instruction cache must be enabled and invalidated prior to locking down lines.
Intel® IXP2800 Network Processor Intel XScale® Core Example 20 shows how a routine, called “lockMe” in this example, might be locked into the instruction cache. Note that it is possible to receive an exception while locking code. Example 20. Locking Code into the Cache lockMe: ; This is the code that will be locked into the cache mov r0, #5 add r5, r1, r2 . . . lockMeEnd: . . .
Intel® IXP2800 Network Processor Intel XScale® Core Figure 20. BTB Entry TAG DATA Branch Address[31:9,1] History Bits[1:0] Target Address[31:1] A9687-01 The BTB takes the current instruction address and checks to see if this address is a branch that was previously seen. It uses bits [8:2] of the current address to read out the tag and then compares this tag to bits [31:9,1] of the current instruction address.
Intel® IXP2800 Network Processor Intel XScale® Core 3.5.2 Update Policy A new entry is stored into the BTB when the following conditions are met: • The branch instruction has executed • The branch was taken • The branch is not currently in the BTB The entry is then marked valid and the history bits are set to WT. If another valid branch exists at the same entry in the BTB, it will be evicted by the new branch.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.1 Overviews 3.6.1.1 Data Cache Overview The data cache is a 32-Kbyte, 32-way set associative cache, i.e., there are 32 sets and each set has 32 ways. Each way of a set contains 32 bytes (one cache line) and one valid bit. There also exist two dirty bits for every line, one for the lower 16 bytes and the other one for the upper 16 bytes. When a store hits the cache, the dirty bit associated with it is set.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.1.2 Mini-Data Cache Overview The mini-data cache is a 2-Kbyte, 2-way set associative cache; this means there are 32 sets with each set containing 2 ways. Each way of a set contains 32 bytes (one cache line) and one valid bit. There also exist 2 dirty bits for every line, one for the lower 16 bytes and the other one for the upper 16 bytes. When a store hits the cache, the dirty bit associated with it is set.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.1.3 Write Buffer and Fill Buffer Overview The Intel XScale® core employs an eight entry write buffer, each entry containing 16 bytes. Stores to external memory are first placed in the write buffer and subsequently taken out when the bus is available. The write buffer supports the coalescing of multiple store requests to external memory. An incoming store may coalesce with any of the eight entries.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.2.3 3.6.2.3.1 Cache Policies Cacheability Data at a specified address is cacheable given the following: • The MMU is enabled • The cacheable attribute is set in the descriptor for the accessed address • The data/mini-data cache is enabled 3.6.2.3.2 Read Miss Policy The following sequence of events occurs when a cacheable load operation misses the cache: 1.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.2.3.3 Write Miss Policy A write operation that misses the cache, requests a 32-byte cache line from external memory if the access is cacheable and write allocation is specified in the page; then, the following events occur: 1. The fill buffer is checked to see if an outstanding fill request already exists for that line.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.2.4 Round-Robin Replacement Algorithm The line replacement algorithm for the data cache is round-robin. Each set in the data cache has a round-robin pointer that keeps track of the next line (in that set) to replace. The next line to replace in a set is the next sequential line after the last one that was just filled. For example, if the line for the last fill was written into way 5-set 2, the next line to replace for that set would be way 6.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.3 Data Cache and Mini-Data Cache Control 3.6.3.1 Data Memory State After Reset After processor reset, both the data cache and mini-data cache are disabled, all valid bits are set to 0 (invalid), and the round-robin bit points to way 31. Any lines in the data cache that were configured as data RAM before reset are changed back to cacheable lines after reset, i.e., there are 32 KBytes of data cache and 0 bytes of data RAM. 3.6.3.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.3.3.1 Global Clean and Invalidate Operation A simple software routine is used to globally clean the data cache. It takes advantage of the lineallocate data cache operation, which allocates a line into the data cache. This allocation evicts any cache dirty data back to external memory. Example 22 shows how data cache can be cleaned. Example 22.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.4 Reconfiguring the Data Cache as Data RAM Software has the ability to lock tags associated with 32-byte lines in the data cache, thus creating the appearance of data RAM. Any subsequent access to this line will always hit the cache unless it is invalidated. Once a line is locked into the data cache it is no longer available for cache allocation on a line fill.
Intel® IXP2800 Network Processor Intel XScale® Core 3.6.5 Write Buffer/Fill Buffer Operation and Control The write buffer is always enabled, which means stores to external memory will be buffered. The K bit in the Auxiliary Control register (CP15, register 1) is a global enable/disable for allowing coalescing in the write buffer. When this bit disables coalescing, no coalescing will occur regardless the value of the page attributes.
Intel® IXP2800 Network Processor Intel XScale® Core 3.8 Performance Monitoring The Intel XScale® core hardware provides two 32-bit performance counters that allow two unique events to be monitored simultaneously. In addition, the Intel XScale® core implements a 32-bit clock counter that can be used in conjunction with the performance counters; its sole purpose is to count the number of core clock cycles, which is useful in measuring total execution time.
Intel® IXP2800 Network Processor Intel XScale® Core Table 24. Performance Monitoring Events (Sheet 2 of 2) Event Number (evtCount0 or evtCount1) Event Definition 0x7 Instruction executed. 0x8 Stall because the data cache buffers are full. This event will occur every cycle in which the condition is present. 0x9 Stall because the data cache buffers are full. This event will occur once for each contiguous sequence of this type of stall.
Intel® IXP2800 Network Processor Intel XScale® Core 3.8.1.2 Data Cache Efficiency Mode PMN0 totals the number of data cache accesses, which includes cacheable and non-cacheable accesses, mini-data cache access and accesses made to locations configured as data RAM. Note that STM and LDM will each count as several accesses to the data cache depending on the number of registers specified in the register list. LDRD will register two accesses. PMN1 counts the number of data cache and mini-data cache misses.
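In Data Cache Efficiency Mode, the two counters give the data-cache miss rate directly. The sketch below only illustrates the arithmetic on values already read from the counters; reading PMN0, PMN1, and CCNT requires the coprocessor access sequences described elsewhere in this section, so the values are passed in as plain integers here.

#include <stdint.h>

/* Statistics for one Data Cache Efficiency Mode measurement interval.
 * pmn0 = total data/mini-data cache accesses, pmn1 = total misses,
 * ccnt = core clock cycles over the same interval. */
struct dcache_stats {
    double miss_rate;          /* PMN1 / PMN0                         */
    double accesses_per_kcyc;  /* cache accesses per 1000 core cycles */
};

static struct dcache_stats dcache_efficiency(uint32_t pmn0, uint32_t pmn1, uint32_t ccnt)
{
    struct dcache_stats s = { 0.0, 0.0 };

    if (pmn0 != 0u)
        s.miss_rate = (double)pmn1 / (double)pmn0;
    if (ccnt != 0u)
        s.accesses_per_kcyc = 1000.0 * (double)pmn0 / (double)ccnt;
    return s;
}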
Intel® IXP2800 Network Processor Intel XScale® Core Statistics derived from these two events: • The average number of cycles the processor stalled on a data-cache access that may overflow the data-cache buffers. This is calculated by dividing PMN0 by PMN1. This statistic lets you know if the duration event cycles are due to many requests or are attributed to just a few requests. If the average is high, the Intel XScale® core may be starved of the bus external to the Intel XScale® core.
Intel® IXP2800 Network Processor Intel XScale® Core 3.8.1.6 Instruction TLB Efficiency Mode PMN0 totals the number of instructions that were executed, which does not include instructions that were translated by the instruction TLB and never executed. This can happen if a branch instruction changes the program flow; the instruction TLB may translate the next sequential instructions after the branch, before it receives the target address of the branch.
Intel® IXP2800 Network Processor Intel XScale® Core 3.9.1 Interrupt Latency Minimum Interrupt Latency is defined as the minimum number of cycles from the assertion of any interrupt signal (IRQ or FIQ) to the execution of the instruction at the vector for that interrupt. The point at which the assertion begins is TBD. This number assumes best case conditions exist when the interrupt is asserted, e.g., the system isn’t waiting on the completion of some other operation.
Intel® IXP2800 Network Processor Intel XScale® Core 3.9.3 Addressing Modes All load and store addressing modes implemented in the Intel XScale® core do not add to the instruction latencies numbers. 3.9.4 Instruction Latencies The latencies for all the instructions are shown in the following sections with respect to their functional groups: branch, data processing, multiply, status register access, load/store, semaphore, and coprocessor. The following section explains how to read these tables. 3.9.4.
Intel® IXP2800 Network Processor Intel XScale® Core Minimum Issue Latency (without Branch Misprediction) to the minimum branch latency penalty number from Table 26, which is four cycles. • Minimum Resource Latency The minimum cycle distance from the issue clock of the current multiply instruction to the issue clock of the next multiply instruction assuming the second multiply does not incur a data dependency and is immediately available from the instruction cache or memory interface.
Intel® IXP2800 Network Processor Intel XScale® Core 3.9.4.2 Branch Instruction Timings Table 28. Branch Instruction Timings (Predicted by the BTB) Mnemonic Minimum Issue Latency when Correctly Predicted by the BTB Minimum Issue Latency with Branch Misprediction B 1 5 BL 1 5 ( Table 29. Branch Instruction Timings (Not Predicted by the BTB) 1. 3.9.4.
Intel® IXP2800 Network Processor Intel XScale® Core 3.9.4.4 Multiply Instruction Timings Table 31.
Intel® IXP2800 Network Processor Intel XScale® Core Table 31. Multiply Instruction Timings (Sheet 2 of 2) Rs Value (Early Termination) Mnemonic Minimum Issue Latency Minimum Result Latency1 0 1 RdLo = 2; RdHi = 3 2 1 3 3 3 0 1 RdLo = 3; RdHi = 4 3 1 4 4 4 0 1 RdLo = 4; RdHi = 5 4 1 5 5 5 Rs[31:15] = 0x00000 UMULL Rs[31:27] = 0x00 all others 1.
Intel® IXP2800 Network Processor Intel XScale® Core 3.9.4.6 Status Register Access Instructions Table 35. Status Register Access Instruction Timings Mnemonic 3.9.4.7 Minimum Issue Latency Minimum Result Latency MRS 1 2 MSR 2 (6 if updating mode bits) 1 Load/Store Instructions Table 36.
Intel® IXP2800 Network Processor Intel XScale® Core 3.9.4.9 Coprocessor Instructions Table 39. CP15 Register Access Instruction Timings Mnemonic Minimum Issue Latency Minimum Result Latency MRC 4 4 MCR 2 N/A Table 40. CP14 Register Access Instruction Timings Mnemonic 3.9.4.10 Minimum Issue Latency Minimum Result Latency MRC 7 7 MCR 7 N/A LDC 10 N/A STC 7 N/A Miscellaneous Instruction Timing Table 41.
Intel® IXP2800 Network Processor Intel XScale® Core 3.10.1 IXP2800 Network Processor Endianness Endianness defines the way bytes are addressed within a word. A little-endian system is one in which byte 0 is the least significant byte (LSB) in the word and byte 3 is the most significant byte (MSB). A big-endian system is one in which byte 0 is the MSB and byte 3 is the LSB. For example, the value of 0x12345678 at address 0x0 in a 32-bit little-endian system looks like this: Table 43.
Intel® IXP2800 Network Processor Intel XScale® Core 3.10.1.1 Read and Write Transactions Initiated by the Intel XScale® Core The Intel XScale® core may be used in either a little-endian or big-endian configuration. The configuration affects the entire system in which the Intel XScale® core microarchitecture exists. Software and hardware must agree on the byte ordering to be used. In software, a system’s byte order is configured with CP15 register 1, the control register.
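A short, generic C check makes the two byte orders concrete. This example is not IXP2800-specific code; it simply demonstrates, at run time, the byte layouts shown for the value 0x12345678.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t value = 0x12345678u;
    const uint8_t *bytes = (const uint8_t *)&value;

    /* In a little-endian system byte 0 holds the LSB (0x78);
     * in a big-endian system byte 0 holds the MSB (0x12). */
    if (bytes[0] == 0x78u)
        printf("little-endian: bytes 0..3 = 78 56 34 12\n");
    else
        printf("big-endian:    bytes 0..3 = 12 34 56 78\n");
    return 0;
}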
Intel® IXP2800 Network Processor Intel XScale® Core 16-Bit (Word) Read When reading a word, the Intel XScale® core generates the byte_enable that corresponds to the proper byte lane as defined by the endianness setting. Figure 25 summarizes byte enable generation for this mode. The 4-to-1 multiplexer steers byte lane 0 or byte lane 2 into the byte 0 location of the read register inside the Intel XScale® core.
Intel® IXP2800 Network Processor Intel XScale® Core Table 46. Byte-Enable Generation by the Intel XScale® Core for 16-Bit Data Transfers in Littleand Big-Endian Systems Word to be Read Byte-Enables for Little-Endian System Byte-Enables for Big-Endian System X_BE[0] X_BE[1] X_BE[2] X_BE[3] X_BE[0] X_BE[1] X_BE[2] X_BE[3] Byte 0, Byte 1 1 1 0 0 0 0 1 1 Byte 2, Byte 3 0 0 1 1 1 1 0 0 32-Bit (Longword) Read 32-bit (longword) reads are independent of endianness.
Intel® IXP2800 Network Processor Intel XScale® Core Word Write (16-Bits Write) When the Intel XScale® core writes a 16-bit word to external memory, it puts the bytes in the byte lanes where it intends to write them along with the byte enables for those bytes turned ON based on the endian setting of the system. The Intel XScale® core does not allow a word write on an odd-byte address. The Intel XScale® core register bits [15:0] always contain the word to be written regardless of the B-bit setting.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 26.
Intel® IXP2800 Network Processor Intel XScale® Core The Intel XScale® core coprocessor bus is not used in the IXP2800 Network Processors, therefore all accesses are only through the Command Memory Bus. Figure 27 shows the block diagram of the global bus connections to the gasket. The gasket unit has the following features: • Interrupts are sent to the Intel XScale® core via the gasket, with the interrupt controller registers used for masking the interrupts.
Intel® IXP2800 Network Processor Intel XScale® Core 3.11.2 Intel XScale® Core Gasket Functional Description 3.11.2.1 Command Memory Bus to Command Push/Pull Conversion The primary function of the Intel XScale® core gasket unit is to translate commands initiated from the Intel XScale® core in the Intel XScale® core command bus format, into the IXP2800 internal command format (Command Push/Pull format). Table 49 shows how many CPP commands are generated by the gasket from each CMB command.
Intel® IXP2800 Network Processor Intel XScale® Core 3.11.4 Atomic Operations The Intel XScale® core has Swap (SWP) and Swap Byte (SWPB) instructions that generate an atomic read-write pair to a single address. These instructions are supported for the SRAM and Scratch space, and also to any other address space if it is done by a Read command followed by Write command.
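For reference, a SWP-based atomic exchange is often wrapped as shown below. This is a hedged sketch for a GCC-style ARMv5 toolchain, not code taken from this manual, and on the IXP2800 it only behaves atomically for addresses covered by the rules summarized in the next section.

#include <stdint.h>

/* Atomically exchange *addr with new_val and return the old value,
 * using the ARM SWP instruction (GCC inline assembly, ARMv5). */
static inline uint32_t atomic_swap32(volatile uint32_t *addr, uint32_t new_val)
{
    uint32_t old_val;

    __asm__ __volatile__(
        "swp %0, %2, [%1]"
        : "=&r" (old_val)
        : "r" (addr), "r" (new_val)
        : "memory");
    return old_val;
}

/* Simple test-and-set lock built on the swap: spin until the previous value was 0. */
static inline void spin_lock(volatile uint32_t *lock)
{
    while (atomic_swap32(lock, 1u) != 0u)
        ;   /* busy-wait until the previous holder has written 0 */
}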
Intel® IXP2800 Network Processor Intel XScale® Core 3.11.4.1 Summary of Rules for the Atomic Command Regarding I/O The following rules summarize the Atomic command, regarding I/O. • SWP to SRAM/Scratch and Not cbiIO, Xscale_IF generates an Atomic operation command. • SWP to all other Addresses that are not SRAM/Scratch, will be treated as separate read and write commands. No Atomic command is generated. • SWP to SRAM/Scratch and cbiIO, will be treated as separate read and write commands.
Intel® IXP2800 Network Processor Intel XScale® Core 3.11.5 I/O Transaction The Intel XScale® core can request an I/O transaction by asserting xsoCBI_IO concurrently with xsoCBI_Req. The value of xsoCBI_IO is undefined when xsoCBI_Req is not asserted. When the gasket sees an I/O request with xsoCBI_IO asserted, it will raise xsiCBR_Ack but will not acknowledge future requests until the IO transaction is complete. The gasket will check if all of the command FIFOs and write data FIFOs are empty or not.
Intel® IXP2800 Network Processor Intel XScale® Core 3.11.7 Gasket Local CSR There are two sets of Control and Status registers residing in the gasket Local CSR space. ICSR refers to the Interrupt CSR. The ICSR address range is 0xd600_0000 – 0xd6ff_ffff. The Gasket CSR (GCSR) refers to the Hash CSRs and debug CSR. It has a range of 0xd700_0000 – 0xd7ff_ffff. GCSR is shown in Table 51. Note: The Gasket registers are defined in the IXP2400 and IXP2800 Network Processor Programmer’s Reference Manual.
Intel® IXP2800 Network Processor Intel XScale® Core 3.11.8 Interrupt The Intel XScale® core CSR controller contains local CSR(s) and interrupts inputs from multiple sources. The diagram in Figure 28 shows the flow through the controller. Within the Interrupt/CSR register block there are raw status registers, enable registers, and local CSR(s). The raw status registers are the un-masked interrupt status.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 29.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12 Intel XScale® Core Peripheral Interface This section describes the Intel XScale® core Peripheral Interface unit (XPI). The XPI is the block that connects to all the slow and serial interfaces that communicate with the Intel XScale® core through the APB. These can also be accessed by the Microengines and PCI unit.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 30. XPI Interfaces for IXP2800 Network Processor Intel® IXP2400/2800 Network Processor XPI [7:0]/[15:0]/[31:0] UART Intel XScale® Core Reset Sequential Logic [31:0] GPIO APB Bus CPP Bus PCI SlowPort rx,tx SONET/SDH Microprocessor Interface [7:0] Demultiplexor SHaC [3:0] [7:0] PROM Timer watchdog_reset B1740-02 3.12.1.1 Data Transfers The current rate for data transfers is four bytes, except for the Slowport.
Intel® IXP2800 Network Processor Intel XScale® Core Table 52. Data Transaction Alignment Interface Units APB Bus Read Write GRegs 32 bits 32 bits 32 bits UART 32 bits 32 bits 32 bits GPIO 32 bits 32 bits 32 bits Timer 32 bits 32 bits 32 bits Slowport Microprocessor Access Slowport 1 Flash Memory Access CSR Access 1. 3.12.1.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.2 UART Overview The UART performs serial-to-parallel conversion on data characters received from a peripheral device and parallel-to-serial conversion on data characters received from the network processor. The processor can read the complete status of the UART at any time during the functional operation.
3.12.3 UART Operation

The format of a UART data frame is shown in Figure 31.

Figure 31. UART Data Frame (on the TXD or RXD pin: Start Bit, Data <0> (LSB) through Data <7> (MSB), Parity Bit, Stop Bit 1, Stop Bit 2). Notes: the receive data sample counter frequency is 16x the bit frequency, and each bit is sampled three times in the middle. The shaded bits in the figure are optional and can be programmed by users.
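With a 16x receive sample clock, the usual relationship between the UART input clock, the divisor, and the baud rate applies. The sketch below is illustrative only; the UART clock frequency and the divisor-register width are assumptions to be taken from the clock and register descriptions for the actual part.

#include <stdint.h>

/* baud = uart_clk / (16 * divisor), so the divisor for a requested baud rate
 * is uart_clk / (16 * baud), rounded to the nearest integer. */
static uint32_t uart_divisor(uint32_t uart_clk_hz, uint32_t baud)
{
    return (uart_clk_hz + (8u * baud)) / (16u * baud);
}

/* Longest frame shown in Figure 31: start + 8 data + parity + 2 stop = 12 bit
 * times, so one character occupies 12 / baud seconds on the wire. */
static uint32_t uart_char_time_us(uint32_t baud)
{
    return (12u * 1000000u) / baud;
}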
Intel® IXP2800 Network Processor Intel XScale® Core Character Time-out Interrupt When the receiver FIFO and receiver time-out interrupt are enabled, a character time-out interrupt occurs when all of the following conditions exist: • At least one character is in the FIFO. • The last received character was longer than four continuous character times ago (if two stop bits are programmed the second one is included in this time delay).
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.5 General Purpose I/O (GPIO) The IXP2800 Network Processor has eight General Purpose Input/Output (GPIO) port pins for use in generating and capturing application-specific input and output signals. Each pin is programmable as an input or output or as an interrupt signal sourcing from an external device. The GPIO can be used with appropriate software in I2C application.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.6 Timers The IXP2800 Network Processor supports four timers. These timers are clocked by the Advanced Peripheral/Bus Clock (APB-CLK), which runs at 50 MHz to produce the PLPL_APB_CLK, PLPL_APB_CLK/16, or PLPL_APB_CLK/256 signals. The counters are loaded with an initial value, count down to 0, and raise an interrupt (if interrupts are not masked). In addition, timer 4 can be used as a watchdog timer when the watchdog enable bits are configured to 1.
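As a simple check on the arithmetic above, the sketch below computes the time until a timer raises its interrupt. It is only an illustration (the function name is made up), assuming the 50-MHz APB clock and a prescale factor of 1, 16, or 256.

    #include <stdint.h>

    /* Illustration only: time until a down-counter loaded with `load` reaches 0,
       given the 50-MHz APB clock and a prescale factor of 1, 16, or 256. */
    static double timer_interval_seconds(uint32_t load, unsigned prescale)
    {
        const double apb_hz = 50.0e6;                    /* PLPL_APB_CLK */
        return (double)load * (double)prescale / apb_hz;
    }

For example, a load value of 50,000,000 with a prescale of 1 expires after one second; with a prescale of 256 the same load value expires after 256 seconds.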
Intel® IXP2800 Network Processor Intel XScale® Core Figure 34 shows the Timer internal logic. Figure 34. Timer Internal Logic Diagram (the timer registers block (TCTL, TCLD, TCLR, TWDE, TCSR), the address decoder and control logic on the APB interface, the divide-by-16 clock stages, the counter logic that generates the GP_TM[3:0] interrupts, and the watchdog logic that drives the watchdog reset; A9703-01) 3.12.
Intel® IXP2800 Network Processor Intel XScale® Core The Flash memory interface is used for the PROM device. The microprocessor interface can be used for SONET/SDH Framer microprocessor access. There are two ports in the Slowport unit. The first is dedicated to the flash memory device and the second to the microprocessor device. 3.12.7.1 PROM Device Support For Flash memory access, only 8-bit devices are supported.
Intel® IXP2800 Network Processor Intel XScale® Core Table 55.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.7.4 Address Space The total address space is defined as 64 Mbytes, which is further divided into two segments of 32 Mbytes each. Two devices can be connected to this bus. If each peripheral device has a density of 256 Mbits (32 Mbytes), the entire address space is filled and appears contiguous.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 37. Slowport Example Application Topology (the IXP2800 Slowport signals SP_RD_L, SP_WR_L, SP_CS_L[1:0], SP_A[1:0], SP_AD[7:0], SP_ALE_L, SP_CLK, and SP_ACK_L connect to the peripheral's OE_L, WE_L, CS_L, A[24:2], D[7:0], and ACK_L pins; the upper address bits are captured in three 74F377 latches and the clock is distributed by a CY2305 clock driver; A9318-02) 3.12.7.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.7.6.1 Mode 0 Single Write Transfer for Fixed-Timed Device Figure 38 shows the single write transfer for a fixed-timed device with the CSR programmed to a value of setup=4, pulse width=0, and hold=2, followed by another read transfer. Figure 38.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.7.6.2 Mode 0 Single Write Transfer for Self-Timing Device Figure 39 depicts the single write transfer for a self-timing device with the CSR programmed to setup=4, pulse width=0, and hold=3. A read transaction follows. Figure 39.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.7.6.3 Mode 0 Single Read Transfer for Fixed-Timed Device Figure 40 demonstrates the single read transfer issued to a fixed-timed PROM device followed by another write transaction. The CSR is assumed to be configured to the value setup=2, pulse width=10, and hold=1. Figure 40.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.7.6.4 Single Read Transfer for a Self-Timing Device Figure 41 demonstrates the single read transfer issued to a self-timing PROM device followed by another write transaction. The CSR is assumed to be programmed to setup=4, pulse width=0, and hold=2. Figure 41.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.7.7.1 Mode 1: 16-Bit Microprocessor Interface Support with 16-Bit Address Lines The address size control register is programmed to 16-bit address space for this case. This mode is intended for devices that use a protocol similar to that of the Lucent* TDAT042G5 SONET/SDH device. 16-Bit Microprocessor Interfacing Topology with 16-Bit Address Lines Figure 42 shows a solution for the 16-bit microprocessor interface.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 42.
Intel® IXP2800 Network Processor Intel XScale® Core 16-Bit Microprocessor Write Interface Protocol Figure 43 uses the Lucent* TDAT042G5 device. In this case, the user should program the P_PCR register to mode 1 and also program the write timing control register to setup=7, pulse width=5, and hold=1, which represent seven clock cycles for CS, five clock cycles for DT delay, and one clock cycle for ADS. Two idle cycles separate consecutive transactions.
Intel® IXP2800 Network Processor Intel XScale® Core 16-Bit Microprocessor Read Interface Protocol Figure 44 likewise depicts a single read transaction launched from the IXP2800 Network Processor to the Lucent* TDAT042G5 device, followed by another read transaction. However, in this case the read timing control register must be programmed to setup=0, pulse width=7, and hold=0. In Figure 44, twelve clock cycles are used for the read transaction in total (i.e.
Intel® IXP2800 Network Processor Intel XScale® Core 3.12.7.7.2 Mode 2: Interface with 8 Data Bits and 11 Address Bits This application is designed for the PMC-Sierra* PM5351 S/UNI-TETRA* device. For the PMC-Sierra* PM5351, the address space is programmed to 11 bits; for other devices, the appropriate address space should be specified. 8-Bit PMC-Sierra* PM5351 S/UNI-TETRA* Interfacing Topology Figure 45 displays one of the topologies used to connect the PMC-Sierra* PM5351 S/UNI-TETRA* device to the Slowport.
Intel® IXP2800 Network Processor Intel XScale® Core PMC-Sierra* PM5351 S/UNI-TETRA* Write Interface Protocol Figure 46 depicts a single write transaction launched from the IXP2800 to the PMC-Sierra* PM5351 device followed by a single read transaction. The write transaction for the PMC-Sierra* component has six clock cycles or a 120-ns access time for a 50-MHz Slowport clock. In this case, no intervening cycle is added after the transaction. The I/O throughput is 8.3 Mbytes per second.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 47 depicts a single read transaction launched from the IXP2800 Network Processor to the PMC-Sierra* PM5351 device, followed by a single write transaction. In this case, there are ten clock cycles of access time, or 200 ns in total, with three turnaround cycles appended at the end. The I/O throughput is 11.2 Mbytes per second. Figure 47.
Intel® IXP2800 Network Processor Intel XScale® Core For a write, SP_CP loads the data onto the 74F646 (or equivalent) tri-state buffers, using two clock cycles. To reduce the pin count, the 16-bit data is latched with the same pin (SP_CS_L[1]), assuming that a turnaround cycle is inserted between the transaction cycles. For a read, data are shifted out of two 74F646 or equivalent tri-state buffers by SP_CP, using two consecutive clock cycles. Figure 48.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 49.
Intel® IXP2800 Network Processor Intel XScale® Core Mode 3 Write Interface Protocol Figure 50 depicts a single write transaction launched from the IXP2800 Network Processor to the Intel and AMCC* SONET/SDH device, followed by two consecutive reads. Compared with the Lucent* TDAT042G5, this device has a shorter access time, about eight clock cycles (i.e., 160 ns). In this case, an intervening cycle may not be needed for the write transactions. Therefore, the throughput is about 12.5 Mbytes per second.
Intel® IXP2800 Network Processor Intel XScale® Core Mode 3 Read Interface Protocol Figure 51 depicts a single read transaction launched from the IXP2800 to the Intel and AMCC* SONET/SDH device, followed by two consecutive writes. Similarly, the access time is much better than the Lucent* TDAT042G5. The access time is eight clock cycles or 160 ns for a 50-MHz Slowport clock. Here, there are three intervening cycles between transactions. Therefore, the throughput is 11.1 Mbytes per second. Figure 51.
Intel® IXP2800 Network Processor Intel XScale® Core The same method is used to pack and unpack the data between the IXP2800 Network Processor Slowport interface and the Intel and AMCC* microprocessor interface. For a write, W2B loads the data onto the 74F646 or equivalent tri-state buffers, using two clock cycles. To reduce the pin count, the 16-bit data are latched with the same pin (CS_L[1]), assuming that a turnaround cycle is inserted between the transaction cycles.
Intel® IXP2800 Network Processor Intel XScale® Core Figure 53.
Intel® IXP2800 Network Processor Intel XScale® Core Mode 4 Write Interface Protocol Figure 54 depicts a single write transaction launched from the IXP2800 Network Processor to the Intel and AMCC* SONET/SDH device, followed by two consecutive reads. Compared with the Lucent* TDAT042G5, this device has a shorter access time, about eight clock cycles (i.e., 120 ns). In this case, an intervening cycle may not be needed; therefore, the throughput is about 12.5 Mbytes per second. Figure 54.
Intel® IXP2800 Network Processor Intel XScale® Core Mode 4 Read Interface Protocol Figure 55 shows a single read transaction launched from the IXP2800 Network Processor to the Intel and AMCC* SONET/SDH device, followed by two consecutive writes. Similarly, the access time is much better than that of the Lucent* TDAT042G5. The access time is about eight clock cycles or 160 ns. Here, an intervening cycle is needed at the end. Therefore, the throughput is 11.2 Mbytes per second. Figure 55.
Intel® IXP2800 Network Processor Microengines Microengines 4 This section defines the Network Processor Microengine (ME). This is the second version of the Microengine, and is often referred to as the MEv2 (Microengine Version 2). 4.1 Overview The following sections describe the programmer’s view of the Microengine. The block diagram in Figure 56 is used in the description.
Intel® IXP2800 Network Processor Microengines Figure 56.
Intel® IXP2800 Network Processor Microengines 4.1.1 Control Store The Control Store is a static RAM that holds the program that the Microengine executes. It holds 8192 instructions, each of which is 40 bits wide. It is initialized by an external device that writes to Ustore_Addr and Ustore_Data Local CSRs. The Control Store can optionally be protected by parity against soft errors.
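A minimal sketch of how an external agent could load the Control Store through the Ustore_Addr and Ustore_Data Local CSRs is shown below. The csr_write() helper, the register selectors, and the way each 40-bit instruction is split across two data writes are assumptions for illustration only; the actual packing is defined by the CSR descriptions in the Programmer's Reference Manual.

    #include <stdint.h>

    extern void csr_write(unsigned csr, uint32_t value);   /* hypothetical helper */

    #define USTORE_ADDR 0   /* illustrative selectors, not real CSR addresses */
    #define USTORE_DATA 1

    /* Load `count` instructions (up to 8192, each 40 bits wide) into the
       Control Store: select the instruction address, then write the data. */
    void load_control_store(const uint64_t *image, unsigned count)
    {
        for (unsigned i = 0; i < count; i++) {
            csr_write(USTORE_ADDR, i);                           /* instruction address */
            csr_write(USTORE_DATA, (uint32_t)image[i]);          /* low 32 bits         */
            csr_write(USTORE_DATA, (uint32_t)(image[i] >> 32));  /* remaining 8 bits    */
        }
    }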
Intel® IXP2800 Network Processor Microengines Figure 57. Context State Transition Diagram (states: Reset, Inactive, Ready, Executing, and Sleep; transition labels: "CTX_ENABLE bit is set by Intel XScale® Core", "CTX_ENABLE bit is cleared", "External Event Signal arrives", "Context executes CTX Arbitration instruction", and "Context goes to Sleep state, and this Context is the highest round-robin priority")
Intel® IXP2800 Network Processor Microengines 4.1.3 Datapath Registers As shown in the block diagram in Figure 56, each Microengine contains four types of 32-bit datapath registers: • 256 General Purpose registers • 512 Transfer registers • 128 Next Neighbor registers • 640 32-bit words of Local Memory 4.1.3.1 General-Purpose Registers (GPRs) GPRs are used for general programming purposes. They are read and written exclusively under program control.
Intel® IXP2800 Network Processor Microengines Typically, the external units access the Transfer registers in response to commands sent by the Microengines. The commands are sent in response to instructions executed by the Microengine (for example, the command instructs a SRAM controller to read from external SRAM, and place the data into a S_TRANSFER_IN register).
Intel® IXP2800 Network Processor Microengines It is also possible to make use of both or one LM_Addrs as global by setting CTX_Enable[LM_Addr_0_Global] and/or CTX_Enable[LM_Addr_1_Global]. When used globally, all Contexts use the working copy of LM_Addr in place of their own Context specific one; the Context specific ones are unused. 4.1.4 Addressing Modes GPRs can be accessed in two different addressing modes: Context-Relative and Absolute.
Intel® IXP2800 Network Processor Microengines 4.1.4.2 Absolute Addressing Mode With Absolute addressing, any GPR can be read or written by any one of the eight Contexts in a Microengine. Absolute addressing enables register data to be shared among all of the Contexts, e.g., for global variables or for parameter passing. All 256 GPRs can be read by Absolute address. 4.1.4.
Intel® IXP2800 Network Processor Microengines Figure 58. Byte Align Block Diagram (the Prev_A and Prev_B registers feed the A_Operand and B_Operand inputs of the shifter; the Byte_Index CSR selects the shift amount that produces the Result; A9353-01) Example 24 shows an align sequence of instructions and the value of the various operands. Table 59 shows the data in the registers for this example. The value in Byte_Index[1:0] CSR (which controls the shift amount) for this example is 2. Table 59.
Intel® IXP2800 Network Processor Microengines Example 25 shows another sequence of instructions and the value of the various operands. Table 60 shows the data in the registers for this example. The value in Byte_Index[1:0] CSR (which controls the shift amount) for this example is 2. Table 60. Register Contents for Example 25
Register   Byte 3 [31:24]   Byte 2 [23:16]   Byte 1 [15:8]   Byte 0 [7:0]
0          3                2                1               0
1          7                6                5               4
2          B                A                9               8
3          F                E                D               C
Example 25.
Intel® IXP2800 Network Processor Microengines Note: The State bits are data associated with the entry. State bits are only used by software. There is no implication of ownership of the entry by any Context. The State bits hardware function is: • the value is set by software (when the entry is loaded or changed in an already-loaded entry). • its value is read out on a lookup that hits, and used as part of the status written into the destination register.
Intel® IXP2800 Network Processor Microengines One possible way to use the result of a lookup is to dispatch to the proper code using instruction: jump[register, label#],defer [3] where the register holds the result of the lookup. The State bits can be used to differentiate cases where the data associated with the CAM entry is in flight, or is pending a change, etc. Because the lookup result was loaded into bits[11:3] of the destination register, the jump destinations are spaced eight instructions apart.
Intel® IXP2800 Network Processor Microengines Example 26. Algorithm for Debug Software to Determine the Contents of the CAM ; First read each of the tag entries. Note that these reads ; don’t modify the LRU list or any other CAM state. tag[0] = CAM_Read_Tag(entry_0); ...... tag[15] = CAM_Read_Tag(entry_15); ; Now read each of the state bits state[0] = CAM_Read_State(entry_0); ...
Intel® IXP2800 Network Processor Microengines 4.5 Event Signals Event Signals are used to coordinate a program with completion of external events. For example, when a Microengine issues a command to an external unit to read data (which will be written into a Transfer_In register), the program must insure that it does not try to use the data until the external unit has written it. There is no hardware mechanism to flag that a register write is pending, and then prevent the program from using it.
Intel® IXP2800 Network Processor Microengines 4.5.1 Microengine Endianness Microengine operation from an “endian” point of view can be divided into the following categories: • Read from RBUF (64 bits) • Write to TBUF (64 bits) • Read/write from/to SRAM • Read/write from/to DRAM • Read/write from/to SHaC and other CSRs • Write to Hash 4.5.1.1 Read from RBUF (64 Bits) Data in RBUF is arranged in LWBE order.
Intel® IXP2800 Network Processor Microengines 4.5.1.2 Write to TBUF Data in TBUF is arranged in LWBE order. When writing from the Microengine transfer registers to TBUF, treg0 goes into LDW0, treg1 goes into LDW1, etc. See Figure 61. Figure 61. Write to TBUF (64 Bits) (bytes 0 through 15 in Microengine transfer registers treg0 through treg3 map to bytes 0 through 15 of the TBUF; A8942-01) 4.5.1.3 Read/Write from/to SRAM Data inside SRAM is in big-endian order.
Intel® IXP2800 Network Processor Microengines 4.5.1.6 Write to Hash Unit Figure 62 explains 48-, 64-, and 128-bit hash operations. When the Microengine transfers a 48-bit hash operand to the hash unit, the operand resides in two transfer registers and is transferred, as shown in Figure 62. In the second longword transfer, only the lower half is valid. Hash unit concatenates the two longwords as shown in Figure 62.
Intel® IXP2800 Network Processor Microengines 4.5.2.1 Read from RBUF To analyze the endianness on the media-receive interface and the way in which bytes are arranged inside RBUF, a brief introduction of how bytes are generated from the serial interface is provided here. Pipe A denotes the serial stream of data received at the serial interface (SERDES). Bit 0 of byte 0 comes first, followed by bit 1, etc. Pipe B converts this bit stream into byte stream byte 0 — byte 7, etc.
Intel® IXP2800 Network Processor Microengines 4.5.2.2 Write to TBUF For writing to TBUF, the header comes from the Microengine and data comes from RBUF or DRAM. Since the Microengine to TBUF header transfer happened in 8-byte chunks, it is possible that the first longword that is inside tr0 may not contain any data if the valid header begins in transfer register tr1.
Intel® IXP2800 Network Processor Microengines Since data in RBUF or DRAM is arranged in LWBE order, it is swapped on the way into the TBUF to make it truly big-endian, as shown in Figure 64. Again, the invalid bytes at the beginning of the payload that starts at offset 3 and at the end-of-header at offset 2 is removed by the aligner on the way out of TBUF. 4.5.2.
Intel® IXP2800 Network Processor DRAM DRAM 5 This section describes Rambus* DRAM operation. 5.1 Overview The IXP2800 Network Processor has controllers for three Rambus* DRAM (RDRAM) channels. Either one, two, or three channels can be enabled. When more than one channel is enabled, the channels are interleaved (also known as striping) on 128-byte boundaries to provide balanced access to all populated channels. Interleaving is performed in hardware and is transparent to the programmer.
Intel® IXP2800 Network Processor DRAM 5.2 Size Configuration Each channel can be populated with 1 – 4 RDRAMs (Short Channel Mode). For supported loading configurations, refer to Table 61. The RAM technology used determines the increment size and maximum memory per channel as shown in Table 62. Note: One or two channels can be left unpopulated if desired. Table 61. RDRAM Loading Bus Interface Maximum Number of Loads Trace Length (inches) Short Channel: 400 and 533 MHz 4 devices per channel.
Intel® IXP2800 Network Processor DRAM 5.3 DRAM Clocking Figure 66 shows the clock generation for one channel (this description is just for reference; for more information, refer to Rambus* design literature). The other channels use the same configuration. Note: Refer to Section 10 for additional information on clocking. Figure 66.
Intel® IXP2800 Network Processor DRAM Figure 67. IXP2800 Clocking for RDRAM at 400 MHz (the DRCG multiplies a 50-MHz reference by 8 to produce the 400-MHz bus clock; the phase detector compares the 25-MHz CLK_PHASE_REF with the divided 100-MHz system Ref_Clk; the PLL produces PCLK = 100 MHz and SYNCLK = 100 MHz for the RMC and RAC; A9727-01) Figure 68. IXP2800 Clocking for RDRAM at 508 MHz (the DRCG multiplies a 63.5-MHz reference by 8 to produce the 508-MHz bus clock; CLK_PHASE_REF is 31.75 MHz; the PLL produces PCLK = 127 MHz and SYNCLK = 127 MHz for the RMC and RAC; A9728-01) 5.
Intel® IXP2800 Network Processor DRAM 5.5 Interleaving The RDRAM channels are interleaved on 128-byte boundaries in hardware to improve concurrency and bandwidth utilization. Contiguous addresses are directed to different channels by rearranging the physical address bits in a programmable manner described in Section 5.5.1 through Section 5.5.3 and then remapped as described in Section 5.5.4. The block diagram in Figure 69 illustrates the flow. Figure 69.
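Before the programmable rearrangement is described, the fragment below shows the basic idea of interleaving on 128-byte boundaries for a simple two-channel case. It is a conceptual sketch only, not the IXP2800's actual remapping function.

    #include <stdint.h>

    /* Conceptual 2-way interleave on 128-byte boundaries: address bit 7 selects
       the channel and the remaining bits form the address within that channel. */
    static void interleave_2way(uint32_t addr, unsigned *channel, uint32_t *chan_addr)
    {
        *channel   = (addr >> 7) & 0x1;
        *chan_addr = ((addr >> 8) << 7) | (addr & 0x7F);
    }

Consecutive 128-byte blocks then alternate between the two channels, which is what spreads contiguous accesses across all populated channels.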
Intel® IXP2800 Network Processor DRAM Table 63.
Intel® IXP2800 Network Processor DRAM Table 64. Address Rearrangement for 3-Way Interleave (Sheet 2 of 2) (Rev B)
When these bits of address are all “1”s…1   Add the value in this CSR to the address
30:7                                        K11
28:7                                        K10
26:7                                        K9
24:7                                        K8
22:7                                        K7
20:7                                        K6
18:7                                        K5
16:7                                        K4
14:7                                        K3
12:7                                        K2
10:7                                        K1
8:7                                         K0
None                                        Value 0 added.
NOTES: 1. This is a priority encoder; when multiple lines satisfy the condition, the line with the largest number of ones is used.
Intel® IXP2800 Network Processor DRAM 5.5.4 Interleaving Across RDRAMs and Banks In addition to interleaving across the different RDRAM channels, addresses are also interleaved across RDRAM chips and internal banks. This improves utilization since certain operations to different banks can be performed concurrently. The interleaving is done based on rearranging the remapped address derived from Section 5.5.1, Section 5.5.2, and Section 5.5.3 as a function of the memory size as shown in Table 65.
Intel® IXP2800 Network Processor DRAM 5.6.2 Parity Enabled On writes, odd byte parity is computed for each byte and written into the corresponding parity bit. Partial writes (writes of less than eight bytes) are done as masked writes. On reads, odd byte parity is computed on each byte of data and compared to the corresponding parity bit. If there is an error, the RDRAMn_Error_Status_1[Uncorr_Err] bit is set, which can interrupt the Intel XScale® core if enabled.
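The parity rule above can be written out as a small helper; this is an illustration of the rule only, not controller code.

    #include <stdint.h>

    /* Odd byte parity: the stored parity bit is chosen so that the total number
       of 1s across the data byte plus the parity bit is odd. */
    static uint8_t odd_byte_parity(uint8_t b)
    {
        unsigned ones = 0;
        for (int i = 0; i < 8; i++)
            ones += (b >> i) & 1;
        return (ones & 1) ^ 1;   /* returns 1 when the byte has an even count of 1s */
    }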
Intel® IXP2800 Network Processor DRAM To avoid the detection of false ECC errors, the RDRAM ECC mode must be initialized using the procedure described below: 1. Ensure that parity/ECC is not enabled: program DRAM_CTRL[15:14] = 00 2. Write all zeros (0x00000000) to all the memory locations. By default, this initializes the memory with odd parity and in this case (data all 0), it coincides with ECC and does not require any read-modify-writes because ECC is not enabled. 3.
Intel® IXP2800 Network Processor DRAM 5.8 Microengine Signals Upon completion of a read or write, the RDRAM controller can signal a Microengine context, when enabled. It does so using the sig_done token; see Example 27. Example 27. RDRAM Controller Signaling a Microengine Context dram [read,$xfer6,addr_a,0,1], sig_done_4 dram [read,$xfer7,addr_b,0,1], sig_done_6 ctx_arb[4, 5, 6, 7] Because the RDRAM address space is interleaved, consecutive accesses can go to different RDRAM channels.
Intel® IXP2800 Network Processor DRAM Serial reads are done by the following steps: 1. Read RDRAM_Serial_Command; test Busy bit until it is a 0. 2. Write RDRAM_Serial_Command to start the read. 3. Read RDRAM_Serial_Command; test Busy bit until it is a 0. 4. Read RDRAM_Serial_Data to collect the serial read data. 5.10 RDRAM Controller Block Diagram The RDRAM controller consists of three pieces. Figure 70 is a simplified block diagram. Figure 70.
Intel® IXP2800 Network Processor DRAM 5.10.1 Commands When a valid command is placed on the command bus, the control logic checks to see if the address matches the channel’s address range, based on interleaving as described in Section 5.5. The command, address, length, etc. are enqueued into the command Inlet FIFO. If the command Inlet FIFO becomes full, the channel sends a signal to the command arbiter, which will prevent it from sending further DRAM commands.
Intel® IXP2800 Network Processor DRAM 5.10.3 DRAM Read When a read (or TBUF_WR, which does a DRAM read) command is at the head of the Command Inlet FIFO, it is moved to the proper Bank CMD FIFO if there is room. If there is not enough room in the Bank’s CMD FIFO, the read command waits at the head of the Command Inlet FIFO.
Intel® IXP2800 Network Processor DRAM 5.10.6 Arbitration The channel needs to arbitrate among several different operations at RMC. Arbitration rules are given here for those cases: from highest to lowest priority: • Refresh RDRAM. • Current calibrate RDRAM. • Bank operations. When there are multiple bank operations ready, the rules are: (1) round robin among banks to avoid bank collisions, and (2) skip a bank to avoid DQ bus turnarounds. No bank can be skipped more than twice.
Intel® IXP2800 Network Processor DRAM • Supports chaining for burst DRAM push operations to tell the arbiter to grant consecutive push requests. • Supports data error bit handling and delivery. Figure 71 shows the functional blocks for the DRAM Push/Pull Arbiter. Figure 71. DRAM Push/Pull Arbiter Functional Blocks (the DPSA and DPLA functional blocks of the DP-Unit connect the D0-Unit, D1-Unit, and D2-Unit to the PCI unit, TC0-Cluster, TC1-Cluster, Intel XScale® core, and TBUF/RBUF; A9731-02) 5.11.
Intel® IXP2800 Network Processor DRAM 5.11.2 DRAM Push Arbiter Description The general data flow for a push operation is as shown in Table 68. The DRAM Push Arbiter functional blocks are shown in Figure 72. Table 68.
Intel® IXP2800 Network Processor DRAM Figure 72. DRAM Push Arbiter Functional Blocks (a round-robin arbiter takes the D0, D1, and D2 PUSH_ID, PUSH_REQ, and PUSH_DATA inputs and drives the DPXX_PUSH_ID and DPXX_PUSH_DATA outputs; A9732-01) The DRAM Push Arbiter boundary conditions are: • Make sure each of the push_request queues asserts the full signal and applies back pressure to the requesting unit. • Maintain 100% bus utilization, i.e., no holes.
Intel® IXP2800 Network Processor DRAM When a requestor gets a pull command on the CMD_BUS, the requestor sends the command to the pull arbiter. This is enqueued into a requestor-dedicated FIFO. The pull request FIFOs are much smaller than the push request FIFOs because pull requests can request up to 128 bytes of data. It is eight entries deep and asserts full when it has six entries to account for in-flight requests. The pull arbiter monitors the heads of each of the three FIFOs.
Intel® IXP2800 Network Processor SRAM Interface SRAM Interface 6.1 6 Overview The IXP2800 Network Processor contains four independent SRAM controllers. SRAM controllers support pipelined QDR synchronous static RAM (SRAM) and a coprocessor that adheres to QDR signaling. Any or all controllers can be left unpopulated if the application does not need them. Reads and writes to SRAM are generated by Microengines (MEs), the Intel XScale® core, and PCI Bus masters.
Intel® IXP2800 Network Processor SRAM Interface Figure 74. SRAM Controller/Chassis Block Diagram (the four SRAM controllers connect to SRAM chips and/or a coprocessor, and to the chassis through the command buses from ME Cluster 0 and ME Cluster 1, the Push bus/ID and Push arbiter to each cluster, and the Pull ID, Pull arbiter, and Pull data from each cluster; A8951-01) 6.
Intel® IXP2800 Network Processor SRAM Interface In general, QDR and QDR II bursts of two SRAMs are supported at speeds up to 233 MHz. As other (larger) QDR SRAMs are introduced, they will also be supported. The SRAM controller can also be configured to interface to an external coprocessor that adheres to the QDR or QDR II electrical and functional specification. 6.2.
Intel® IXP2800 Network Processor SRAM Interface Each channel can be expanded in depth according to the number of port enables available. If external decoding is used, then the number of SRAMs is not limited by the number of port enables generated by the SRAM controller. Note: External decoding may require external pipeline registers to account for the decode time, depending on the desired frequency. Maximum SRAM system sizes are shown in Table 71.
Intel® IXP2800 Network Processor SRAM Interface A side-effect of the pipeline registers is to add latency to reads, and the SRAM controller must account for that delay by waiting extra cycles (relative to no external pipeline registers) before it registers the read data. The number of extra pipeline delays is programmed in SRAM_Control[Pipeline]. Figure 76. External Pipeline Registers Block Diagram (registers in the address/control path (Addr, BWE, etc.) and in the read data (Q) path between the IXP2800 Network Processor and the SRAMs; A9735-01) 6.
Intel® IXP2800 Network Processor SRAM Interface Table 72. Atomic Operations
Instruction   Pull Operand   Value Written to SRAM
Set_bits      Optional1      SRAM_Read_Data or Pull_Data
Clear_bits    Optional       SRAM_Read_Data and not Pull_Data
Increment     No             SRAM_Read_Data + 0x00000001
Decrement     No             SRAM_Read_Data - 0x00000001
Add           Optional       SRAM_Read_Data + Pull_Data
Swap          Optional       Pull_Data
1. There are two versions of the Set, Clear, Add, and Swap instructions.
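Each row of the table describes a read-modify-write in which the value written back is derived from the current SRAM contents and, for some instructions, the pull data. The sketch below restates those semantics as a software model only; the enum and function names are illustrative.

    #include <stdint.h>

    typedef enum { SET_BITS, CLEAR_BITS, INCREMENT, DECREMENT, ADD, SWAP } atomic_op;

    /* Value written back to SRAM by each atomic operation in Table 72.
       `mem` is SRAM_Read_Data and `pull` is the (optional) Pull_Data operand. */
    static uint32_t atomic_write_value(atomic_op op, uint32_t mem, uint32_t pull)
    {
        switch (op) {
        case SET_BITS:   return mem |  pull;
        case CLEAR_BITS: return mem & ~pull;
        case INCREMENT:  return mem + 1;
        case DECREMENT:  return mem - 1;
        case ADD:        return mem + pull;
        case SWAP:       return pull;
        }
        return mem;
    }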
Intel® IXP2800 Network Processor SRAM Interface 6.4.3 Queue Data Structure Commands The ability to enqueue and dequeue data buffers at a fast rate is key to meeting chip performance goals. This is a difficult problem as it involves dependent memory references that must be turned around very quickly. The SRAM controller includes a data structure (called the Q_array) and associated control logic to perform efficient enqueue and dequeue operations.
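Conceptually, each queue descriptor holds a head pointer, a tail pointer, and a count, with buffers chained through link words held in SRAM. The sketch below is a software model of that idea only (the array representation and names are assumptions); the actual Q_array commands are described in the text that follows.

    #include <stdint.h>

    #define NUM_BUFS 1024                 /* illustrative pool size */

    static uint32_t link[NUM_BUFS];       /* software model of the per-buffer link words */

    typedef struct {                      /* software model of one queue descriptor */
        uint32_t head;                    /* first buffer in the queue */
        uint32_t tail;                    /* last buffer in the queue  */
        uint32_t count;                   /* number of buffers queued  */
    } q_descriptor;

    static void enqueue(q_descriptor *q, uint32_t buf)
    {
        if (q->count == 0)
            q->head = buf;                /* empty queue: buffer becomes the head */
        else
            link[q->tail] = buf;          /* chain it behind the current tail     */
        q->tail = buf;
        q->count++;
    }

    static uint32_t dequeue(q_descriptor *q)
    {
        uint32_t buf = q->head;           /* caller must check count != 0 first */
        q->head = link[buf];
        q->count--;
        return buf;
    }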
Intel® IXP2800 Network Processor SRAM Interface The ENQ_tail_and_link command followed by ENQ_tail enqueue a previously linked string of buffers. The string of buffers is used in the case where one packet is too large to fit in one buffer. Instead, it is divided among multiple buffers. Figure 79 is an example of a string of buffers. Figure 79.
Intel® IXP2800 Network Processor SRAM Interface There are two different modes for the dequeue command. One mode removes an entire buffer from the queue. The second mode removes a piece of the buffer (referred to as a cell). The mode (cell dequeue or buffer dequeue) is selectable on a buffer-by-buffer basis by setting the cell_count bits (<30:24>) in the link longword. A ring is an ordered list of data words stored in a fixed block of contiguous addresses.
Intel® IXP2800 Network Processor SRAM Interface Table 74. Ring/Journal Format
Name         Longword Number   Bit Number   Definition
Ring Size    0                 31:29        See Table 75 for size encoding.
Head         0                 23:0         Get pointer
Tail         1                 23:0         Put pointer
Ring Count   2                 23:0         Number of longwords on the ring
Note: For a Ring or Journal, Head and Tail must be initialized to the same address. Journals/Rings can be configured to be one of eight sizes, as shown in Table 75. Table 75.
Intel® IXP2800 Network Processor SRAM Interface 6.4.3.3 ENQ and DEQ Commands These commands add or remove elements from the queue structure while updating the Q_array registers. Refer to the sections, “SRAM (Enqueue)” and “SRAM (Dequeue)”, in the IXP2400 and IXP2800 Network Processor Programmer’s Reference Manual, for more information. 6.4.
Intel® IXP2800 Network Processor SRAM Interface Note: If incorrect parity is detected on the read portion of an atomic read-modify-write, the incorrect parity is preserved after the write (that is, the byte(s) with bad parity during the read will have incorrect parity written during the write). When parity is used, the Intel XScale® core software must initialize the SRAMs by: 1. Enabling parity (write a 1 to SRAM_Control[Par_Enable]). 2. Writing to every SRAM address.
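A minimal sketch of that two-step initialization is shown below, assuming hypothetical csr_read()/csr_write() and sram_write32() helpers and an illustrative bit position for Par_Enable; only the sequence itself (enable parity, then write every location) comes from the steps above.

    #include <stdint.h>

    extern uint32_t csr_read(unsigned csr);                 /* hypothetical helpers */
    extern void     csr_write(unsigned csr, uint32_t val);
    extern void     sram_write32(uint32_t byte_addr, uint32_t val);

    #define SRAM_CONTROL   0              /* illustrative selector, not a real offset */
    #define PAR_ENABLE_BIT (1u << 0)      /* illustrative position of Par_Enable      */

    void sram_parity_init(uint32_t sram_bytes)
    {
        /* Step 1: enable parity. */
        csr_write(SRAM_CONTROL, csr_read(SRAM_CONTROL) | PAR_ENABLE_BIT);

        /* Step 2: write every SRAM address so that correct parity is stored. */
        for (uint32_t a = 0; a < sram_bytes; a += 4)
            sram_write32(a, 0);
    }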
Intel® IXP2800 Network Processor SRAM Interface 6.7 Reference Ordering This section describes the ordering between accesses to any one SRAM controller. Various mechanisms are used to guarantee order — for example, references that always go to the same FIFOs remain in order. There is a CAM associated with write addresses that is used to order reads behind writes. Lastly, several counter pairs are used to implement “fences”.
Intel® IXP2800 Network Processor SRAM Interface Table 78. Q_array Entry Reference Order 1st ref 2nd ref Read_Q _Descr head, tail Read_ Q_Des cr other Read_Q_Descr head,tail Read_Q_ Descr other Write_Q _Descr Enqueue Dequeue Put Get Journal Order1 Order Write_Q_ Descr2 Enqueue Order Order Order Order3 Dequeue Order Order Order3 Order Put Order Get Order Journal 1. 2. 3. 6.7.
Intel® IXP2800 Network Processor SRAM Interface Other microcode rules: • All access to atomic variables should be through read-modify-write instructions. • If the flow must know that a write is completed (actually in the SRAM itself), follow the write with a read to the same address. The write is guaranteed to be complete when the read data has been returned to the Microengine.
Intel® IXP2800 Network Processor SRAM Interface The external coprocessor interface is based on FIFO communication. A thread can send parameters to the coprocessor by doing a normal SRAM write instruction: sram[write, $sram_xfer_reg, src1, src2, ref_count], optional_token The number of parameters (longwords) passed is specified by ref_count. The address can be used to support multiple coprocessor FIFO ports.
Intel® IXP2800 Network Processor SRAM Interface There can be multiple operations in progress in the coprocessor. The SRAM controller sends parameters to the coprocessor in response to each SRAM write instruction without waiting for return results of previous writes. If the coprocessor is capable of re-ordering operations — i.e., returning the results for a given operation before returning the results of an earlier arriving operation — Microengine code must manage matching results to operations.
Intel® IXP2800 Network Processor SHaC — Unit Expansion SHaC — Unit Expansion 7 This section covers the operation of the Scratchpad, Hash Unit, and CSRs (SHaC). 7.1 Overview The SHaC unit is a multifunction block containing Scratchpad memory and logic blocks used to perform hashing operations and interface with the Intel XScale® core peripherals and control status registers (CSRs) through the Advanced Peripheral Bus (APB) and CSR buses, respectively.
Intel® IXP2800 Network Processor SHaC — Unit Expansion Figure 84.
Intel® IXP2800 Network Processor SHaC — Unit Expansion 7.1.2 Scratchpad 7.1.2.1 Scratchpad Description The SHaC Unit contains a 16-Kbyte Scratchpad memory, organized as 4K 32-bit words, that is accessible by the Intel XScale® core and Microengines. The Scratchpad connects to the internal Command, S_Push and S_Pull, CSR, and APB buses, as shown in Figure 85. The Scratchpad memory provides the following operations: • Normal reads and writes.
Intel® IXP2800 Network Processor SHaC — Unit Expansion Figure 85.
Intel® IXP2800 Network Processor SHaC — Unit Expansion 7.1.2.2 Note: Scratchpad Interface The Scratchpad command and S_Push and S_Pull bus interfaces actually are shared with the Hash Unit. Only one command, to either of those units, can be accepted per cycle. The CSR and APB buses are described in detail in the following sections. 7.1.2.2.1 Command Interface The Scratchpad accepts commands from the Command Bus and can accept one command every cycle.
Intel® IXP2800 Network Processor SHaC — Unit Expansion If the Command Inlet FIFO becomes full, the Scratchpad controller sends a full signal to the command arbiter that prevents it from sending further Scratchpad commands. 7.1.2.3.1 Scratchpad Commands The basic read and write commands transfer from 1 – 16 longwords of data to/from the Scratchpad. Reads When a read command is at the head of the Command queue, the Push Arbiter is checked to see if it has enough room for the data.
Intel® IXP2800 Network Processor SHaC — Unit Expansion When the RMW command reaches the head of the Command pipe, the Scratchpad reads the memory location in the RAM. If the source requests the pre-modified data (Token[0] set), it is sent to the Push Arbiter at the time of the read. If the RMW requires pull data, the command waits for the data to be placed into the Pull Data FIFO before performing the operation; otherwise the operation is performed immediately.
Intel® IXP2800 Network Processor SHaC — Unit Expansion Head, Tail, Base, and Size are registers in the Scratchpad Unit. Head and Tail point to the actual ring data, which is stored in the Scratchpad RAM. For each ring in use, a region of Scratchpad RAM must be reserved for the ring data. The reservation is by software convention. The hardware does not prevent other accesses to the region of Scratchpad used by the ring. Also, the regions of Scratchpad memory allocated to different rings must not overlap.
Intel® IXP2800 Network Processor SHaC — Unit Expansion The ring commands operate as outlined in the pseudo-code in Example 32. The operations are atomic, meaning that multi-word “Gets” and “Puts” do all the reads and writes, with no other intervening Scratchpad accesses. Example 32.
Intel® IXP2800 Network Processor SHaC — Unit Expansion For writes using the Reflector mode, Scratchpad arbitrates for the S_Pull_Bus, pulls the write data from the source identified in the instruction (either a Microengine transfer register or an Intel XScale® core write buffer), and puts it into one of the Pull Data FIFOs (same as for APB and CAP CSR writes). The data is then removed from the Pull Data FIFO and sent to the Push Arbiter.
Intel® IXP2800 Network Processor SHaC — Unit Expansion 7.1.2.3.3 Clocks and Reset Clock generation and distribution is handled outside of CAP and is dependent on the specific chip implementation. Separate clock rates are required for CAP CSRs/Push/Pull Buses and ARB since APB devices tend to run slower. CAP provides reset signals for the CAP CSR block and APB devices. These resets are based on the system reset signal and synchronized to the appropriate bus clock.
Intel® IXP2800 Network Processor SHaC — Unit Expansion 7.1.3 Hash Unit The SHaC unit contains a Hash Unit that can take 48-, 64-, or 128-bit data and produce a 48-, 64-, or a 128-bit hash index, respectively. The Hash Unit is accessible by the Microengines and the Intel XScale® core. Figure 87 is a block diagram of the Hash Unit. Figure 87. Hash Unit Block Diagram (a 3-stage command pipe and the hash state machine exchange the HASH_CMD, SCR_HASH_CMD, and HASH_PUSH_CMD signals with the Scratchpad)
Intel® IXP2800 Network Processor SHaC — Unit Expansion 7.1.3.1 Hashing Operation Up to three hash indexes (see Example 33) can be created by using one Microengine instruction. Example 33.
Intel® IXP2800 Network Processor SHaC — Unit Expansion Table 82.
Intel® IXP2800 Network Processor SHaC — Unit Expansion The Hash Unit shares the Scratchpad’s Push Data FIFO. After each hash index is completed, the index is placed into a three-stage output pipe and the Hash Unit sends a PUSH_DATA_REQ to the Scratchpad to indicate that it has a valid hash index to put into the Push Data FIFO for transfer. The Scratchpad issues a SEND_HASH_DATA signal, transfers the hash index to the Push Data FIFO, and sends the data to the Arbiter.
Intel® IXP2800 Network Processor SHaC — Unit Expansion Equation 7. G48(x) = 1 + x^10 + x^25 + x^36 + x^48 (48-bit hash operation) Equation 8. G64(x) = 1 + x^17 + x^35 + x^54 + x^64 (64-bit hash operation) Equation 9. G128(x) = 1 + x^33 + x^69 + x^98 + x^128 (128-bit hash operation) The division results in a quotient Q(x), a polynomial of order-46, order-62, or order-126, and a remainder R(x), a polynomial of order-47, order-63, or order-127.
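The remainder R(x) comes from ordinary modulo-2 (GF(2)) polynomial division. The fragment below reduces a polynomial of degree 63 or less modulo G48(x) to show that arithmetic; it covers only the final division step for the 48-bit case, not the multiplication by the programmable hash multiplier or the 64- and 128-bit generators.

    #include <stdint.h>

    /* G48(x) = x^48 + x^36 + x^25 + x^10 + 1, with the x^48 term at bit 48. */
    #define G48 ((1ULL << 48) | (1ULL << 36) | (1ULL << 25) | (1ULL << 10) | 1ULL)

    /* GF(2) reduction: returns the 48-bit remainder R(x) of `a` modulo G48(x). */
    static uint64_t poly_mod_g48(uint64_t a)
    {
        for (int bit = 63; bit >= 48; bit--)
            if (a & (1ULL << bit))
                a ^= G48 << (bit - 48);   /* cancel the current leading term */
        return a;                         /* remainder has degree 47 or less */
    }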
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Media and Switch Fabric Interface 8.1 8 Overview The Media and Switch Fabric (MSF) Interface connects the IXP2800 Network Processor to a physical layer device (PHY) and/or to a Switch Fabric. The MSF consists of separate receive and transmit interfaces, each of which can be separately configured for either SPI-4 Phase 2 (System Packet Interface), for PHY devices or for the CSIX-L1 protocol, for Switch Fabric Interfaces.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Figure 88. Example System Block Diagram (an Ingress IXP2800, whose receive protocol is SPI-4 and whose transmit mode is CSIX, and an Egress IXP2800, whose receive protocol is CSIX and whose transmit mode is SPI-4, sit between the Framing/MAC device (PHY), connected over the SPI-4 protocol (RDAT/TDAT, RSTAT/TSTAT), and the Switch Fabric, connected over the CSIX protocol through an optional Gasket (Note 1); flow control passes between the two network processors) Notes: 1.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Figure 89. Full-Duplex Block Diagram (a single IXP2800 in full-duplex mode uses the SPI-4 and CSIX protocols on a transfer-by-transfer basis on its RDAT/TDAT pins; a Bus Converter splits the traffic, connecting to the Framing/MAC device (PHY) over the UTOPIA-3 or IXBUS protocol and to the Switch Fabric over the CSIX protocol) Notes: The Bus Converter chip receives and transmits both SPI-4 and CSIX protocols from/to the Intel IXP2800 Network Processor.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 83. SPI-4 Control Word Format Bit Position Label 15 Type Description Control Word Type. • 1—payload control word (payload transfer will immediately follow the control word). • 0—idle or training control word. End-of-Packet (EOP) Status. Set to the following values below according to the status of the immediately preceding payload transfer. • 00—Not an EOP. 14:13 EOPS • 01—EOP Abort (application-specific error condition).
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 84 shows the order of bytes on SPI-4; this example shows a 43-byte packet. Table 84. Order of Bytes1 within the SPI-4 Data Burst
               Bits 15:8   Bits 7:0
Data Word 1    Byte 1      Byte 2
Data Word 2    Byte 3      Byte 4
Data Word 3    Byte 5      Byte 6
Data Word 4    Byte 7      Byte 8
…              …           …
Data Word 21   Byte 41     Byte 42
Data Word 22   Byte 432    00
These bytes are valid only if EOP is set.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.1.2 CSIX CSIX_L1 (Common Switch Interface) defines an interface between a Traffic Manager (TM) and a Switch Fabric (SF) for ATM, IP, MPLS, Ethernet, and similar data communications applications. The Network Processor Forum (NPF) www.npforum.org, controls the CSIX_L1 specification. The basic unit of information transferred between TMs and SFs is called a CFrame. There are a number of CFrame types defined as shown in Table 85. Table 85.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.2 Receive The receive section consists of: • Receive Pins (Section 8.2.1) • Checksum (Section 8.2.2) • Receive Buffer (RBUF) (Section 8.2.2) • Full Element List (Section 8.2.3) • Rx_Thread_Freelist (Section 8.2.4) • Flow Control Status (Section 8.2.7) Figure 91 is a simplified block diagram of the receive section. Figure 91.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.2.1 Receive Pins The use of the receive pins is a function of RPROT input, as shown in Table 86. Table 86.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 88.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The src_op_1 and src_op_2 operands are added together to form the address in RBUF (note that the base address of the RBUF is 0x2000). The ref_cnt operand is the number of 32-bit words or word pairs, that are pushed into two sequential S_TRANSFER_IN registers, starting with $s_xfer_reg. Using the data in RBUF in Table 87 above, reading eight bytes from offset 0 into transfer registers 0 and 1 would yield the result in Example 34. Example 34.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Section 8.2.7.1). The SPI-4 Control Word Type, EOPS, SOP, and ADR fields are placed into a temporary status register. The Byte_Count field of the element status is set to 0x0. As each Data Word is received, the data is written into the element, starting at offset 0x0 in the element, and Byte_Count is updated. Subsequent Data transfers are placed at higher offsets (0x2, 0x4, etc.).
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The status contains the following information (Receive Status Word, two 32-bit words): SOP [31], EOP [30], Err [29], Len Err [28], Abort Err [27], Par Err [26], Type [25], Null [24], Element, and Byte Count in one word; RPROT [63], Reserved, and ADR in the other word.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.2.2.2 CSIX CSIX CFrames are placed into either RBUF or FCEFIFO as follows: At chip reset, all RBUF elements are marked invalid (available) and FCEFIFO is empty. When a Base Header is sent (i.e., when RxSof is asserted) it is placed in a temporary holding register. The Ready Field is extracted and held to be put into FC_Egress_Status CSR when (and if) the entire CFrame is received without error.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Note: In CSIX protocol, an RBUF element is allocated only on RxSof assertion. Therefore, the element size must be programmed based on the Switch Fabric usage. For example, if the switch never sends a payload greater than 128 bytes, then 128-byte elements can be selected. Otherwise, 256-byte elements must be selected.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.2.3 Full Element List Receive control hardware maintains the Full Element List to hold the status of valid RBUF elements, in the order in which they were received. When an element is marked valid (as described in Section 8.2.2.1 for SPI-4 and Section 8.2.2.2 for CSIX), its status is added to the tail of the Full Element List.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.2.5 Rx_Thread_Freelist_Timeout_# Each Rx_Thread_Freelist_# has an associated countdown timer. If the timer expires and no new receive data is available yet, the receive logic will autopush a Null Receive Status Word to the next thread on the Rx_Thread_Freelist_#. A Null Receive Status Word has the “Null” bit set, and does not have any data or RBUF entry associated with it.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface When an mpacket becomes valid as described in Section 8.2.2.1 for SPI-4 and Section 8.2.2.2 for CSIX, receive control logic will autopush eight bytes of information for the element to the Microengine/Context/S_Transfer registers at the head of Rx_Thread_Freelist_#.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 93 summarizes the differences in RBUF operation between the SPI-4 and CSIX protocols. Table 93. Summary of SPI-4 and CSIX RBUF Operations Operation SPI-4 CSIX When is RBUF Element Allocated? Upon receipt of Payload Control Word or when Element data section fills and more Data Words arrive. The Payload Control Word allocates an element for data that will be received subsequent to it.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface When MSF_RX_CONTROL[RX_Calendar_Mode] is set to Force_Override, the value of RX_PORT_CALENDAR_STATUS_# is used to determine which status value is sent. If RX_PORT_CALENDAR_STATUS_# is set to 0x3, then the global status value set in MSF_RX_CONTROL[RSTAT_OV_VALUE] is sent; otherwise, the port-specific status value set in RX_PORT_CALENDAR_STATUS_# is sent. The RBUF upper limit is based on the MSF_RX_CONTROL register and is defined in Table 89.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.2.7.2.2 Virtual Output Queue CSIX protocol provides Virtual Output Queue Flow Control via Flow Control CFrames.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.2.8.2 8.2.8.2.1 CSIX Horizontal Parity The receive logic computes Horizontal Parity on each 16 bits of each received Cword (there is a separate parity for data received on rising and falling edge of the clock). There is an internal HP Error Flag. At the end of each CFrame, the flag is reset. As each 16 bits of each Cword is received, the expected odd-parity value is computed from the data, and compared to the value received on RxPar.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.3 Transmit The transmit section consists of: • Transmit Pins (Section 8.3.1) • Transmit Buffer (Section 8.3.2) • Byte Aligner (Section 8.3.2) Each of these is described below. Figure 94 is a simplified block diagram of the MSF transmit block. Figure 94. MSF Transmit Block Diagram (S_Pull_Data, 32 bits from the ME, and D_Push_Data, 64 bits from DRAM, pass through the Byte Align logic into the TBUF, which feeds the SPI-4 and CSIX protocol logic)
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 94. Transmit Pins Usage by Protocol (Sheet 2 of 2)
Name         Direction   SPI-4 Use    CSIX Use
TPAR         Output      Not Used     RTxPar
TSCLK        Input       TSCLK        Not Used
TSTAT[1:0]   Input       TSTAT[1:0]   Not Used
8.3.2 TBUF The TBUF is a RAM that holds data and status to be transmitted. The data is written into subblocks, referred to as elements, by a Microengine or the Intel XScale® core. TBUF contains a total of 8 Kbytes of data, and associated control.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 97 shows the TBUF partition options. Note that the choice of element size is independent for each partition. Table 97.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Payload Offset — Number of bytes to skip from the last 64-bit word of the Prepend to the start of Payload. The absolute byte number of the first byte of Payload in the element is: ((Prepend Offset + Prepend Length + 0x7) & 0xF8) + Payload Offset. Payload Length — Number of bytes of Payload.
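Written out in C, the computation above rounds the end of the Prepend up to the next 64-bit (8-byte) boundary before adding the Payload Offset; this simply restates the document's formula with illustrative parameter names.

    #include <stdint.h>

    /* Absolute byte number of the first Payload byte within a TBUF element:
       round (Prepend Offset + Prepend Length) up to an 8-byte boundary,
       then add Payload Offset. */
    static uint32_t payload_start(uint32_t prepend_offset,
                                  uint32_t prepend_length,
                                  uint32_t payload_offset)
    {
        return ((prepend_offset + prepend_length + 0x7) & 0xF8) + payload_offset;
    }

For example, a 6-byte Prepend starting at offset 0 with a Payload Offset of 1 places the first Payload byte at byte 9 of the element.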
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.3.2.1 SPI-4 For SPI-4, data is put into the data portion of the element, and information for the SPI-4 Control Word that will precede the data is put into the Element Control Word.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.3.2.2 CSIX For CSIX protocol, the TBUF should be set to two partitions in MSF_Tx_Control[TBUF_Partition], one for Data traffic and one for Control traffic. Payload information is put into the Payload area of the element, and Base and Extension Header information is put into the Element Control Word. Data is stored in big-endian order. The most significant byte of each 32-bit word is transmitted first.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.3.3 Transmit Operation Summary During transmit processing data to be transmitted is placed into the TBUF under Microengine control, which allocates an element in software. The transmit hardware processes TBUF elements within a partition, in strict sequential order so the software can track the element to allocate next.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface If the next sequential element is not valid when its turn comes up: 1. Send an idle Control Word with SOP set to 0, and EOPS set to the values determined from the most recently sent element, ADR field 0x00, correct parity. 2. Until an element becomes valid, send idle Control Words with SOP set to 0, EOPS set to 00, ADR field 0x00, and correct parity.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Note: A Dead Cycle is any cycle after the end of a CFrame, and prior to the start of another CFrame (i.e., SOF is not asserted). The end of a CFrame is defined as after the Vertical Parity has been transmitted. This in turn is found by counting the Payload Bytes specified in the Base Header and rounding up to CWord size. After an element has been sent on the transmit pins, the valid bit for that element is cleared.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.3.4.1 SPI-4 FIFO status information is sent periodically over the TSTAT signals from the PHY to the Link Layer device, which is the IXP2800 Network Processor. (The RXCDAT pins can act as TSTAT based on the MSF_Tx_Control[TSTAT_Select] bit.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The TX_Port_Status_# or the TX_Multiple_Port_Status_# registers must be read by the software to determine the status of each port and send data to them accordingly. The MSF hardware does not check these registers for port status before sending data out to a particular port. The MSF_Tx_Control[Tx_Status_Update_Mode] field is used to select one of two methods for updating the port status.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.3.4.2 CSIX There are two types of CSIX flow control: • Link-level • Virtual Output Queue (VOQ) 8.3.4.2.1 Link-Level The Link-level flow control function is done via hardware and consists of two parts: 1. Enable/disable transmission of valid TBUF elements. 2. Ready field to be sent in CFrames sent to the Switch Fabric. As described in Section 8.2.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.3.5.2 CSIX 8.3.5.2.1 Horizontal Parity The transmit logic computes odd Horizontal Parity for each transmitted 16-bits of each Cword, and transmits it on TxPar. 8.3.5.2.2 Vertical Parity The transmit logic computes Vertical Parity on CFrames. There is a 16-bit VP Accumulator register. At the beginning of each CFrame, the register is cleared.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 102. Summary of RBUF and TBUF Operations (Sheet 2 of 2) Operation RBUF Remove data from element Return element to Free List 8.5 TBUF Microcode moves data from the element to DRAM using the dram[rbuf_rd] instruction and to Microengine registers using the msf[read] instruction. Microcode writes to Rx_Element_Done with the number of the element to free. Hardware transmits information from the element to the Tx pins.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The information transmitted on TXCSRB can be read in FC_Egress_Status CSR, and the information received on RXCSRB can be read in FC_Ingress_Status CSR. The TXCSRB or RXCSRB signals carry the Ready information in a serial stream. Four bits of data are carried in 10 clock phases, LSB first, as shown in Table 103. Table 103.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.5.2.1 Full Duplex CSIX In Full Duplex Mode, the information from the Switch Fabric is sent to the Egress IXP2800 Network Processor and must be communicated to the Ingress IXP2800 Network Processor via TXCSRB or RXCSRB. CSIX CFrames received from the Switch Fabric on the Egress IXP2800 Network Processor are put into FCEFIFO, based on the mapping in the CSIX_Type_Map CSR (normally they will be the Flow Control CFrames).
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The FCIFIFO supplies two signals to Microengines, which can be tested using the BR_STATE instruction: 1. FCI_Not_Empty — indicates that there is at least one CWord in the FCIFIFO. This signal stays asserted until all CWords have been read.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The TXCSRB and RXCSRB pins are not used in Simplex Mode. The RXCFC and TXCFC pins are used for flow control in both Simplex and Duplex Modes. The Egress IXP2800 Network Processor uses the TXCSOF, TXCDAT, and TXCPAR pins to send CFrames to the Switch Fabric.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.5.3 TXCDAT/RXCDAT, TXCSOF/RXCSOF, TXCPAR/RXCPAR, and TXCFC/RXCFC Signals TXCDAT and RXCDAT, along with TXCSOF/RXCSOF and TXCPAR/RXCPAR are used to send CSIX Flow Control information from the Egress IXP2800 Network Processor to the Ingress IXP2800 Network Processor. The protocol is basically the same as CSIX-LI, but with only four data signals. TXCSOF is asserted to indicate start of a new CFrame.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The IXP2800 Network Processor supports all three methods. There are three groups of high-speed pins to which this applies, as shown in Table 104, Table 105, and Table 106. The groups are defined by the clock signal that is used. Table 104. Data Deskew Functions Clock Signals RDAT RCLK RCTL IXP2800 Network Processor Operation 1. Sample point for each pin is programmed in Rx_Deskew. 2.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.6.1 Data Training Pattern The data pin training sequence is shown in Table 107. This is a superset of the SPI-4 training sequence, because it includes the TPAR/RPAR and TPROT/RPROT pins, which are not included in SPI-4. Table 107.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The training sequence when the pins are used for SPI-4 Status Channel is shown in Table 109. This is compatible to SPI-4 training sequence. Table 109. Calendar Training Sequence Cycle (Note 3) 1 to 10 XCDAT 1 0 0 0 11 to 20 1 1 20α-19 to 20α-10 0 0 20α-9 to 20α 1 1 NOTE: 1. α represents the number of repeats, as specified in SPI-4 specification.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 110. IXP2800 Network Processor Requires Data Training Step SPI-4 (IXP2800 Network Processor is Ingress Device) CSIX (IXP2800 Network Processor is Egress Device) Full Duplex Simplex 1 Detect need for training (for example, reset or excessive parity errors).
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 111. Switch Fabric or SPI-4 Framer Requires Data Training CSIX Step SPI-4 Full Duplex Simplex 1 Framer sends continuous framing code on IXP2800 calendar status pins TSTAT (when using LVTTL status channel) or sends continuous training on IXP2800 calendar status pins RXCDAT (when using LVDS status channel). Switch Fabric sends continuous Dead Cycles on Data. Switch Fabric sends continuous Dead Cycles on Flow Control.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 112 lists the steps to initiate the training. The CSIX Full Duplex and CSIX Simplex cases follow similar, but slightly different, sequences. Table 112. IXP2800 Network Processor Requires Flow Control Training Step CSIX (IXP2800 Network Processor is Ingress Device) Full Duplex Simplex 1 Force TXCFC pin asserted (Write a 0 to Train_Flow_Control [RXCFC_En]). Force Data pins to continuous Dead Cycles (Write a 1 to Train_Data[Force_CDead]).
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.7 CSIX Startup Sequence This section defines the sequence required to start up the CSIX interface. 8.7.1 CSIX Full Duplex 8.7.1.1 Ingress IXP2800 Network Processor 1. On reset, FC_STATUS_OVERRIDE[Egress_Force_En] is set to force the Ingress IXP2800 to send Idle CFrames with low CReady and DReady bits to the Egress IXP2800 over TXCSRB. 2.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.7.1.3 Single IXP2800 Network Processor 1. The Microengine or the Intel XScale® core writes a 1 to MSF_Tx_Control[Transmit_Idle] and MSF_Tx_Control[Transmit_Enable] so that Idle CFrames with low CReady and DReady bits are sent over TDAT. 2. The Microengine or the Intel XScale® core writes a 1 to MSF_Rx_Control[RX_En_C] so that Idle CFrames can be received. 3.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.7.2.2 Egress IXP2800 Network Processor 1. On reset, FC_STATUS_OVERRIDE[Ingress_Force_En] is set. 2. The Microengine or the Intel XScale® core writes a 1 to MSF_Tx_Control[Transmit_Idle] and MSF_Tx_Control[Transmit_Enable] so that Idle CFrames with low CReady and DReady bits are sent over TDAT. 3. The Microengine or the Intel XScale® core polls on MSF_Interrupt_Status[Detected_CSIX_FC_Idle] to see when the first Idle CFrame is received.
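For reference, the egress startup steps above can be expressed as a short driver sketch. This is a minimal illustration only: msf_read()/msf_write() are assumed CSR accessors, and the register offsets and bit positions shown are placeholders rather than the actual MSF CSR definitions.

```c
#include <stdint.h>

/* Hypothetical MSF CSR accessors and bit positions (illustrative only). */
extern uint32_t msf_read(uint32_t csr);
extern void     msf_write(uint32_t csr, uint32_t value);

#define MSF_TX_CONTROL         0x00u   /* placeholder offsets           */
#define MSF_RX_CONTROL         0x04u
#define MSF_INTERRUPT_STATUS   0x08u

#define TRANSMIT_IDLE          (1u << 0)   /* placeholder bit positions */
#define TRANSMIT_ENABLE        (1u << 1)
#define DETECTED_CSIX_FC_IDLE  (1u << 2)

/* Egress startup: send Idle CFrames, then wait for the first Idle CFrame. */
static void csix_egress_startup(void)
{
    /* Step 2: send Idle CFrames with low CReady and DReady bits over TDAT. */
    msf_write(MSF_TX_CONTROL,
              msf_read(MSF_TX_CONTROL) | TRANSMIT_IDLE | TRANSMIT_ENABLE);

    /* Step 3: poll until the first Idle CFrame has been received.
     * Step 1 (FC_STATUS_OVERRIDE) is set by hardware on reset. */
    while ((msf_read(MSF_INTERRUPT_STATUS) & DETECTED_CSIX_FC_IDLE) == 0)
        ;   /* spin */
}
```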
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.8 Interface to Command and Push and Pull Buses Figure 100 shows the interface of the MSF to the command and push and pull buses.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.8.1 RBUF or MSF CSR to Microengine S_TRANSFER_IN Register for Instruction: msf[read, $s_xfer_reg, src_op_1, src_op_2, ref_cnt], optional_token For transfers to a Microengine, the MSF acts as a target. Commands from Microengines and the Intel XScale® core are received on the command bus. The commands are checked to see if they are targeted to the MSF.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.8.5 From DRAM to TBUF for Instruction: dram[tbuf_wr, --, src_op1, src_op2, ref_cnt], indirect_ref For the transfers from DRAM, the TBUF acts like a slave. The address of the data to be written is given in D_PUSH_ID. The data is registered and assembled from D_PUSH_BUS, and then written into TBUF. 8.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface SPI-4.2 supports up to 256 port addresses, with independent flow control for each. For data received by the PHY and passed to the link layer device, flow control is optional. The flow control mechanism is based upon independent pools of credits, corresponding to 16-byte blocks, for each port. The CSIX-L1 protocol supports 4096 ports and 256 unicast classes of traffic.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The SPI-4.2 mode of the simplex configuration supports an LVTTL reverse path or status interface clocked at up to 125 MHz or a DDR LVDS reverse path or status interface clocked at up to 500 MHz. The SPI-4.2 mode status interface consists of a clock signal and two data signals.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.1.3 Dual Network Processor Full Duplex Configuration In the dual Network Processor, full duplex configuration, an ingress Network Processor and an egress Network Processor are integrated to offer a single full duplex interface to a fabric, similar to the CSIX-L1 interface, as shown in Figure 104. This configuration provides an interface that is closest to the standard CSIX-L1 interface.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.1.4 Single Network Processor Full Duplex Configuration (SPI-4.2) The single Network Processor, full duplex configuration (SPI-4.2 only) allows a single Network Processor to interface to multiple discrete devices, processing both the receiver and transmitter data for each, as shown in Figure 105 (where N=255). Up to 256 devices can be addressed by the SPI-4.2 implementation.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.1.5 Single Network Processor, Full Duplex Configuration (SPI-4.2 and CSIX-L1) The Single Network Processor, Full Duplex Configuration (SPI-4.2 and CSIX-L1 Protocol) allows a single Network Processor to interface to a fabric via a CSIX-L1 interface and to multiple other discrete devices, as shown in Figure 106. The CSIX-L1 and SPI-4.2 protocols are multiplexed on the network processor receiver and transmitter interface.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.2.1 Framer, Single Network Processor Ingress and Egress, and Fabric Interface Chip Figure 107 illustrates the baseline system configuration, consisting of the dual chip, full-duplex fabric configuration of network processors with a framer chip and a fabric interface chip. Figure 107.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.2.3 Framer, Single Network Processor Ingress and Egress, and CSIX-L1 Chips for Translation and Fabric Interface To interface to existing standard CSIX-L1 fabric interface chips, a translation bridge can be employed, as shown in Figure 109. Translation between the network processor interface and standard CSIX-L1 is very simple by design. Figure 109.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.2.5 Framer, Single Network Processor, Co-Processor, and Fabric Interface Chip The network processor supports multiplexing the SPI-4.2 and CSIX-L1 protocols over its physical interface via a protocol signal.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.3 SPI-4.2 Support Data is transferred across the SPI-4.2 interface in variously-sized bursts and encapsulated with a leading and trailing control word. The control words provide annotation of the data with port address (0-255) information, start-of-packet and end-of-packet markers, and an error detection code (DIP-4). Data must be transferred in 16-byte integer multiples, except for the final burst of a packet. Figure 112. SPI-4.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface As threads complete processing of the data in a buffer, the buffer is returned to a free list. Subsequently, the thread also returns to a separate free list. The return of buffers and threads to the free lists may occur in a different order than the order of their removal. All SPI-4.2 ports sharing the interface have equal access to the buffering resources. Flow control can transition to a non-starving state when 25%, 50%, 75%, or 87.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.4 CSIX-L1 Protocol Support 8.9.4.1 CSIX-L1 Interface Reference Model: Traffic Manager and Fabric Interface Chip The CSIX-L1 protocol operates between a Traffic Manager and a Fabric Interface Chip(s) across a full-duplex interface.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Information is passed across the interface in CFrames. CFrames are padded out to an integer multiple of CWords. CFrames consist of a 2-byte base header, an optional 4-byte extension header, a payload of 1 to 256 bytes, padding, and a 2-byte vertical parity. Transfers across the interface are protected by a horizontal parity.
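To make the sizing rules concrete, the sketch below computes the on-wire size of a CFrame from the fields listed above. It is an illustration, not the CSIX-L1 specification: the CWord size is left as a parameter because it depends on the interface width, and padding is assumed to bring the whole CFrame (headers, payload, and vertical parity) up to a CWord multiple.

```c
#include <stddef.h>

#define CFRAME_BASE_HDR_BYTES  2   /* 2-byte base header               */
#define CFRAME_EXT_HDR_BYTES   4   /* optional 4-byte extension header */
#define CFRAME_VPAR_BYTES      2   /* 2-byte vertical parity           */

/* Total bytes a CFrame occupies on the wire, padded to a CWord multiple.
 * payload_bytes must be 1 to 256; cword_bytes depends on the interface width. */
static size_t cframe_wire_bytes(size_t payload_bytes, int has_ext_hdr,
                                size_t cword_bytes)
{
    size_t len = CFRAME_BASE_HDR_BYTES
               + (has_ext_hdr ? CFRAME_EXT_HDR_BYTES : 0)
               + payload_bytes
               + CFRAME_VPAR_BYTES;

    /* Padding (inserted before the vertical parity in the actual frame)
     * rounds the total up to an integer multiple of CWords. */
    return ((len + cword_bytes - 1) / cword_bytes) * cword_bytes;
}
```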
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The network processor supports a variation of the standard CSIX-L1 vertical parity. Instead of a single vertical XOR for the calculation of the vertical parity, the network processor can be configured to calculate a DIP-16 code, as documented within the SPI-4.2 specification.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The backpressure signal (TXCFC, RXCFC) is an asynchronous signal and is asserted by the ingress network processor to prevent overflow of the ingress network processor's ingress flow control FIFO. If the egress network processor is configured to do so, it treats assertion of the backpressure signal for 32 clock cycles (64 edges) as a request for a de-skew training sequence to be transmitted on the flow control interface.
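The 32-cycle rule is implemented in hardware; the sketch below only illustrates the counting behavior. rxcfc_asserted() and send_fc_training() are placeholder functions, and the routine is assumed to be called once per flow control clock cycle.

```c
#include <stdbool.h>
#include <stdint.h>

extern bool rxcfc_asserted(void);    /* placeholder: sample the backpressure signal   */
extern void send_fc_training(void);  /* placeholder: start a de-skew training sequence */

/* Call once per flow-control clock: 32 consecutive asserted cycles (64 edges)
 * are treated as a request to transmit a training sequence. */
static void check_backpressure_training_request(void)
{
    static uint32_t asserted_cycles;

    if (rxcfc_asserted()) {
        if (++asserted_cycles == 32) {
            send_fc_training();
            asserted_cycles = 0;
        }
    } else {
        asserted_cycles = 0;
    }
}
```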
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The transfer time of CFrames across the RPCI is four times that of the data interface. The latency of link-level flow control notifications depends on the frequency of sending new CFrame base headers. As such, the maximum size of CFrames supported on the RPCI should be limited to provide sufficient link-level flow control responsiveness.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The SPI-4.2 interface does not support a virtual output queue (VOQ) flow control mechanism. The Intel® IXP2800 Network Processor supports use of the CSIX-L1 protocol-based flow control interface (as used in the dual chip, full-duplex configuration) on the ingress network processor, while SPI-4.2 is operational on the data interface.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface The training pattern for the flow control data signals consists of 10 nibbles of 0xc followed by 10 nibbles of 0x3. The parity and serial “ready bits” signal is de-asserted for the first 10 nibbles and asserted for the second 10 nibbles. The start-of-frame signal is asserted for the first 10 nibbles and de-asserted for the second 10 nibbles. See Section 8.6.2 for more information.
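The pattern can be summarized with a small generator. This is only an illustration of the sequence described above (the pattern itself is produced by hardware); the structure and field names are invented for the example.

```c
#include <stdint.h>

struct fc_train_cycle {
    uint8_t data;  /* 4-bit value driven on the flow control data signals */
    uint8_t sof;   /* start-of-frame signal                               */
    uint8_t par;   /* parity / serial "ready bits" signal                 */
};

/* Fill out[] with the 20-nibble flow control training pattern. */
static void fc_training_pattern(struct fc_train_cycle out[20])
{
    for (int i = 0; i < 20; i++) {
        int first_half = (i < 10);
        out[i].data = first_half ? 0xC : 0x3; /* 10 nibbles of 0xC, then 10 of 0x3    */
        out[i].sof  = first_half ? 1 : 0;     /* asserted for the first 10 nibbles    */
        out[i].par  = first_half ? 0 : 1;     /* de-asserted for the first 10 nibbles */
    }
}
```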
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.4.4 CSIX-L1 Protocol Transmitter Support The Intel® IXP2800 Network Processor transmitter support for the CSIX-L1 protocol is similar to that for SPI-4.2. The transmitter fetches CFrames from transmitter buffers. An entire CFrame must fit within a single buffer. In the case of SPI-4.2, the array of transmitter buffers operates as a single ring.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.4.5 Implementation of a Bridge Chip to CSIX-L1 The Intel® IXP2800 Network Processor support for the CSIX-L1 protocol in the dual chip, full-duplex configuration minimizes the difficulty in implementing a bridge chip to a standard CSIX-L1 interface. If dynamic de-skew training is not employed, the bridge chip can directly pass through the different CSIX-L1 protocol elements, CFrames, and Dead Cycles.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.5 Dual Protocol (SPI and CSIX-L1) Support In many system designs that are less bandwidth-intensive, a single network processor can forward and process data from the framer to the fabric and from the fabric to the framer. A bridge chip must pass data between the network processor and multiple physical devices. The network processor supports multiplexing SPI-4.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.5.3 Implementation of a Bridge Chip to CSIX-L1 and SPI-4.2 A bridge chip can provide support for both standard CSIX-L1 and standard physical layer device interfaces such as SPI-3 or UTOPIA Level 3. The bridge chip must implement the functionality of the less trivial CSIX-L1 bridge chip described previously and, additionally, implement bridge functionality between SPI-4.2 and the other physical device interfaces.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.6 Transmit State Machine Table 114 describes the transmitter state machine to provide guidance for interfacing to the network processor. The state machine is described as three separate state machines for SPI-4.2, training, and CSIX-L1. When each machine is inactive, it tracks the states of the other two state machines. 8.9.6.1 SPI-4.2 Transmitter State Machine The SPI-4.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.6.2 Training Transmitter State Machine The Training State Machine makes state transitions on each bus transfer of 16 bits, as described in Table 115. Table 115. Training Transmitter State Machine Transitions on 16-Bit Bus Transfers Current State Training Control Training Data Next State Conditions Training Control Until 10 control cycles. Training Data After 10 control cycles. Training Data Until 10 data cycles.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface Table 116. CSIX-L1 Transmitter State Machine Transitions on CWord Boundaries (Sheet 2 of 2) Current State Next State Dead Cycle Idle CFrame Conditions Requesting reception of training sequence and no training sequence pending. Training Training sequence pending. SPI Training sequence not pending and SPI data pending and not requesting training sequence. Dead Cycle Always. Tracking Other State Machine States SPI Training 8.9.
Intel® IXP2800 Network Processor Media and Switch Fabric Interface 8.9.8 Summary of Receiver and Transmitter Signals Figure 117 summarizes the receiver and transmitter signals. Figure 117. Summary of Receiver and Transmitter Signaling (the DDR LVDS SPI-4.2 data path, also used for the CSIX protocol, comprises RDAT (CSIX:TxData)[15:0], RCTL (CSIX:TxSOF), RCLK (CSIX:TxClk), RPAR (CSIX:TxPar), and RPROT on the receive side, and TDAT (CSIX:RxData)[15:0], TCTL (CSIX:RxSOF), TCLK (CSIX:RxClk), TPAR (CSIX:RxPar), and TPROT on the transmit side; the LVTTL SPI-4 status interface uses RSTAT[1:0] and the related status signals).
Intel® IXP2800 Network Processor PCI Unit PCI Unit 9 This section contains information on the IXP2800 Network Processor PCI Unit. 9.1 Overview The PCI Unit allows PCI target transactions to internal registers, SRAM, and DRAM. It also generates PCI initiator transactions from the DMA Engine, Intel XScale® core, and Microengines.
Intel® IXP2800 Network Processor PCI Unit Figure 118.
Intel® IXP2800 Network Processor PCI Unit Figure 119.
Intel® IXP2800 Network Processor PCI Unit If a read address is latched, the subsequent cycles will be retried and no address will be latched until the read completes. The initiator address FIFO can accumulate up to four addresses that can be PCI reads or writes. These FIFOs are inside the PCI Core, which stores data received from the PCI Bus or data to be sent out to the PCI Bus. There are additional buffers implemented in other sub-blocks that buffer data to and from the internal push/pull buses.
Intel® IXP2800 Network Processor PCI Unit Table 119. PCI Commands (Sheet 2 of 2) Support C_BE_L Command Target Initiator 0xC Memory Read Multiple Aliased as Memory Read except SRAM accesses where the number of Dwords to read is given by the cache line size. Supported 0xD Reserved — — 0xE Memory read line Aliased as Memory Read except SRAM accesses where the number of Dwords to read is given by the cache line size. Supported 0xF Memory Write and Invalidate Aliased as Memory Write.
Intel® IXP2800 Network Processor PCI Unit 9.2.2.1 Initialization by the Intel XScale® Core The PCI unit is initialized to an inactive, disabled state until the Intel XScale® core has set the Initialize Complete bit in the Control register. This bit is set after the Intel XScale® core has initialized the various PCI base address and mask registers (which should occur within 1 ms of the end of PCI_RESET).
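A minimal initialization sketch is shown below, assuming hypothetical pci_csr_read()/pci_csr_write() accessors; the register offsets, bit position, and BAR/mask names are placeholders, not the actual CSR map. The point is only the ordering: the base address and mask registers are programmed before Initialize Complete is set.

```c
#include <stdint.h>

extern uint32_t pci_csr_read(uint32_t offset);
extern void     pci_csr_write(uint32_t offset, uint32_t value);

/* Placeholder offsets and bit position (not the real CSR map). */
#define PCI_SRAM_BASE_ADDR_MASK  0x100u
#define PCI_DRAM_BASE_ADDR_MASK  0x104u
#define PCI_CONTROL_REG          0x108u
#define PCI_INIT_COMPLETE        (1u << 0)

static void pci_unit_init(uint32_t sram_mask, uint32_t dram_mask)
{
    /* 1. Program the PCI base address and mask registers first. */
    pci_csr_write(PCI_SRAM_BASE_ADDR_MASK, sram_mask);
    pci_csr_write(PCI_DRAM_BASE_ADDR_MASK, dram_mask);

    /* 2. Then set Initialize Complete so the PCI unit leaves its inactive,
     *    disabled state; this should occur within 1 ms of the end of PCI_RESET. */
    pci_csr_write(PCI_CONTROL_REG,
                  pci_csr_read(PCI_CONTROL_REG) | PCI_INIT_COMPLETE);
}
```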
Intel® IXP2800 Network Processor PCI Unit 9.2.3 PCI Type 0 Configuration Cycles A PCI access to a configuration register occurs when the following conditions are satisfied: • PCI_IDSEL is asserted. (PCI_IDSEL only supports PCI_AD[23:16] bits). • The PCI command is a configuration write or read. • The PCI_AD [1:0] are 00. A configuration register is selected by PCI_AD[7:2]. If the PCI master attempts to do a burst longer than one 32-bit Dword, the PCI unit signals a target disconnect.
Intel® IXP2800 Network Processor PCI Unit 9.2.5 PCI Target Cycles The following PCI transactions are not supported by the PCI Unit as a target:
• IO read or write
• Type 1 configuration read or write
• Special cycle
• IACK cycle
• PCI Lock cycle
• Multi-function devices
• Dual Address cycle
9.2.5.1 PCI Accesses to CSR A PCI access to a CSR occurs if the PCI address matches the CSR base address register (PCI_CSR_BAR).
Intel® IXP2800 Network Processor PCI Unit 9.2.5.5 Target Read Accesses from the PCI Bus A PCI read occurs if the PCI address matches one of the base address registers and the PCI command is either a Memory Read, Memory Read Line, or Memory Read Multiple. The read is completed as a PCI delayed read. That is, on the first occurrence of the read, the PCI unit signals a retry to the PCI master.
Intel® IXP2800 Network Processor PCI Unit never de-asserts it prior to receiving gnt_l[0] or de-asserts it after receiving gnt_l[0] without doing a transaction. PCI Unit de-asserts req_l[0] for two cycles when it receives a retry or disconnect response from the target. 9.2.6.2 PCI Commands The following PCI transactions are not generated by PCI Unit as an initiator: • PCI Lock Cycle • Dual Address cycle • Memory Write and Invalidate 9.2.6.
Intel® IXP2800 Network Processor PCI Unit 9.2.6.6 Special Cycle As an initiator, special cycles are broadcast to all PCI agents, so DEVSEL_L is not asserted and no error can be received. 9.2.7 PCI Fast Back-to-Back Cycles The core supports fast back-to-back target cycles on the PCI Bus. The core does not generate initiator fast back-to-back cycles on the PCI Bus regardless of the value in the fast back-to-back enable bit of the Status and Command register in the PCI configuration space. 9.2.
Intel® IXP2800 Network Processor PCI Unit 9.2.11 PCI Central Functions The CFG_RSTDIR pin is active high for enabling the PCI Unit central function. The CFG_PCI_ARB(GPIO[2]) pin is the strap pin for the internal arbiter. When this strap pin is high during reset, the PCI Unit owns the arbitration. The CFG_PCI_BOOT_HOST(GPIO[1]) pin is the strap pin for the PCI host. When PCI_BOOT_HOST is asserted during reset, the PCI Unit acts as a PCI host. Table 122.
Intel® IXP2800 Network Processor PCI Unit 9.2.11.3 PCI Internal Arbiter The PCI unit contains a PCI bus arbiter that supports two external masters in addition to the PCI Unit's initiator interface. To enable the PCI arbiter, the CFG_PCI_ARB(GPIO[2]) strapping pin must be 1 during reset. As shown in Figure 120, the local bus request and grant pair are not visible externally. These signals will be made available on external debug pins for debug purposes. Figure 120.
Intel® IXP2800 Network Processor PCI Unit 9.3 Slave Interface Block The slave interface logic supports internal slave devices interfacing to the target port of the FBus. • CSR — register access cycles to local CSRs. • DRAM — memory access cycles to the DRAM push/pull Bus. • SRAM — memory access cycles to the SRAM push/pull Bus. The slave port of the FBus is connected to a 64-byte write buffer to support bursts of up to 64 bytes to the memory interfaces.
Intel® IXP2800 Network Processor PCI Unit 9.3.2 SRAM Interface The SRAM interface connects the FBus to the internal push/pull command bus and the SRAM push/pull data buses. Requests to memory are sent on the command bus. A data request is received as a valid push/pull ID on the SRAM push/pull data bus. If the PCI_SRAM_BAR is used, the target state machine generates a request on the command bus for SRAM access.
Intel® IXP2800 Network Processor PCI Unit 9.3.2.2 SRAM Slave Reads For a slave read from SRAM, a 32-bit DWORD is fetched from memory for a memory read command, one cache line is fetched for a memory read line command, and two cache lines are read for a memory read multiple command. The cache line size is programmable in the CACHE_LINE field of the PCI_CACHE_LAT_HDR_BIST configuration register. If the computed read size is greater than 64 bytes, the PCI SRAM read will default to the maximum of 64 bytes.
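The fetch-size rule can be written out as a small helper. The Memory Read command code (0x6) is the standard PCI encoding and the other two codes come from Table 119; the cache line size is taken from the CACHE_LINE field of PCI_CACHE_LAT_HDR_BIST. This is an illustration of the rule, not code from the device.

```c
#include <stdint.h>

#define PCI_CMD_MEM_READ           0x6   /* standard PCI Memory Read */
#define PCI_CMD_MEM_READ_MULTIPLE  0xC   /* per Table 119            */
#define PCI_CMD_MEM_READ_LINE      0xE   /* per Table 119            */
#define SRAM_SLAVE_READ_MAX_BYTES  64

/* cache_line_bytes: from the CACHE_LINE field of PCI_CACHE_LAT_HDR_BIST. */
static uint32_t sram_slave_read_bytes(uint8_t pci_cmd, uint32_t cache_line_bytes)
{
    uint32_t bytes;

    switch (pci_cmd) {
    case PCI_CMD_MEM_READ_LINE:     bytes = cache_line_bytes;     break; /* one cache line   */
    case PCI_CMD_MEM_READ_MULTIPLE: bytes = 2 * cache_line_bytes; break; /* two cache lines  */
    case PCI_CMD_MEM_READ:
    default:                        bytes = 4;                    break; /* one 32-bit DWORD */
    }

    /* Reads computed as larger than 64 bytes default to the 64-byte maximum. */
    if (bytes > SRAM_SLAVE_READ_MAX_BYTES)
        bytes = SRAM_SLAVE_READ_MAX_BYTES;
    return bytes;
}
```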
Intel® IXP2800 Network Processor PCI Unit Figure 122.
Intel® IXP2800 Network Processor PCI Unit Note: The IXP2800/IXP2850 always disconnects after transferring 16 bytes for DRAM target reads. The PCI core will also disconnect at a 64-byte address boundary. Figure 123.
Intel® IXP2800 Network Processor PCI Unit The doorbell interrupts are controlled through the registers shown in Table 124. Table 124. Doorbell Interrupt Registers Register Name Description Intel XScale® core Doorbell Used to generate the Intel XScale® core Doorbell interrupts. Intel XScale® core Doorbell Setup Used to initialize the Intel XScale® core Doorbell register and for diagnostics. PCI Doorbell Used to generate the PCI Doorbell interrupts.
Intel® IXP2800 Network Processor PCI Unit Figure 125. Generation of the Doorbell Interrupts to the Intel XScale® Core (the Intel XScale® core DOORBELL register drives the FIQ or IRQ): 1. A PCI device writes a 1 to clear a bit and generate an FIQ/IRQ. 2. The Intel XScale® core reads XSCALE_DOORBELL to determine the Doorbell interrupt (e.g., reads 0x0030 F2F1). 3. The Intel XScale® core inverts the read value and writes back the result to clear the interrupt (e.g., writes 0x0030 F2F1 ^ 0xFFFF FFFF = 0xFFCF 0D0E).
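The service flow in Figure 125 can be sketched as a short interrupt handler fragment. xscale_doorbell_read()/xscale_doorbell_write() are placeholder accessors for the XSCALE_DOORBELL register; only the read/invert/write-back ordering is taken from the figure.

```c
#include <stdint.h>

extern uint32_t xscale_doorbell_read(void);            /* read XSCALE_DOORBELL  */
extern void     xscale_doorbell_write(uint32_t value); /* write XSCALE_DOORBELL */

/* Called from the FIQ/IRQ handler when a PCI device has rung a doorbell. */
static uint32_t xscale_doorbell_service(void)
{
    /* Step 2: read the register to determine the doorbell interrupt
     * (e.g., reads 0x0030F2F1). */
    uint32_t value = xscale_doorbell_read();

    /* Step 3: invert the read value and write it back to clear the interrupt
     * (e.g., 0x0030F2F1 ^ 0xFFFFFFFF = 0xFFCF0D0E). */
    xscale_doorbell_write(~value);

    return value;  /* caller decodes which doorbell bits were rung */
}
```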
Intel® IXP2800 Network Processor PCI Unit 9.3.5 PCI Interrupt Pin An external PCI interrupt can be generated in the following ways:
• The Intel XScale® core initiates a Doorbell interrupt (XSCALE_INT_ENABLE).
• One or more of the DMA channels have completed the DMA transfers.
• The PNI bit is cleared by the Intel XScale® core to generate a PCI interrupt.
• An internal functional unit generates either an interrupt or an error directly to the PCI host.
Intel® IXP2800 Network Processor PCI Unit 9.4 Master Interface Block The Master Interface consists of the DMA engine and the Push/pull target interface. Both can generate initiator PCI transactions. 9.4.1 DMA Interface There are two DMA channels, each of which can move blocks of data from DRAM to the PCI or from the PCI to DRAM. The DMA channels read parameters from a list of descriptors in SRAM, perform the data movement to or from DRAM, and stop when the list is exhausted.
Intel® IXP2800 Network Processor PCI Unit 9.4.1.1 Allocation of the DMA Channels Static allocation is employed such that the DMA resources are controlled exclusively by a single device for each channel. The Intel XScale® core, a Microengine, and the external PCI host can access the two DMA channels. The first two channels can function in one of the following modes, as determined by the DMA_INF_MODE register:
• The Intel XScale® core owns both DMA channel 1 and channel 2.
Intel® IXP2800 Network Processor PCI Unit 9.4.1.3 DMA Descriptor Each descriptor occupies four 32-bit Dwords and is aligned on a 16-byte boundary. The DMA channels read the descriptors from local SRAM into the four DMA working registers once the control register has been set to initiate the transaction. This control must be set explicitly. This starts the DMA transfer. The register names for the DMA channels are listed in Figure 127. Figure 127.
Intel® IXP2800 Network Processor PCI Unit 9.4.1.4 DMA Channel Operation Since a PCI device, Microengine, or the Intel XScale® core can access the internal CSRs and memory in a similar way, the DMA channel operation description that follows will apply to all channels. CHAN_1_, CHAN_2_, or CHAN_3_ can be placed before the name for the DMA registers. The DMA channel owner can either set up the descriptors in SRAM or it can write the first descriptor directly to the DMA channel registers.
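A descriptor can be modeled as a 16-byte-aligned structure of four 32-bit Dwords, as sketched below. The field names are assumptions for illustration only; the actual register and field names are those listed in Figure 127 and the programmer's reference.

```c
#include <stdint.h>

/* One DMA descriptor: four 32-bit Dwords, aligned on a 16-byte boundary in SRAM.
 * Field names are illustrative assumptions, not the documented layout. */
struct ixp_dma_descriptor {
    uint32_t byte_count;    /* assumed: number of bytes to transfer         */
    uint32_t pci_address;   /* assumed: PCI-side address                    */
    uint32_t dram_address;  /* assumed: DRAM-side address                   */
    uint32_t next_desc;     /* assumed: SRAM address of the next descriptor */
} __attribute__((aligned(16)));
```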
Intel® IXP2800 Network Processor PCI Unit 9.4.1.5 DMA Channel End Operation 1. Channel owned by PCI: If not masked via the PCI Outbound Interrupt Mask register, the DMA channel interrupts the PCI host after the setting of the DMA done bit in the CHAN_X_CONTROL register, which is readable in the PCI Outbound Interrupt Status register. 2.
Intel® IXP2800 Network Processor PCI Unit A 64-bit double Dword with byte enables is pushed into the FBus FIFO from the DMA buffers as soon as there is data available in the buffer and there is space in the FBus FIFO. The Core logic will transfer the exact number of bytes to the PCI Bus. The maximum burst size on the PCI bus varies according to the stepping and is described in Table 127. Table 127. PCI Maximum Burst Size. A Stepping: The maximum burst size is 64 bytes.
Intel® IXP2800 Network Processor PCI Unit 9.4.2.2 Command Bus Master Access to Local Control and Status Registers These are CSRs within the PCI Unit that are accessible from push/pull bus masters. The masters include the Intel XScale® core and the Microengines. No PCI bus cycles are generated. The CSRs within the PCI Unit can also be accessed internally by external PCI devices. 9.4.2.
Intel® IXP2800 Network Processor PCI Unit 9.4.2.3.2 PCI Address Generation for Configuration Cycles When a push/pull command bus master is accessing the PCI Bus to generate a configuration cycle, the PCI address is generated based on the Command Bus Master address, as shown in Table 128 and Figure 129: Table 128.
Intel® IXP2800 Network Processor PCI Unit 9.5 PCI Unit Error Behavior 9.5.1 PCI Target Error Behavior 9.5.1.1 Target Access Has an Address Parity Error 1. If PCI_CMD_STAT[PERR_RESP] is not set, PCI Unit will ignore the parity error. 2. If PCI_CMD_STAT[PERR_RESP] is set: a. PCI core will not claim the cycle regardless of internal device select signal. b. PCI core will let the cycle terminate with master abort. c. PCI core will not assert PCI_SERR_L. d.
Intel® IXP2800 Network Processor PCI Unit 9.5.1.5 Target Write Access Receives Bad Parity PCI_PAR with the Data 1. If PCI_CMD_STAT[PERR_RESP] is not set, PCI Unit will ignore the parity error. 2. If PCI_CMD_STAT[PERR_RESP] is set: a. core asserts PCI_PERR_L and sets PCI_CMD_STAT[PERR]. b. Slave Interface sets PCI_CONTROL[TGT_WR_PAR], which will interrupt the Intel XScale® core if enabled. c. Data is discarded. 9.5.1.6 SRAM Responds with a Memory Error on One or More Data Phases on a Target Read 1.
Intel® IXP2800 Network Processor PCI Unit 9.5.2.2 DMA Read from SRAM (Descriptor Read) Gets a Memory Error 1. Set PCI_CONTROL[DMA_SRAM_ERR] which will interrupt the Intel XScale® core if enabled. 2. Master Interface clears the Channel Enable bit in CHAN_X_CONTROL. 3. Master Interface sets DMA channel error bit in CHAN_X_CONTROL. 4. Master Interface does not reset the DMA CSRs; This leaves the descriptor pointer pointing to the DMA descriptor of the failed transfer. 5.
Intel® IXP2800 Network Processor PCI Unit 9.5.2.5 DMA Transfer Experiences a Master Abort (Time-Out) on PCI Note: That is, nobody asserts DEVSEL during the DEVSEL window. 1. Master Interface sets PCI_CONTROL[RMA], which will interrupt the Intel XScale® core if enabled. 2. Master Interface clears the Channel Enable bit in CHAN_X_CONTROL. 3. Master Interface sets the DMA channel error bit in CHAN_X_CONTROL. 4.
Intel® IXP2800 Network Processor PCI Unit 9.5.3.3 Master from the Intel XScale® Core or Microengine Transfer (Write to PCI) Receives PCI_PERR_L on PCI Bus 1. If PCI_CMD_STAT[PERR_RESP] is not set, PCI Unit will ignore the parity error. 2. If PCI_CMD_STAT[PERR_RESP] is set: a. Core sets PCI_CMD_STAT[PERR]. b. Master Interface sets PCI_CONTROL[DPE] which will interrupt the Intel XScale® core if enabled. 9.5.3.4 Master Read from PCI (Read from PCI) Has Bad Data Parity 1.
Intel® IXP2800 Network Processor PCI Unit Table 130.
Intel® IXP2800 Network Processor PCI Unit Table 134.
Intel® IXP2800 Network Processor PCI Unit The BE_DEMI bit of the PCI_CONTROL register can be set to enable big-endian on the incoming data from the PCI Bus to both the SRAM and DRAM. The BE_DEMO bit of the PCI_CONTROL register can be set to enable big-endian on the outgoing data to the PCI Bus from both the SRAM and DRAM. 9.6.1 Endian for Byte Enable During any endian conversion, PCI does not need to do any longword byte enable swapping between two 32-bit longwords (LW1, LW0).
Intel® IXP2800 Network Processor PCI Unit Table 141.
Intel® IXP2800 Network Processor PCI Unit Table 145.
Intel® IXP2800 Network Processor PCI Unit Table 146. PCI I/O Cycles with Data Swap Enable Stepping Description A Stepping A PCI IO cycle is treated like CSR where the data bytes are not swapped. It is sent in the same byte order whether the PCI bus is configured in Big-Endian or Little-Endian mode. When PCI_CONTROL[IEE] is 0, PCI data is sent in the same byte order whether the PCI bus is configured in Big-Endian or Little-Endian mode.
Intel® IXP2800 Network Processor Clocks and Reset Clocks and Reset 10 This section describes the IXP2800 Network Processor clocks and reset. Refer to the Intel® IXP2800 Network Processor Hardware Initialization Reference Manual for information about the initialization of all units of the IXP2800 Network Processor. 10.
Intel® IXP2800 Network Processor Clocks and Reset Figure 130. Overall Clock Generation and Distribution (an external oscillator drives ref_clk_l/ref_clk_h into the Clock Unit with PLL; the PLL output, scaled by a constant multiplier, is distributed to the SRAM channels (S_clk0, S_clk1, S_clk2, S_clk3), the Media and Switch Fabric Interface (tdclk, rdclk, tclk_ref), the Scratch/Hash/CSR block, the Intel XScale® core gasket, the Slow Port control for slow port devices such as Flash and ROM, and the peripherals such as timers and the UART).
Intel® IXP2800 Network Processor Clocks and Reset Table 147. Clock Usage Summary (Sheet 2 of 2) Unit Name SRAM Scratch, Hash, CSR Description Comment SRAM pins and control logic (all of the SRAM unit except Internal Bus interface). Divide of Microengine frequency. Each SRAM channel has its own frequency selection. Clocks are driven by the IXP2800 Network Processor to external SRAMs and/or Coprocessors. Scratch RAM, Hash Unit, CSR access block 1/2 of Microengine frequency.
Intel® IXP2800 Network Processor Clocks and Reset Table 148.
Intel® IXP2800 Network Processor Clocks and Reset Figure 131. IXP2800 Network Processor Clock Generation (the PLL output, with a divide-by-4 bypass clock option, drives the Microengines directly, a divide-by-2 path for the internal buses (CPP) and the Intel XScale® core, separate divide-by-N paths (reset value: 15) for the DRAMs, SRAM0, SRAM1, SRAM2, SRAM3, and MEDIA, and a divide-by-Nx4 path (reset value: 15) for the APB). 10.
Intel® IXP2800 Network Processor Clocks and Reset Figure 132. Synchronization Between Frequency Domains (Data_in passes from the Clock A domain through a delay element to Data_out in the Clock B domain). Clock A and Clock B are guaranteed to be at least two PLL clocks apart; therefore, if the delay element is such that its delay is more than the hold time required by Clock B but less than the setup time required by Clock B, data should transfer glitch-free from the Clock A domain to the Clock B domain. 10.
Intel® IXP2800 Network Processor Clocks and Reset If "reset_out_strap" is sampled as 0 on the trailing edge of reset, nRESET_OUT is de-asserted based on the value of IXP_RESET_0[15], which is written by software. If "reset_out_strap" is sampled as 1 on the trailing edge of reset, nRESET_OUT is de-asserted after the PLL locks. During normal function mode, if software wants to pull nRESET_OUT high, it should set IXP_RESET_0[22] = 1 and then set IXP_RESET_0[15] = 1.
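The software-controlled de-assertion described above amounts to two CSR updates, sketched below with a placeholder accessor pair for IXP_RESET_0.

```c
#include <stdint.h>

extern uint32_t ixp_reset_0_read(void);            /* placeholder accessors */
extern void     ixp_reset_0_write(uint32_t value);

/* Pull nRESET_OUT high during normal function mode:
 * set IXP_RESET_0[22] = 1 first, then IXP_RESET_0[15] = 1. */
static void deassert_nreset_out(void)
{
    ixp_reset_0_write(ixp_reset_0_read() | (1u << 22));
    ixp_reset_0_write(ixp_reset_0_read() | (1u << 15));
}
```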
Intel® IXP2800 Network Processor Clocks and Reset Figure 134. Reset Generation (a Watchdog event sets the Watchdog History Register (WHR); the reset logic combines the software reset, nRESET#, WATCHDOG_RESET (passed through a counter that guarantees the minimum assertion time), and PCI_RST#, whose direction is set by CFG_PCI_RST_DIR (1: Output, 0: Input), to generate PLL_RST and CORE_RST). Notes: When a Watchdog event happens, the register gets set. This register gets reset when WHR_Reset gets asserted or software reads it. 10.3.
Intel® IXP2800 Network Processor Clocks and Reset 10.3.3.1 Slave Network Processor (Non-Central Function) • If the Watchdog timer reset enable bit is set to 1, a Watchdog reset will trigger the soft reset. • If the Watchdog timer reset enable bit is set to 0, a Watchdog reset will trigger a PCI interrupt to the external PCI host (if the interrupt is enabled by PCI Outbound Interrupt Mask Register[3]).
Intel® IXP2800 Network Processor Clocks and Reset Once in operation, if the watchdog timer expires while the watchdog timer enable bit (WDE) in the Timer Watchdog Enable register is set, a reset pulse from the watchdog timer logic goes to the PLL unit after passing through a counter that guarantees the minimum assertion time; this in turn resets the IXP_RESETn registers, causing the entire chip to be reset. Figure 134 explains the reset generation for the PLL logic and for the rest of the core.
Intel® IXP2800 Network Processor Clocks and Reset Table 149. IXP2800 Network Processor Strap Pins Signal Name Description PCI_RST direction pin: (Also called PCI_HOST) Need to be a dedicated pin. CFG_RST_DIR RST_DIR 1—IXP2800 Network Processor is the host supporting central function. PCI_RST_L is output. 0—IXP2800 Network Processor is not central function. PCI_RST_L is input. This pin is stored at XSC[31] (XScale_Control register) at the trailing edge of reset.
Intel® IXP2800 Network Processor Clocks and Reset Table 150 lists the supported Strap combinations of CFG_PROM_BOOT, CFG_RST_DIR, and CFG_PCI_BOOT_HIST. Table 150.
Intel® IXP2800 Network Processor Clocks and Reset Figure 135. Boot Process. START: the reset signal is asserted (hardware, software, PCI, or Watchdog) and then de-asserted. If CFG_RST_DIR is 1, the Network Processor drives the PCI RST# signal; if CFG_RST_DIR is 0, PCI_RST# is an input. If CFG_PROM_BOOT indicates that no boot PROM is present: 1. The Intel XScale® core is held in reset. 2. PCI BAR window sizes are configured by strap options. 3. The external PCI host configures the PCI registers and DRAM registers. 4.
Intel® IXP2800 Network Processor Clocks and Reset 10.4.1 Flash ROM At power up, if FLASH_ROM is present, the strap pin CFG_PROM_BOOT should be sampled as 1 (it should be pulled up). Therefore, after reset is removed from the IXP_RESET0 register by the PLL logic, the Intel XScale® core reset is automatically removed.
Intel® IXP2800 Network Processor Clocks and Reset code is written in DRAM, the PCI host writes a 1 to bit [8] of the Misc_Control register, called Flash Alias Disable (reset value 0). The Alias Disable bit can be wired to the Intel XScale® core gasket directly so that the gasket knows how to transform address 0 from the Intel XScale® core. After writing a 1 to the Flash Alias Disable bit, the host removes reset from the Intel XScale® core by writing a 0 to bit [0] of the IXP_RESET0 register.
Intel® IXP2800 Network Processor Performance Monitor Unit Performance Monitor Unit 11 11.1 Introduction The Performance Monitor Unit (PMU) is a hardware block consisting of counters and comparators that can be programmed and controlled using a set of configuration registers to monitor and fine-tune the performance of the different hardware units in the IXP2800 Network Processor.
Intel® IXP2800 Network Processor Performance Monitor Unit Figure 136.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.1.3 Functional Overview of CHAP Counters At the heart of the CHAP counter’s functionality are counters, each with associated registers. Each counter has a corresponding command, event, status, and data register. The smallest implementation has two counters, but if justified for a particular product, this architecture can support many more counters. The primary consideration is available silicon area.
Intel® IXP2800 Network Processor Performance Monitor Unit Figure 137. Block Diagram of a Single CHAP Counter (the counter, with its command, status, and event registers and control logic, is driven through event preconditioning by increment and decrement event signals from the internal units and an external input event; a data register and a comparator (>=<) provide the command trigger, and the registers are accessed over a 32-bit register access bus). 11.1.4 Basic Operation of the Performance Monitor Unit At power-up, the Intel XScale® core invokes the performance monitoring software code.
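A hedged sketch of driving a single CHAP counter through its per-counter registers is shown below. The offsets, command encodings, and the assumption that the count is read back through the data register are all placeholders; the actual programming model is defined by the PMU register descriptions.

```c
#include <stdint.h>

extern uint32_t pmu_read(uint32_t offset);                  /* placeholder APB accessors */
extern void     pmu_write(uint32_t offset, uint32_t value);

/* Hypothetical per-counter register offsets and command encodings. */
#define CHAP_COMMAND(n)  (0x00u + (n) * 0x10u)
#define CHAP_EVENT(n)    (0x04u + (n) * 0x10u)
#define CHAP_STATUS(n)   (0x08u + (n) * 0x10u)
#define CHAP_DATA(n)     (0x0Cu + (n) * 0x10u)

#define CHAP_CMD_START   0x1u
#define CHAP_CMD_STOP    0x2u

/* Select an event (by its Event Selection Code), count it over a workload,
 * and return the sampled count (assumed readable through the data register). */
static uint32_t chap_sample_event(unsigned counter, uint32_t event_sel_code)
{
    pmu_write(CHAP_EVENT(counter), event_sel_code);
    pmu_write(CHAP_COMMAND(counter), CHAP_CMD_START);
    /* ... run the workload to be measured ... */
    pmu_write(CHAP_COMMAND(counter), CHAP_CMD_STOP);
    return pmu_read(CHAP_DATA(counter));
}
```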
Intel® IXP2800 Network Processor Performance Monitor Unit Figure 138. Basic Block Diagram of IXP2800 Network Processor with PMU (hardware events from the Media I/F, PCI I/F, APB bus, SHaC, Microengines ME1 through ME8, Intel XScale® core, push-pull bus, QDR control, and DDRAM control are selected by a mux under PMU control and fed to the CHAP counters; the PMU configuration registers are accessed over the APB bus). 11.1.5 Definition of CHAP Terminology Duration Count: The counter is incremented for each clock for which the event signal is asserted as logic high. MMR: Memory Mapped Register.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.1.6 Definition of Clock Domains The following abbreviations are used in the events tables under clock domain. P_CLK: The Command Push/Pull Clock, also known as the Chassis clock. This clock is derived from the Microengine (ME) clock and is one-half of the Microengine clock. T_CLK: Microengine Clock. MTS_CLK: MSF Flow Control Status LVTTL Clock TS_CLK. MRX_CLK: MSF Flow Control Receive LVDS Clock RX_CLK. 11.2
Intel® IXP2800 Network Processor Performance Monitor Unit Table 151. APB Usage Accessing Read Operation Access Method: • Microengine: csr[read] APB Peripheral 11.2.
Intel® IXP2800 Network Processor Performance Monitor Unit acknowledge signal (CAP_CSR_RD_RDY). When the data is returned, CAP puts the read data into the Push Data FIFO, arbitrates for the S_Push_Bus, and then the Push/Pull Arbiter pushes the data to the destination identified in PP_ID. 11.2.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 152. Hardware Blocks and Their Performance Measurement Events (Sheet 1 of 2) Hardware Block Performance Measurement Event Description Intel XScale® Core DRAM Read Head of Queue Latency Histogram The Intel XScale® core generates a read or write command to the DRAM primarily to either push or pull data of the DDRAM. These commands are scheduled to the DRAM through the push-pull arbiter through a command FIFO in the gasket.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 152. Hardware Blocks and Their Performance Measurement Events (Sheet 2 of 2) Hardware Block Performance Measurement Event Description Chassis/Push-Pull Command Bus Utilization These statistics give the number of the command requests issued by the different Masters in a particular period of time. This measurement also indicates how long it takes to issue the grant from the request being issued by the different Masters.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4 Events Monitored in Hardware Tables in this section describe the events that can be measured, including the name of the event and the Event Selection Code (ESC). Refer to Section 11.4 for tables showing event selection codes. The acronyms in the event names typically represent unit names. The guidelines for which events a particular component must implement are provided in the following sections. 11.4.1 Queue Statistics Events 11.4.1.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.3 Design Block Select Definitions Once an event is defined, its definition must remain consistent between products. If the definition changes, it should have a new event selection code. This document contains the master list of all ESCs in all CHAP-enabled products. Not all of the ESCs in this document are listed in numerical order. The recommendation is to group similar events within the following ESC ranges. See Table 153. Table 153.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 153.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.5 Threshold Events These are the outputs of the threshold comparators. When the value in a data register is compared to its corresponding counter value and the condition is true, a threshold event is generated. This results in: • A pulse on the signal lines that are routed to the event’s input port (one signal line from each comparator).
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6 External Input Events 11.4.6.1 XPI Events Target ID(000001) / Design Block #(0100) Table 155. XPI PMU Event List (Sheet 1 of 4) Event Number Event Name Clock Domain Single pulse/ Long pulse Burst Description 0 XPI_RD_P APB_CLK single separate It includes all the read accesses, PMU, timer, GPIO, UART, and Slowport. 1 XPI_WR_P APB_CLK single separate It includes all the write accesses, PMU, timer, GPIO, UART, and Slowport.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 155. XPI PMU Event List (Sheet 2 of 4) 390 26 TURNA0_C_P APB_CLK single separate It enters the termination state of the state machine 0 for the mode 0 of Slowport. 27 IDLE1_0_P APB_CLK single separate It displays the idle state of the state machine 1 for the mode 1 of Slowport. 28 START1_1_P APB_CLK single separate It enters the start state of the state machine 1 for the mode 1 of Slowport.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 155. XPI PMU Event List (Sheet 3 of 4) 48 SETUP2_4_P APB_CLK single separate It enters the pulse width of the data transaction cycle for the state machine 2 for the mode 2 of Slowport. 49 PULW2_C_P APB_CLK single separate It enters the pulse width of the data transaction cycle for the state machine 2 for the mode 2 of Slowport.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 155. XPI PMU Event List (Sheet 4 of 4) 392 70 TURNA3_8_P APB_CLK single separate It enters the turnaround state of the transaction when the state machine 3 is active for the mode 3 of Slowport. 71 IDLE4_0_P APB_CLK single separate It displays the idle state of the state machine 4 for the mode 4 of Slowport. 72 START4_1_P APB_CLK single separate It enters the start state of the state machine 4 for the mode 4 of Slowport.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.2 SHaC Events Target ID(000010) / Design Block #(0101) Table 156.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 156.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 156. SHaC PMU Event List (Sheet 3 of 4) 35 Scratch Ring_14 Status P_CLK single separate If SCRATCH_RING_BASE_x[26] = 1, RING_14_STATUS indicates empty. If SCRATCH_RING_BASE_x[26] = 0, RING_14_STATUS indicates full. If SCRATCH_RING_BASE_x[26] = 1, RING_15_STATUS indicates empty.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 156. SHaC PMU Event List (Sheet 4 of 4) 63 Hash Cmd_Pipe Full P_CLK single separate Hash Command Pipe Full 64 Hash Push_Data_Pipe Not_Empty P_CLK single separate Hash Push Data Pipe Not Empty 65 Hash Push_Data_Pipe Full P_CLK single separate Hash Push Data Pipe Full 11.4.6.3 IXP2800 Network Processor MSF Events Target ID(000011) / Design Block #(0110) Table 157.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 157.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 157. IXP2800 Network Processor MSF PMU Event List (Sheet 3 of 6) 45 46 47 Detect FC_DEAD Detect C_IDLE Detect C_DEAD MRX_CLK MR_CLK MR_CLK level level level separate Indicates that a dead cycle has been received on the RXCDAT inputs for greater than 2 clock cycles; the valid signal from the MTS_CLK domain is synchronized; as such, it yields an approximate value.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 157. IXP2800 Network Processor MSF PMU Event List (Sheet 4 of 6) 70 SPI-4 Packet received 71 reserved P_CLK pulse separate Indicates that the SPI-4 state machine after the Receive input FIFO has received an SPI-4 packet.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 157. IXP2800 Network Processor MSF PMU Event List (Sheet 5 of 6) 97 Rx null autopush P_CLK pulse separate 98 Tx skip P_CLK pulse separate An mpacket was dropped due to the Tx_Skip bit being set in the Transmit Control Word. 99 SF_CRDY P_CLK level separate Only valid in CSIX receive mode and indicates how much of the time the switch fabric is able to receive control CFrames.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 157.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.4 Intel XScale® Core Events Target ID(000100) / Design Block #(0111) Table 158.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 158.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 158.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 158.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 159.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 159.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 159.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 159.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 160. ME00 PMU Event List (Sheet 2 of 2) 12 ME_FIFO_DEQ P_CLK single separate Command FIFO Dequeue 13 ME_FIFO_NOT_EMPTY P_CLK single separate Command FIFO not empty Note: 1. All the Microengine have the same event list. 2. CC_Enable bit[2:0] is PMU_CTX_Monitor in Microengine CSR, This field holds the number of context to be monitored. The event count only reflects the events that occur when this context is executing.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.8 ME02 Events Target ID(100010) / Design Block #(1001) Table 162. ME02 PMU Event List Event Number Event Name Clock Domain Pulse/ Level Burst Description Note: 1. All the Microengines have the same event list. 2. CC_Enable bit[2:0] is PMU_CTX_Monitor in Microengine CSR, This field holds the number of context to be monitored. The event count only reflects the events that occur when this context is executing.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.10 ME04 Events Target ID(100100) / Design Block #(1001) Table 164. ME04 PMU Event List Event Number Event Name Clock Domain Pulse/ Level Burst Description Note: 1. All the Microengines have the same event list. 2. CC_Enable bit[2:0] is PMU_CTX_Monitor in Microengine CSR, This field holds the number of context to be monitored. The event count only reflects the events that occur when this context is executing.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.12 ME06 Events Target ID(100110) / Design Block #(1001) Table 166. ME06 PMU Event List Event Number Event Name Clock Domain Pulse/ Level Burst Description Note: 1. All the Microengines have the same event list. 2. CC_Enable bit[2:0] is PMU_CTX_Monitor in Microengine CSR, This field holds the number of context to be monitored. The event count only reflects the events that occur when this context is executing.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.14 ME10 Events Target ID(110000) / Design Block #(1010) Table 168. ME10 PMU Event List Event Number Event Name Clock Domain Pulse/ Level Burst Description Note: 1. All the Microengines have the same event list. 2. CC_Enable bit[2:0] is PMU_CTX_Monitor in Microengine CSR, This field holds the number of context to be monitored. The event count only reflects the events that occur when this context is executing.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.16 ME12 Events Target ID(110010) / Design Block #(1010) Table 170. ME12 PMU Event List Event Number Event Name Clock Domain Pulse/ Level Burst Description Note: 1. All the Microengines have the same event list. 2. CC_Enable bit[2:0] is PMU_CTX_Monitor in Microengine CSR, This field holds the number of context to be monitored. The event count only reflects the events that occur when this context is executing.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.18 ME14 Events Target ID(110100) / Design Block #(1010) Table 172. ME14 PMU Event List Event Number Event Name Clock Domain Pulse/ Level Burst Description Note: 1. All the Microengines have the same event list. 2. CC_Enable bit[2:0] is PMU_CTX_Monitor in Microengine CSR, This field holds the number of context to be monitored. The event count only reflects the events that occur when this context is executing.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.20 ME16 Events Target ID(100110) / Design Block #(1010) Table 174. ME16 PMU Event List Event Number Event Name Clock Domain Pulse/ Level Burst Description Note: 1. All the Microengines have the same event list. 2. CC_Enable bit[2:0] is PMU_CTX_Monitor in Microengine CSR, This field holds the number of context to be monitored. The event count only reflects the events that occur when this context is executing.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.22 SRAM DP1 Events Target ID(001001) / Design Block #(0010) Table 176. SRAM DP1 PMU Event List Event Number Event Name Clock Domain Pulse/ Level Burst Description Note: 1. SRAM DP1/DP0 push/pull arbiter has same event lists. 2. S_CLK = SRAM clock domain 3.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 177.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 177.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.25 SRAM CH2 Events Target ID(001100) / Design Block #(0010) Table 179. SRAM CH3 PMU Event List Event Number Event Name Clock Domain Pulse/ Level Burst Description Note: 1. All the SRAM Channel has same event lists. 2. S_CLK = SRAM clock domain 3.
Intel® IXP2800 Network Processor Performance Monitor Unit 11.4.6.27 SRAM CH0 Events Target ID(001110) / Design Block #(0010) Table 181.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 181.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 182.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 183. IXP2800 Network Processor Dram DPSA PMU Event List (Sheet 2 of 2) 17 cr1_deq_id_wph 18 dram_req_rph[4] P_CLK single separate cr1 has a valid req 19 next_cr1_full_wph P_CLK single separate cr1 FIFO hit the full threshold 11.4.6.30 P_CLK single separate Dequeue cr1 cmd/data IXP2800 Network Processor DRAM CH2 Events Target ID(010100) / Design Block #(0011) Table 184.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 184. IXP2800 Network Processor Dram CH2 PMU Event List (Sheet 2 of 5) 14 deq_push_ctrl_wph P_CLK single separate Active when dequeueing from the push control FIFO; occurs on the last cycle of a burst or on the only cycle of a single transfer. 15 d_push_ctrl_fsm/ single_xfer_wph P_CLK single separate Active if the data is about to be transferred from d_push data FIFO to the dp_unit FIFO is length 0, i.e., a single 8-byte transfer.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 184. IXP2800 Network Processor Dram CH2 PMU Event List (Sheet 3 of 5) 33 DAP_DEQ_B3_DATA_RPH P_CLK single separate 34 DAP_DEQ_B2_DATA_RPH P_CLK single separate 35 DAP_DEQ_B1_DATA_RPH P_CLK single separate Indicates pull data and command are being dequeued from the data and command bank FIFOs to the RMC (the command and data FIFOs used in tandem for pulls to supply the address and data respectively).
Intel® IXP2800 Network Processor Performance Monitor Unit Table 184. IXP2800 Network Processor Dram CH2 PMU Event List (Sheet 4 of 5) 57 reserved 58 reserved 59 deq_split_cmd_fifo_wph P_CLK single separate Active when dequeueing from the split inlet FIFO. 60 deq_inlet_fifo1_wph P_CLK single separate Active when dequeueing from the inlet FIFO. 61 deq_inlet_fifo_wph P_CLK single separate Active when dequeueing from either the inlet or split-inlet FIFO.
Intel® IXP2800 Network Processor Performance Monitor Unit Table 184. IXP2800 Network Processor Dram CH2 PMU Event List (Sheet 5 of 5) 80 bank2_enq_wph P_CLK single separate Indicates this channel is enqueueing a DRAM command for bank2. 81 bank1_enq_wph P_CLK single separate Indicates this channel is enqueueing a DRAM command for bank1. 82 bank0_enq_wph P_CLK single separate Indicates this channel is enqueueing a DRAM command for bank0.