TMS320C6457 Fixed-Point Digital Signal Processor Silicon Revisions 1.0, 1.1, 1.2, 1.3, 1.
Contents
Introduction
Device and Development Support Tool Nomenclature
Package Symbolization and Revision Identification
List of Figures
Figure 1 Lot Trace Code Examples for TMS320C6457 (CMH and GMH Packages)
Figure 2 Cache Line Operations Flow
Figure 3 Sequence of Events
Silicon Errata SPRZ293A, November 2009
TMS320C6457 Fixed-Point Digital Signal Processor Silicon Revisions 1.0, 1.1, 1.2, 1.3, 1.4

Introduction
This document describes the silicon updates to the functional specifications for the TMS320C6457 fixed-point digital signal processor. For more information, see the device-specific data manual, TMS320C6457 Fixed-Point Digital Signal Processor Data Manual (literature number SPRS582).
Note: TMS320C6457 Silicon Revision 1.1 was a manufacturing process change.
Device and Development Support Tool Nomenclature
TMS devices and TMDS development-support tools have been characterized fully, and the quality and reliability of the device have been demonstrated fully. TI's standard warranty applies. Predictions show that prototype devices (TMX or TMP) have a greater failure rate than the standard production devices.
The Megamodule Revision ID register (MM_REVID) is a read-only register that identifies the revision of the C64x+ Megamodule. The value in the VERSION field of MM_REVID changes based on the version of the C64x+ Megamodule implemented on the device.
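As a rough illustration of reading the VERSION field, the sketch below splits a raw 32-bit MM_REVID value into fields. The field positions (VERSION in bits 31:16, a REVISION field in bits 15:0) and the names mm_revid_fields and mm_revid_decode are assumptions made for this example and are not taken from this document; check them against the C64x+ Megamodule reference guide before relying on them.

```c
#include <stdint.h>

/* Assumed layout (not stated in this errata):
 *   bits 31:16  VERSION
 *   bits 15:0   REVISION
 */
typedef struct {
    uint16_t version;   /* upper halfword of the raw register value */
    uint16_t revision;  /* lower halfword of the raw register value */
} mm_revid_fields;

/* Decode a raw MM_REVID value that the caller has already read. */
static mm_revid_fields mm_revid_decode(uint32_t raw)
{
    mm_revid_fields f;
    f.version  = (uint16_t)(raw >> 16);
    f.revision = (uint16_t)(raw & 0xFFFFu);
    return f;
}
```

Keeping the decode separate from the register read lets the same helper run on a host against logged register dumps.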
Silicon Updates
Table 3 lists the silicon updates applicable to each silicon revision. For details on each advisory, see the corresponding advisory section.
Table 3 Silicon Revisions 1.4, 1.3, 1.2, 1.1, and 1.0 Updates
Applies To Silicon Revision Silicon Update Advisory See 1.4 1.3 1.2 1.1 1.
Advisory 1 EMAC Boot Issue
Revision(s) Affected: 1.1, 1.0
Details: The EMAC ready announcement frame is not transmitted when the C6457 device is booted in master and slave modes. When the DSP is booted in EMAC master/slave boot modes (boot modes 4, 5), the DSP transmits an Ethernet Ready Announcement (ERA) frame in the form of a BOOTP request. The BOOTP request is intended to inform the host server that the DSP is ready to receive boot packets.
Advisory 2 EDMA3CC COMPACTV Issue
Revision(s) Affected: 1.1, 1.0
Details: A bug has been found inside the EDMA3 channel controller (EDMA3CC). The logic for decrementing the completion request active (COMPACTV) counter is incorrect for devices having six or more EDMA3 transfer controllers (EDMA3TCs). The C6457 device meets this condition and is therefore affected by this bug.
Upon polling, if the value of the COMPACTV field is greater than a certain threshold (0x20 is suggested), the DSP should program the TC with a COMPACTV decrement transfer. Upon completion of that transfer (as signaled in the CC IPR register), the COMPACTV field should be re-checked and another COMPACTV decrement transfer submitted, repeating until the value of the counter is less than the threshold.
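The polling procedure above can be sketched as a small control loop. This is only an outline under stated assumptions: the compactv_ops hooks (read_compactv, submit_decrement_transfer, wait_for_completion) are hypothetical names for code the application would supply, and the sim_* functions are host-side stand-ins that pretend each decrement transfer lowers the counter by one. Neither reflects a documented API or documented hardware behavior.

```c
#include <stdint.h>

#define COMPACTV_THRESHOLD 0x20u  /* suggested threshold from the text above */

/* Hypothetical hooks; real code would read the COMPACTV field, program the
 * TC, and wait for the completion bit in the CC IPR register. */
typedef struct {
    uint32_t (*read_compactv)(void *ctx);
    void     (*submit_decrement_transfer)(void *ctx);
    void     (*wait_for_completion)(void *ctx);
    void      *ctx;
} compactv_ops;

/* Start only when the counter is greater than the threshold, then keep
 * submitting decrement transfers until it is less than the threshold.
 * Returns the number of transfers submitted. */
static unsigned compactv_service(const compactv_ops *ops)
{
    unsigned submitted = 0u;

    if (ops->read_compactv(ops->ctx) <= COMPACTV_THRESHOLD) {
        return 0u;
    }
    do {
        ops->submit_decrement_transfer(ops->ctx);
        ops->wait_for_completion(ops->ctx);
        submitted++;
    } while (ops->read_compactv(ops->ctx) >= COMPACTV_THRESHOLD);

    return submitted;
}

/* Host-side stand-ins (assumption: one transfer lowers the counter by one). */
typedef struct { uint32_t compactv; } sim_cc;

static uint32_t sim_read(void *ctx) { return ((sim_cc *)ctx)->compactv; }
static void sim_wait(void *ctx) { (void)ctx; }
static void sim_submit(void *ctx)
{
    sim_cc *cc = (sim_cc *)ctx;
    if (cc->compactv > 0u) {
        cc->compactv--;
    }
}
```

Passing the hardware accesses in as function pointers keeps the loop logic testable off-target; on the device the three hooks would wrap the actual register accesses.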
Advisory 3 SRIO Port0 Reset Issue
Revision(s) Affected: 1.3, 1.2, 1.1, 1.0
Details: The SERDES macro for SRIO should allow reset of individual 1× ports without affecting the state of the other operational ports. Dedicated MMR bits reset the 1× ports: BLKn_EN (n = 5..8), at offsets 0x60, 0x68, 0x70, and 0x78 on the C6457. However, BLK5_EN, which controls reset for port0, also resets all other ports.
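The four offsets listed above are evenly spaced, so the offset for a given block number can be computed rather than tabulated. The helper below is a minimal sketch; the function name srio_blk_en_offset is hypothetical, and the offsets are relative to the SRIO register base, which is not given here.

```c
#include <stdint.h>

/* Offset of BLKn_EN for n = 5..8, following the list above:
 * 0x60, 0x68, 0x70, 0x78. Returns 0 for any other n. */
static uint32_t srio_blk_en_offset(unsigned n)
{
    if (n < 5u || n > 8u) {
        return 0u;
    }
    return 0x60u + 8u * (n - 5u);
}
```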
Advisory 4 SRIO Outbound ACKID Issue
Revision(s) Affected: 1.3, 1.2, 1.1, 1.0
Details: The OUTBOUND_ACKID field of the RIO_SP(n)_ACKID_STAT register should be updated by hardware each time a packet is sent out. The value should reflect the ACKID value to be used on the next transmit packet. This field is being updated by hardware as expected.
Advisory 5 SRIO Bootloader Issue
Revision(s) Affected: 1.1, 1.0
Details: Silicon revisions 1.0 and 1.1 of the C6457 device use the v1.5 bootloader. In SRIO boot mode, when 4× mode falls back to 1× mode in certain hardware configurations, the boot does not operate correctly.
Advisory 6 DMA Access to L2 SRAM May Stall When the DMA and the CPU Command Priority is Equal
Revision(s) Affected: 1.3, 1.2, 1.1, 1.0
Details: The L2 memory controller in the C64x+ Megamodule has programmable bandwidth management features that are used to control bandwidth allocation for all requestors. Two parameters control this feature: command priority and arbitration counter MAXWAIT values.
busy every cycle. Hence, the DMAs stall until the stream of CPU accesses completes. For example, if a continuous stream of L1D write misses to L2 keeps the L2 memory controller busy every cycle, the DMAs stall for the entire duration of the write miss stream.
Note: When the SDMA has finished sending all of its commands to the L2 controller, the C64x+ Megamodule drops the effective transfer priority down to 7 if no further commands are in the pipeline.
Advisory 7 DMA Corruption of External Data Buffer Issue
Revision(s) Affected: 1.3, 1.2, 1.1, 1.0
Details: Under a specific set of circumstances, an L1D snoop-write will update an unintended L1D cache line. This leads to a corrupted line in L1D and can lead directly to program misbehavior. If the corrupted line is then modified by a CPU write access, a subsequent victim writeback from L1D could commit the corrupted line to lower levels of memory.
3. The CPU reads from cacheable external memory (e.g., DDR) at an address that is a set match to Cache Line A (referred to here as Cache Line B). Whether two addresses are a set match can be determined by comparing certain bits of the two addresses. The mapping of an address to a location in L1D cache is shown in Figure 7.
   – See Appendix B (Determining If Two Addresses are a Set Match) for instructions on how to determine if two addresses are a set match.
Figure 2 shows the flow of these operations, the incorrect order that causes the issue, and the correct order to avoid the issue. The solid line is Cache Line A and the dashed line is Cache Line B.
To prevent this sort of race condition, programs should discard in-bound DMA buffers in UMAP1 immediately after use and keep a strict policy of buffer ownership, such that a given buffer is owned by either the CPU or the DMA, never both, at any given time. This model assumes the following steps:
1. The DMA fills the buffer during a period when the CPU does not access it.
2. The DMA engine or another mechanism signals to the CPU that it has finished filling the buffer.
3.
Workaround 2: Make DMA Buffers Dirty After Use
The errant snoop-write occurs only when the DMA buffer in L1D has not been modified. This is due to the additional snoop checking mechanisms associated with tracking victims as they leave L1D. Therefore, another workaround is to mark DMA buffers as dirty before releasing them. This will generate additional victims whenever the buffer gets pushed out of L1D. It will also block the errant snoop-write.
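One way to mark a buffer dirty without changing its contents is to read one byte from each cache line and write the same value back. The C sketch below illustrates the idea; it is not the make_dirty assembly routine from Appendix A, the name make_dirty_sketch is hypothetical, and it assumes the buffer base is aligned to the 64-byte L1D line size so that every line of the buffer is touched.

```c
#include <stddef.h>
#include <stdint.h>

#define L1D_LINE_BYTES 64u  /* L1D cache line size */

/* Touch one byte per 64-byte line: the read brings the line into L1D
 * (if cached) and the write of the same value marks it dirty.
 * The data in the buffer is left unchanged. */
static void make_dirty_sketch(void *base, size_t byte_count)
{
    volatile uint8_t *p = (volatile uint8_t *)base;
    size_t off;

    for (off = 0; off < byte_count; off += L1D_LINE_BYTES) {
        uint8_t v = p[off];  /* read first so the write hits in L1D */
        p[off] = v;          /* write back the same value */
    }
}
```

The volatile qualifier keeps the compiler from removing the apparently redundant read and write pair.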
Workaround 4: Allocate DMA Buffers in L1D RAM or UMAP0
If possible, relocate DMA buffers that the CPU reads directly from UMAP1 to either UMAP0 or L1D RAM. Table 6 shows the UMAP0 addresses of the C6457. DMA buffers that the CPU does not access directly can remain in UMAP1 safely, as these will not generate snoops.
Table 6 UMAP0 Address Range for C6457 (1)
UMAP0    Address Range
RAM      0x00900000 - 0x009FFFFF
End of Table 6
1.
Advisory 8 DMA Corruption of L2 RAM Data
Revision(s) Affected: 1.2, 1.1, 1.0
Details: Under a specific set of circumstances, a snoop-write updates an unintended L2 RAM location. This is a result of a corrupted L1D cache writeback and can lead directly to program misbehavior. If that line is then modified by CPU accesses, a subsequent victim writeback from L1D could commit this corrupted line to lower levels of memory.
The following steps must all occur concurrently to see the issue:
1. The CPU reads from any address in L2 SRAM that is a set match to Cache Line A. To determine if a set match condition exists, see Appendix B (Determining If Two Addresses are a Set Match).
   – The set match to Cache Line A is referred to here as Cache Line B.
Figure 3 shows the sequence of events.
With all the steps above, it can be difficult to determine whether a particular buffer has the potential to see this issue. Figure 4 is a simple decision tree to help make that determination for a particular buffer.
When using the above flowchart, if one of the OK fields is reached, the buffer should not be affected. If one of the Potential Problem fields is reached, see the workarounds below.
Note: Figure 4 assumes that each buffer is aligned to a 64B boundary and spans a multiple of 64B. This is because the cache line size of the L1D is 64B.
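The alignment assumption in the note can be checked, or enforced when sizing a buffer, with a pair of small helpers. This is a sketch only; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define L1D_LINE_BYTES 64u  /* L1D cache line size from the note above */

/* Nonzero when the buffer starts on a 64B boundary and spans a multiple of 64B. */
static int buffer_is_line_aligned(const void *base, size_t byte_count)
{
    return ((uintptr_t)base % L1D_LINE_BYTES == 0u) &&
           (byte_count % L1D_LINE_BYTES == 0u);
}

/* Round a byte count up to the next multiple of 64B. */
static size_t round_up_to_line(size_t byte_count)
{
    return (byte_count + (L1D_LINE_BYTES - 1u)) & ~(size_t)(L1D_LINE_BYTES - 1u);
}
```

For instance, a 100-byte payload would be given a 128-byte buffer so that it does not share a cache line with unrelated data.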
To implement this workaround, programmers must write back (and optionally invalidate) the buffer from L1D cache after Step 3 and before Step 4. There are multiple mechanisms for doing this, but the most straightforward is to use the L1D block cache writeback mechanism via L1DWBAR/L1DWWC or the L1D block cache writeback-invalidate mechanism via L1DWIBAR/L1DWIWC. The recommended implementation of this workaround requires calling the l1d_block_wb.asm and l1d_block_wbinv.
Workaround 3: Workaround for Buffers that the CPU and DMA Access Asynchronously
While this situation is rare in most programs, there are some cases where both the CPU and the DMA access the same structure without explicit synchronization. In some cases, these accesses are part of an algorithm that implements a synchronization primitive. Regardless of the purpose, these accesses potentially trigger this bug.
Advisory 9 L2 Victim Traffic Due To L2 Block Writeback During Any Pending CPU Request
Revision(s) Affected: 1.2, 1.1, 1.0
Background: The C64x+ megamodule has a Master Direct Memory Access (MDMA) bus interface and a Slave Direct Memory Access (SDMA) bus interface. The MDMA interface provides DSP access to resources outside the C64x+ megamodule (e.g., DDR2 memory). The MDMA interface is used for CPU/cache accesses to memory beyond the level 2 (L2) memory level.
Figure 5 is a simplified view for illustrative purposes only. The IDMA/SDMA path (orange lines) can also go to L1D/L1P memories and IDMA can go to the DSP CFG peripherals. MDMA transactions (blue lines) can also originate from L1P or L1D through the L2 controller or directly from the DSP.
SDMA/IDMA stalling and any resulting system impact are most likely in systems with excessive context switching, heavy L1/L2 cache miss/victim traffic, and a heavily loaded EMIF. Use the following steps to determine if SDMA/IDMA stalling is the cause of real-time deadline misses for existing applications. Situations where real-time deadlines may be missed include loss of McBSP samples and low peripheral throughput.
1.
Details: Under certain conditions, L2 victim traffic due to a block writeback can block SDMA/IDMA accesses to UMAP0 during CPU requests. For a definition of UMAP0 for the C6457 device, see Appendix C (UMAP0 and UMAP1 Addresses Ranges). There are four transactions that must occur to cause an SDMA/IDMA to stall because of this condition:
1. L1D/L1P needs to create an L2$ hit.
Workaround 2: To reduce the SDMA/IDMA stalling system impact, perform any of the following:
1. Improve system tolerance on DMA side (SDMA/IDMA/MDMA):
   – Understand and minimize latency-critical SDMA/IDMA accesses to L2 or L1P/D.
   – Directly reduce critical real-time deadlines, if possible, at peripheral/IO level (e.g., increase word size and/or reduce bit rates on serial ports).
› Re-block the loops. In some cases, restructuring loops can increase reuse in the cache and reduce the total traffic to external memory.
› Throttle the loops. If restructuring the code is impractical, then it is reasonable to slow the code down. This reduces the likelihood that consecutive SDMA/IDMA blocks stack up in the cache request pipelines and cause a long stall.
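As a generic illustration of re-blocking, the sketch below applies two passes to an array. The unblocked version sweeps the whole array twice, so a large array is fetched from external memory twice; the blocked version runs both passes on one block before moving on, so the second pass can find the data still in cache. The function names and the BLOCK_ELEMS value are hypothetical and would need tuning to the cache configuration in use.

```c
#include <stddef.h>

#define BLOCK_ELEMS 1024u  /* hypothetical block size, in elements */

/* Unblocked: two full sweeps over the array. */
static void scale_then_offset(int *data, size_t n, int scale, int offset)
{
    size_t i;
    for (i = 0; i < n; i++) { data[i] *= scale; }
    for (i = 0; i < n; i++) { data[i] += offset; }
}

/* Re-blocked: both passes run on one block before the next block is touched.
 * The result is identical to the unblocked version. */
static void scale_then_offset_blocked(int *data, size_t n, int scale, int offset)
{
    size_t start, i;
    for (start = 0; start < n; start += BLOCK_ELEMS) {
        size_t end = (n - start < BLOCK_ELEMS) ? n : start + BLOCK_ELEMS;
        for (i = start; i < end; i++) { data[i] *= scale; }
        for (i = start; i < end; i++) { data[i] += offset; }
    }
}
```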
Advisory 10 L1P$ Miss May Block SDMA Accesses (Asymmetric Mode Only)
Revision(s) Affected: 1.3, 1.2, 1.1, 1.0
Details: This advisory is an update to Advisory 9 in this document. Advisory 9 lists the following blocking condition:
• Stall Condition 1 - L2 victim traffic due to L2 block writeback during any pending CPU request
This advisory covers one more blocking condition:
• Stall Condition 2 - L1P$ miss may stall SDMA accesses
For silicon versions 1.0, 1.1, and 1.
The SDMA in item 1 sets up a bank conflict for the L1D$ read in item 2. The L1D$ allocate in item 2 prevents the L1D$ write/victim (3) from advancing, so it is stuck in the pipeline. This occurs at the same time as an L1P$ allocate that also results in an L2 access to external memory (4), which is also in the same pipeline stage as the L1D$ write/victim (3).
Workaround 1: Leave in previous SDMA/IDMA stall workarounds (for devices with the original SDMA/IDMA stall).
For silicon versions 1.0, 1.1, and 1.2, which were already affected by the first SDMA/IDMA stall issue from Advisory 9, no additional workaround is needed. If all of the deadlock avoidance steps listed in Advisory 9 have been followed, there is no risk of a deadlock because of this issue.
Usage Note 1 Manual Cache Coherence Operation Usage Note
Revision(s) Affected: 1.3, 1.2, 1.1, 1.0
Details: When an L1DWB, L1DWBINV, L2DWB, or L2DWBINV command is executed, and the writeback is complete, the C64x+ Megamodule will send a single 128-bit message with the address of the last word that the block operation was for. On OMAP devices, the extra sideband signal mentioned above is used to route that to a special endpoint.
Appendix A: Code Examples

L1D Block Writeback Routine
l1d_block_wb.asm
;; ========================================================================
;; L1D Block Writeback
;;
;; l1d_block_wb(void *base, size_t byte_count);
;;
;; Performs a block writeback from L1D to L2. It can be used
;; on any address range (L2 or external), but it only operates on L1D
;; cache.
;;
;; Maximum block size is 256K.
L1D Block Writeback-Invalidate Routine
l1d_block_wbinv.asm
;; ========================================================================
;; L1D Block Writeback-Invalidate
;;
;; l1d_block_wbinv(void *base, size_t byte_count);
;;
;; Performs a block writeback-invalidate from L1D to L2. It can be used
;; on any address range (L2 or external), but it only operates on L1D
;; cache.
;;
;; Maximum block size is 256K.
Make Buffer Dirty Routine
make_dirty
;; ========================================================================
;; Make a block of data "dirty" in L1D
;;
;; make_dirty(void *base, size_t byte_count);
;;
;; ========================================================================
;;
        .global _make_dirty
        .text
        .asmfunc
_make_dirty:
        ADDK    63, B4
        SHR     B4, 6, B4
        MVC     B4, ILC
        MVK     64, A5
        MVK     64, B5
        MV      A4, B4
        NOP
        SPLOOP  1
        LDBU    *A4++[A5], A1
        NOP     4
        MV.
Long Distance Load Word Routine
ldld.asm
;; ========================================================================
;; Long Distance Load Word
;;
;; int long_dist_load_word(volatile int *addr)
;;
;; This function reads a single word from a remote location with the L1D
;; cache frozen.
IDMA Channel 1 Block Copy Routine
idma1_util.asm
;; ========================================================================
;; TEXAS INSTRUMENTS INC.
;;
;; Block Copy with IDMA Channel 1
;;
;; REVISION HISTORY
;; 13-Feb-2009 Initial version . . . . . . . . . . . J.
|| ; [ A0] LDW *A6, A0 [ A0] BNOP.1 loop?, 4 The 'AND' below is safe because IDMA never returns 10b in 2 LSBs AND.L A4, A0, A0 RETNOP B3, 5 .endasmfunc ;; ======================================================================== ;; ;; End of file: idma1_util.
Appendix B: Determining If Two Addresses are a Set Match
Whether two addresses are a set match can be determined by comparing certain bits of the two addresses. The mapping of an address to a location in L1D cache is shown in Figure 7.
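Since Figure 7 is not reproduced here, the sketch below shows one way the comparison could be coded. It assumes a 2-way set-associative L1D with 64-byte lines, so that the set index is the line number modulo the number of sets; the associativity, the helper names, and the cache_bytes parameter (the configured L1D cache size) are assumptions for illustration and should be checked against Figure 7 and the device documentation.

```c
#include <assert.h>
#include <stdint.h>

#define L1D_LINE_BYTES 64u  /* L1D cache line size */
#define L1D_WAYS       2u   /* assumed associativity, not stated in this appendix */

/* Set index of an address for a given L1D cache size in bytes.
 * With a 32 KB cache this selects address bits 13:6 (256 sets). */
static uint32_t l1d_set_index(uint32_t addr, uint32_t cache_bytes)
{
    uint32_t num_sets = cache_bytes / (L1D_LINE_BYTES * L1D_WAYS);
    assert(num_sets != 0u);  /* cache must hold at least one set */
    return (addr / L1D_LINE_BYTES) % num_sets;
}

/* Two addresses are a set match when they map to the same set. */
static int l1d_is_set_match(uint32_t a, uint32_t b, uint32_t cache_bytes)
{
    return l1d_set_index(a, cache_bytes) == l1d_set_index(b, cache_bytes);
}
```

Under these assumptions and a 32 KB cache, any two addresses whose bits 13:6 are equal are a set match, regardless of which memory they belong to.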
Appendix C: UMAP0 and UMAP1 Address Ranges
The tables below detail the address ranges of UMAP0 and UMAP1 for the C6457 device.
Table 11 UMAP0 Address Range for C6457 (1)
UMAP0    Address Range
RAM      0x00900000 - 0x009FFFFF
End of Table 11
1. Note that L2 cache, if used, is a portion of the address range.
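When deciding where to place a DMA buffer, the UMAP0 range from Table 11 can be checked in code. The helpers below are a minimal sketch with hypothetical names; they test only the address range and do not account for any part of that range configured as L2 cache.

```c
#include <stdint.h>

#define UMAP0_START 0x00900000u  /* from Table 11 */
#define UMAP0_END   0x009FFFFFu  /* from Table 11, inclusive */

/* Nonzero when a single address lies in the UMAP0 range. */
static int addr_in_umap0(uint32_t addr)
{
    return addr >= UMAP0_START && addr <= UMAP0_END;
}

/* Nonzero when the whole buffer [base, base + byte_count) lies in UMAP0.
 * An empty buffer is reported as not contained. */
static int buffer_in_umap0(uint32_t base, uint32_t byte_count)
{
    if (byte_count == 0u || !addr_in_umap0(base)) {
        return 0;
    }
    /* Compare lengths rather than end addresses to avoid overflow. */
    return (byte_count - 1u) <= (UMAP0_END - base);
}
```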
IMPORTANT NOTICE Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete.