Freescale Semiconductor, Inc.
Porting and Optimizing DSP56800 Applications to DSP56800E
Application Note
by Cristian Caciuloiu, Radu Preda, Radu Bacrau, and Costel Ilas
AN2095/D Rev. 0, 04/2001
For More Information On This Product, Go to: www.freescale.
Abstract and Contents

The DSP56800E’s DSP core architecture represents the next step in the evolution of Motorola’s 16-bit DSP56800 Family of digital signal processors. It maintains compatibility with the DSP56800 while improving performance and adding new features.
6 Converting Applications for Increased Data and Program Memory
6.1 Extending Data Memory Size From 64K to 16M
6.2 Extending Program Memory Size From 64K to 2M
7 Conclusions
1 Introduction

The DSP56800E’s DSP core architecture represents the next step in the evolution of Motorola’s 16-bit DSP56800 Family of digital signal processors. It maintains compatibility with the DSP56800 while improving performance and adding new features.
1.1 Case Study

The application chosen as the porting example is an implementation of International Telecommunication Union (ITU) Recommendation V.22 bis. The original code was taken from the Freescale Embedded Software Development Kit (SDK) version 2.1. This software development kit, which runs with Metrowerks CodeWarrior 3.5.1 for the DSP56800 Family, can be found at the following URL: http://www.freescale.com
2.1 Porting Process

To set up the application to run on DSP56800E tools, the following steps were performed:

1. The original application code was tested, and test vectors were obtained. These steps are required for testing the ported code and for possible further optimization.
2. The original code’s compliance with the coding requirements for porting was verified.

2.1.
Another case is when the (Rn) addressing mode is used and an LEA instruction updates the address register. As expected, the AGU overflows on the DSP56800. When the code is ported to the DSP56800E, the results differ from those in the preceding case. This addressing mode, which exists in the enhanced core to ensure DSP56800 compatibility, causes the AGU to produce 16-bit addresses by filling the upper 8 bits with 0, simulating an overflow.
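The wraparound described above can be pictured with a small numerical model. This is only an illustrative sketch, not a bit-exact description of either AGU: the function names are hypothetical, and the 16-bit and 24-bit masks are assumptions taken from the address widths mentioned in the text.

```python
# Illustrative model only. The masks are assumptions based on the text:
# 16-bit addresses on the DSP56800, 24-bit addresses on the DSP56800E.
ADDR16_MASK = 0xFFFF
ADDR24_MASK = 0xFFFFFF


def agu_update_56800(rn, offset):
    """Model a 16-bit address update: the result wraps around at 64K."""
    return (rn + offset) & ADDR16_MASK


def agu_update_56800e(rn, offset, compat_16bit=False):
    """Model a 24-bit address update.

    With compat_16bit=True the upper 8 bits are forced to 0, which
    imitates the 16-bit overflow of the older core.
    """
    result = (rn + offset) & ADDR24_MASK
    if compat_16bit:
        result &= ADDR16_MASK
    return result


if __name__ == "__main__":
    last_16bit_address = 0xFFFF
    print(hex(agu_update_56800(last_16bit_address, 1)))
    print(hex(agu_update_56800e(last_16bit_address, 1)))
    print(hex(agu_update_56800e(last_16bit_address, 1, compat_16bit=True)))
```

In this model, stepping one word past address $FFFF gives 0 on the 16-bit core and in the compatibility case, but $010000 when the full 24-bit result is kept.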
To avoid this problem, scan the code for ADC, SBC, or DIV instructions that are run with the SA bit set. If the operands are not always sign-extended 32-bit values, or if an operation on sign-extended 32-bit operands can produce a result larger than 32 bits, use the SAT instruction. This instruction, available only on the DSP56800E, forces saturation (see Code Example 3).
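As a rough numerical picture of what forcing saturation to 32 bits means, the following sketch clamps a wider intermediate result to the signed 32-bit range. It is a simplified model under stated assumptions, not a description of the SAT instruction's exact behavior; the function name `saturate32` is hypothetical.

```python
# Simplified model: clamp an arbitrary integer to the signed 32-bit range.
# This is an illustration of saturation in general, not a bit-exact
# model of the DSP56800E SAT instruction.
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def saturate32(value):
    """Return value limited to the range [INT32_MIN, INT32_MAX]."""
    return max(INT32_MIN, min(INT32_MAX, value))


if __name__ == "__main__":
    # A sum of two large positive 32-bit values no longer fits in 32 bits.
    overflowing_sum = 0x7FFFFFFF + 0x7FFFFFFF
    print(hex(saturate32(overflowing_sum)))
    # Values already inside the range pass through unchanged.
    print(saturate32(1234))
```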
NOTE: Throughout this application note, 1 word equals 16 bits. The size of the data memory does not include the gaps that the circular buffers introduce.

The size of the data is the same on both platforms (as expected), but the code of the ported application is slightly (5 percent) larger. One explanation is that some instructions (such as Bcc) are coded with more words on the DSP56800E than on the DSP56800.
Methods based on almost all the new DSP56800E features, which are summarized in Section 1, “Introduction,” were used to improve the speed of the application. Table 3 and Table 4 present the overall results obtained after most of the time-consuming functions were optimized.

Table 3. Size Comparison

Code
Table 5. Optimization Gains on the Most Time-Consuming Functions

Function      Initial    Final     Gain
RXBPF         2158       2015      6.63%
RXDEMOD       717        641       10.60%
RXINTP        265        252       4.91%
RXCDAGC       152.67     142.54    6.64%
RXDECIM       118        116       1.69%
tx_fm         192        190       1.04%
RXEQUD        201        199       1.00%
TONEDETECT    318        286       10.06%
RXS1          110.25     106.53    3.37%
RXUSB1        71.74      69.73     2.80%
rx_dscr       125.03     104.47    16.44%
RXEQERR       99.54      95.54     4.02%
tx_scr        81.
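The Gain column of Table 5 is consistent with the relative reduction (Initial - Final) / Initial expressed as a percentage. The sketch below recomputes a few rows on that assumption; the helper name `gain_percent` is hypothetical.

```python
def gain_percent(initial, final):
    """Relative reduction from initial to final, in percent."""
    return (initial - final) / initial * 100.0


if __name__ == "__main__":
    # (Initial, Final) pairs copied from Table 5.
    rows = {
        "RXBPF": (2158, 2015),
        "RXDEMOD": (717, 641),
        "rx_dscr": (125.03, 104.47),
    }
    for name, (initial, final) in rows.items():
        print(f"{name}: {gain_percent(initial, final):.2f}%")
```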
Table 6. Available Delay Slots (Continued)

Delayed Instructions    Number of Delay Slots
RTID                    3
RTSD                    3
FRTID                   2

The number of instruction words filling the delay slots must equal the number of delay slots. If some delay slots cannot be filled with valid instructions, then each unused delay slot must be filled with a NOP instruction.
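The padding rule can be sketched as a small helper that counts the instruction words placed after a delayed instruction and appends NOPs until the slot count is reached. This is an illustration only: the function name, the example instruction, and its word count are hypothetical, and the sketch assumes that a NOP occupies one word. The slot counts are the three rows of Table 6 shown above.

```python
# Delay slot counts taken from the part of Table 6 shown above.
DELAY_SLOTS = {"RTID": 3, "RTSD": 3, "FRTID": 2}


def fill_delay_slots(delayed_instruction, slot_instructions):
    """Pad the delay slots of a delayed instruction with NOPs.

    slot_instructions is a list of (mnemonic, size_in_words) pairs.
    Assumes a NOP occupies exactly one instruction word.
    """
    slots = DELAY_SLOTS[delayed_instruction]
    used_words = sum(words for _, words in slot_instructions)
    if used_words > slots:
        raise ValueError("instructions do not fit in the delay slots")
    padding = ["nop"] * (slots - used_words)
    return [mnemonic for mnemonic, _ in slot_instructions] + padding


if __name__ == "__main__":
    # Hypothetical one-word instruction placed in the first slot of RTSD.
    print(fill_delay_slots("RTSD", [("move.w x0,y0", 1)]))
```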
• Two new address registers, R4 and R5
• A second offset register, N3
• New loop address and counter registers, LA2 and LC2
• FISR and FIRA registers
• Shadow registers for R0, R1, N, and M01

LA2 and LC2 are discussed in Section 3.8. The second offset register, N3, can be used for the second memory read in a dual parallel memory read, but it was not used in this project.
Code Example 7. Using Address Registers to Store Addresses

        do      #12,END_RX_BPF
        move    x:>BPF_PTR,r3       ; 2 cycles, 2 words
        ...     use and modify r3
END_RX_BPF
; DSP56800 original code: 12*2 cycles / 2 words

        move.w  x:>BPF_PTR,r4       ; 2 cycles, 2 words
        do      #12,END_RX_BPF
        tfra    r4,r3               ; 1 cycle, 1 word
        ...     use and modify r3
END_RX_BPF
; DSP56800E optimized code: 2+12*1 cycles / 2+1 words
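The cycle and word counts quoted in the comments of Code Example 7 can be worked through as follows. The helper `loop_cycles` is a hypothetical name, and the calculation covers only the pointer-load instruction discussed in the example, not the rest of the loop body.

```python
def loop_cycles(setup_cycles, per_iteration_cycles, iterations):
    """Total cycles spent on the pointer load across the whole loop."""
    return setup_cycles + per_iteration_cycles * iterations


if __name__ == "__main__":
    iterations = 12
    # DSP56800: the 2-cycle, 2-word memory load runs on every iteration.
    original = loop_cycles(0, 2, iterations)
    # DSP56800E: one 2-cycle load before the loop, then a 1-cycle
    # register transfer (tfra) on every iteration.
    optimized = loop_cycles(2, 1, iterations)
    print("original cycles: ", original)
    print("optimized cycles:", optimized)
    print("cycles saved:    ", original - optimized)
    print("extra words:     ", (2 + 1) - 2)
```

The trade is 10 cycles saved per pass through the loop in exchange for 1 additional program word.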
The improvement achieved by the AGU arithmetic is illustrated in Code Example 10, which is taken from the function RXDEMOD (from the file rx_demod.asm).

Code Example 10. Performing Address Calculations on DSP56800

        do      #12,end_rx_demod    ; Loop 12 times
        ...
        move.w  #SIN_TBL,y0         ; Get address of the table
        add.w   a1,y0               ; Add offset to the start address
        move.w  y0,r1               ; Load into the address register
        ...
Code Example 13. Copying a Buffer on DSP56800

        SECTION TX_MEM
        ...
tx_out  ds      12                  ; The output buffer
        ...
        ENDSEC

        SECTION V22B_TX
        ...
        move    #tx_out,r1          ; Load address of output buffer
        do      #12,up_txout        ; Repeat 12 times
        move    x:(r0)+,x0          ; Update 16-bit values array with
        move    x0,x:(r1)+          ; values obtained from a table.
up_txout
        ...
        ENDSEC
; DSP56800 original code: 12*2 cycles / 2 words
3.6 Operations and Memory Access on 8 Bits

Compared to its predecessor, the DSP56800E architecture introduces another new data type: 8-bit data. Some instructions take an 8-bit operand in memory. The 8-bit data can be accessed using two types of pointers: word pointers and byte pointers. The Core Reference Manual contains more information about these features. Using this new data type reduces the amount of data memory required.
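The memory saving comes from storing two 8-bit values in each 16-bit word instead of one value per word. The sketch below illustrates the arithmetic and one possible packing; the function names are hypothetical, and the byte order within a word (first value in the low byte) is an assumption made for illustration, so consult the Core Reference Manual for the actual layout.

```python
def words_needed(value_count, bits_per_value, word_bits=16):
    """Number of memory words needed to store value_count values."""
    values_per_word = word_bits // bits_per_value
    return -(-value_count // values_per_word)  # ceiling division


def pack_bytes(values):
    """Pack 8-bit values two per 16-bit word.

    Assumption for illustration: the first value of each pair goes
    into the low byte of the word.
    """
    words = []
    for index in range(0, len(values), 2):
        low = values[index] & 0xFF
        high = values[index + 1] & 0xFF if index + 1 < len(values) else 0
        words.append((high << 8) | low)
    return words


if __name__ == "__main__":
    print(words_needed(12, 16))  # one value per word
    print(words_needed(12, 8))   # two values per word
    print([hex(word) for word in pack_bytes([0x12, 0x34, 0x56])])
```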
For example, compare the instruction ADD on the DSP56800 to the instructions ADD, ADD.L, and ADD.W on the DSP56800E. On the DSP56800, ADD operands can use the following addressing modes: register, immediate, direct, and displacement relative to SP. There are also restrictions on the allowed register combinations.
Code Example 22. Restrictions Removed Using MACR on DSP56800E

        do      #12,end_rx_demod    ; Loop 12 times
        ...
        macr    b1,y0,a             ; the register combination is
                                    ; allowed on DSP56800E
        ...
end_rx_demod
; DSP56800E optimized code: 12*1 cycles / 1 word

In terms of size, the gain is 1 word. In terms of speed, the gain is 1 cycle multiplied by the number of loop iterations (because the sequence is taken from inside a loop).
The main project contains only one place where two nested DO loops are used: the function RXBPF in the file rx_bpf.asm. The function performs band-pass filtering and contains an outer loop that executes 12 times. This optimization method saves 60 (5 × 12) cycles per symbol out of an initial average of 4278.5 cycles per symbol, an improvement of 1.4 percent from this single method.

4 Writing DSP56800E Code from Scratch
Table 7. Results Obtained for RXDEMOD

                        Speed                                             Size
RXDEMOD                 Minimum    Maximum    Average    Gain Over        Value      Gain Over
                        (Cycles)   (Cycles)   (Cycles)   Initial (%)      (Words)    Initial (%)
Initial                 745        745        745        N/A              68         N/A
Optimized               621        621        621        16.64            65         4.41
Written from scratch    522        522        522        29.93            71         –4.
RXEQERR

The original code contains 10 register-to-register moves that compensate for the lack of accumulators and the reduced number of register combinations for MACs. The optimized ported code eliminates five of these moves, leading to an improvement of 60 (12 × 5) cycles on the entire function. The written-from-scratch version eliminates all of these transfers, leading to an improvement of 60 additional cycles, or a total improvement of 120 cycles.

4.
Code Example 28. Optimized Ported Code on DSP56800E

        move.w  #0,y1               ; ...
        tst.w   a                   ; ...
        jgt     APOS                ;
        move.w  #$0100,y1           ;
APOS
        ...
        move.w  #0,x0               ;
        tst     b                   ;
        jgt     BPOS                ;
        move.w  #$0100,x0           ;
BPOS
        ...
        move.w  x0,b1
        ...
        eor.w   y1,b                ; ...
not occur on DSP56800 (specifically, data ALU pipeline dependencies and hardware looping dependencies). The DSP56800E also eliminates some dependencies, such as loading an address register with an immediate value and using it as an address in the immediately following instruction. The DSP56800E core handles pipeline dependencies in two ways:

• In most cases, a hardware interlock automatically stalls the DSP56800E pipeline.
A data ALU pipeline dependency exists between instructions n1 and n2. Because the result becomes available in B only after the Execute 2 phase, instruction n2 must stall for 1 cycle before it can write the contents of B to memory. The sequence therefore takes four cycles to execute; it can be rewritten as shown in Code Example 32.

Code Example 32.
Code Example 33. Code Without AGU Pipeline Dependencies on DSP56800

n1:     move    y1,x:>tx_quad       ; Store tx_quad
n2:     add     b,a                 ; Get the actual address of
n3:     move    a,r1                ; variable in r1
n4:     nop                         ; Necessary to avoid dependency on DSP56800
n5:     move    x:(r1)+,a1          ; Get the variable

On the DSP56800, the NOP introduced as instruction n4 avoids the pipeline dependency.
6.1 Extending Data Memory Size From 64K to 16M

Two assembler switches instruct the DSP56800E application to use more than 16 bits for addresses: -od21 and -od24. With these switches, all addresses become 24 bits long instead of 16 bits. Source code changes must be made to support this: instructions that are forced by the ‘>’ operator to use 16-bit data addresses must instead be forced with the new ‘>>’ operator to use 24-bit addresses.
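The sizes in the section titles follow directly from the address widths: 16 address bits reach 64K words, 21 bits reach 2M words, and 24 bits reach 16M words. The sketch below works through that arithmetic and checks whether a given address still fits in 16 bits; the function names are hypothetical and the code says nothing about how the assembler switches themselves behave.

```python
def addressable_words(address_bits):
    """Number of distinct word addresses reachable with address_bits bits."""
    return 1 << address_bits


def fits_in_16_bits(address):
    """True if the address can be expressed as a 16-bit address."""
    return 0 <= address <= 0xFFFF


if __name__ == "__main__":
    K = 1024
    M = 1024 * 1024
    print(addressable_words(16) // K, "K words")   # 16-bit addresses
    print(addressable_words(21) // M, "M words")   # 21-bit addresses
    print(addressable_words(24) // M, "M words")   # 24-bit addresses
    # An address above $FFFF cannot be encoded as a 16-bit address.
    print(fits_in_16_bits(0x00FFFF), fits_in_16_bits(0x010000))
```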
Table 9. Summary of Extended Data Memory Size

                          Size (Words)           Speed
Extended Data Memory      Data       Program     (Cycles)
Initial version           828        316         195246
Data memory extended      832        345         199865
Increase (percentage)     +0.48      +9.17       +2.36

Code size increased and speed decreased; however, the differences are insignificant.
Code Example 42. DSP56800 Original Code

rx_next_task
        lea     (sp)+
        move    x:>RxQ_ptr,r3       ; Restore the RxQ pointer
        incw    x:RxQ_ptr           ; Increment the RxQ_ptr.
        move    x:(r3),x0           ; Get the address of next task
        move    x0,x:(sp)+          ; Push the address of task to be
        move    sr,x:(sp)           ; performed onto the stack
        rts                         ; Perform task
; DSP56800 original code

This code is functional on the DSP56800E platform within the 64K program memory boundary.
7 Conclusions

This application note investigated the process of porting an application developed for the DSP56800 to the DSP56800E and the methods to optimize the ported code using the new features of the DSP56800E. Also, the methods to optimize selected ported functions were analyzed and compared to redesigning and rewriting the functions.
compared to DSP56800. New optimization methods can also be introduced to further increase the performance. Moreover, the new DSP runs at a higher frequency than the older one (120 MHz versus 35 MHz), so the actual execution time is much shorter.
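The effect of the frequency difference on execution time can be estimated with a simple calculation. This sketch assumes that the quoted frequencies are the rates at which cycles are executed on each core, which the text does not state explicitly; the function name and the example cycle count are hypothetical.

```python
def execution_time_us(cycles, frequency_mhz):
    """Execution time in microseconds for a given cycle count.

    Assumes one cycle per period of the quoted frequency.
    """
    return cycles / frequency_mhz


if __name__ == "__main__":
    example_cycles = 1000  # hypothetical cycle count, same on both cores
    time_56800 = execution_time_us(example_cycles, 35.0)
    time_56800e = execution_time_us(example_cycles, 120.0)
    print(f"DSP56800:  {time_56800:.2f} us")
    print(f"DSP56800E: {time_56800e:.2f} us")
    print(f"speedup from frequency alone: {time_56800 / time_56800e:.2f}x")
```

Any reduction in cycle count obtained by optimization multiplies this frequency-only factor.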
Appendix A Functions Written from Scratch

A.1 Optimized Ported Version of RXDEMOD

RXDEMOD
        move.l  #BPF_OUT,r3
        move.l  #RXCB2A,r2
        move.l  #MOD_TBL,r0
        move.l  #SIN_TBL,r5
        move.l  #DPHASE,r4
        move.w  x:CDP,d
        do      #12,end_rx_demod
        moveu.w #$80ff,m01
        tfr     d,a
        add.w   x:(r4),a
        move.w  #$0080,y0
        move.w  a1,x:(r4)
        mpy     a1,y0,a
        move.w  a0,y1
        bfclr   #$ff00,a
        lsr.w   y1
        moveu.w a1,r1
        adda    r5,r1
        moveu.w move.w move.w sub move.
move.w mpy ; ; x:(r3)+,y0 ; ; -y0,x0,a ; y1,x0,a a,x:(r2)+ ; ; b1,y0,a ; ; ; a,x:(r2)+ ; #$ffff,m01 ; b,b b1,y1,a macr mpy macr move.w moveu.w end_rx_demod End_RXDEMOD jmp rx_next_task DEMODULATE Saturate the output X*COS in a Y in y0 X*COS-Y*-SIN X*-SIN in a Get Y Y*COS+X*-SIN in a this register combination for macr is allowed on 56800e Save demodulated output r0 in linear addr. Mode ; Go to next task

A.2 RXDEMOD Written from Scratch
A.3 Optimized Ported Version of RXEQERR

RXEQERR
        move.w  #$0,n
        moveu.w #EQX,r3
        moveu.w #DECX,r0
        moveu.w #DP,r1
        moveu.w #DX,r2
        move.w  x:(r0)+,y0
        tfr     x0,b
        sub     y0,b
        mpy     x0,y1,a
        macr    -x0,y0,a
        tfr     x0,a
        sub     y1,a
        move.w  b,y0
        mpy     y0,y0,b
        move.w  a,y0
        mac     y0,y0,b
        asl     b
        asl     b
        move.w  #$7000,y0
        mac     x0,y0,b
        move.w  b,x:(r2)+n
EQUD22 move.w move.w tst jeq jgt move.w APOS deca deca move.
CAR_NOR move.w x:WRPFLG,b tst b jge clr move.w jmpd move.w start move.w move.w move.w sub x:LASTDP,b a,y1 #$0400,x0 b,a POS #$fc00,x0 abs move.w move.w cmp.
        move.w  #$0400,a            ; tmp <- #$0400
DO_DIV
        move.w  b,x0
        bfclr   #$0001,sr
        rep     #11
        div     x0,a
        move.w  a0,a
SET_DP
        eor     c1,y1               ; compare signs of realp and DP
        bge     SAME_SIGN           ; if they have the same sign
        neg     a
SAME_SIGN
                                    ; a contains DP
UNWRAP tst.w bge jmpd clr clr next_2 move.w move.w move.w move.w move.w sub tgt abs clr cmp.w tgt sub move.w add final move.w move.w End_RXEQERR jmpd move.