TMS320C64x+ DSP

Little-Endian DSP Library

Programmer’s Reference

Literature Number: SPRUEB8

February 2006

Summary of content (169 pages)

PAGE 1
TMS320C64x+ DSP Little-Endian DSP Library Programmer’s Reference Literature Number: SPRUEB8 February 2006
PAGE 2
IMPORTANT NOTICE Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
PAGE 3
Preface: Read This First
About This Manual
This document describes the C64x+ digital signal processor (DSP) little-endian library, or DSPLIB for short.
Notational Conventions
This document uses the following conventions:
- Hexadecimal numbers are shown with the suffix h. For example, the following number is 40 hexadecimal (decimal 64): 40h.
- Registers in this document are shown in figures and described in tables.
- Macro names are written in uppercase text; function names are written in lowercase.
PAGE 4
SPRAA84: TMS320C64x to TMS320C64x+ CPU Migration Guide. Describes migrating from the Texas Instruments TMS320C64x digital signal processor (DSP) to the TMS320C64x+ DSP. The objective of this document is to indicate differences between the two cores. Functionality that is identical in the two devices is not included.
Trademarks
C6000, TMS320C64x+, TMS320C64x, and C64x are trademarks of Texas Instruments.
PAGE 5
Contents
1 Introduction (page 1-1)
Provides a brief introduction to the TI C64x+ DSPLIBs, shows the organization of the routines contained in the libraries, and lists the features and benefits of the DSPLIBs.
2 Installing and Using DSPLIB
PAGE 6
Contents
A Performance/Fractional Q Formats (page A-1)
Describes performance considerations related to the C64x+ DSPLIB and provides information about the Q format used by DSPLIB functions.
B Software Updates and Customer Support
PAGE 7
Tables
2−1 DSPLIB Data Types
3−1 Argument Conventions
3−2 Adaptive Filtering
3−3 Correlation
PAGE 9
Chapter 1 Introduction
This chapter provides a brief introduction to the TI C64x+ DSP Library (DSPLIB), shows the organization of the routines contained in the library, and lists the features and benefits of the DSPLIB.
Topic (Page)
1.1 Introduction to the TI C64x+ DSPLIB (1-2)
1.2 Features and Benefits
PAGE 10
1.1 Introduction to the TI C64x+ DSPLIB
The TI C64x+ DSPLIB is an optimized DSP function library for C programmers using devices that include the C64x+ megamodule. It includes many C-callable, assembly-optimized, general-purpose signal-processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical.
PAGE 11
Introduction to the TI C64x+ DSPLIB
- Filtering and convolution
  - DSP_fir_cplx
  - DSP_fir_cplx_hM4X4
  - DSP_fir_gen
  - DSP_fir_gen_hM17_rA8X8
  - DSP_fir_r4
  - DSP_fir_r8
  - DSP_fir_r8_hM16_rM8A8X8
  - DSP_fir_sym
  - DSP_iir
- Math
  - DSP_dotp_sqr
  - DSP_dotprod
  - DSP_maxval
  - DSP_maxidx
  - DSP_minval
  - DSP_mul32
  - DSP_neg32
  - DSP_recip16
  - DSP_vecsumsq
  - DSP_w_vec
- Matrix
  - DSP_mat_mul
  - DSP_mat_trans
- Miscellaneous
  - DSP_bexp
  - DSP_blk_eswap16
  - DSP_blk_eswap32
  - DSP_blk_eswap64
  - DSP_blk_move
  - DSP_fltoq15
  - DSP_m
PAGE 12
1.2 Features and Benefits
- Hand-coded assembly-optimized routines
- C and linear assembly source code
- C-callable routines, fully compatible with the TI C6x compiler
- Fractional Q.
PAGE 13
Chapter 2 Installing and Using DSPLIB
This chapter provides information on how to install and rebuild the TI C64x+ DSPLIB.
Topic (Page)
2.1 How to Install DSPLIB (2-2)
2.2 Using DSPLIB (2-3)
2.3 How to Rebuild DSPLIB
PAGE 14
How to Install DSPLIB 2.1 How to Install DSPLIB Note: You should read the README.txt file for specific details of the release. The DSPLIB is provided in the file dsp64plus.zip. The file must be unzipped to provide the following directory structure: dsp | +−−README.
PAGE 15
2.2 Using DSPLIB
2.2.1 DSPLIB Arguments and Data Types
2.2.1.1 DSPLIB Types
Table 2−1 shows the data types handled by the DSPLIB.
Table 2−1. DSPLIB Data Types
Name     Size (bits)  Type      Minimum            Maximum
short    16           integer   −32768             32767
int      32           integer   −2147483648        2147483647
long     40           integer   −549755813888      549755813887
pointer  32           address   0000:0000h         FFFF:FFFFh
Q.15     16           fraction  −0.9999694824...   0.9999694824...
Q.31     32           fraction  −0.99999999953...  0.99999999953...
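The Q.15 row of Table 2−1 can be read as a 16-bit integer that holds a real value multiplied by 2^15. The following is a minimal sketch of that mapping, not the library's own conversion routine. The helper names float_to_q15 and q15_to_float are hypothetical, and the round-to-nearest step and the symmetric saturation at ±32767 (chosen to match the range printed in Table 2−1) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a real value to Q.15.
 * Saturates symmetrically at +/-32767 to match the range in Table 2-1. */
static int16_t float_to_q15(double v)
{
    double scaled;
    assert(v == v);                       /* NaN has no Q.15 representation */
    scaled = v * 32768.0;                 /* multiply by 2^15 */
    if (scaled >= 32767.0)  return 32767;
    if (scaled <= -32767.0) return -32767;
    /* round to nearest; the cast truncates toward zero */
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Hypothetical helper: convert a Q.15 value back to a real value. */
static double q15_to_float(int16_t q)
{
    return (double)q / 32768.0;           /* divide by 2^15 */
}
```

With this mapping, 0.5 is stored as 16384, and the largest stored value 32767 reads back as 0.9999694824..., which agrees with the maximum shown in the table.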
PAGE 16
Using DSPLIB
2.2.2 Calling a DSPLIB Function From C
In addition to correctly installing the DSPLIB software, follow these steps to include a DSPLIB function in the code:
- Include the function header file corresponding to the DSPLIB function.
- Link the code with dsp64plus.lib.
- Use a correct linker command file for the platform used.
The examples in the DSP\Examples folder show how to use the DSPLIB in a Code Composer Studio C environment. 2.2.
PAGE 17
2.2.6 Interrupt Behavior of DSPLIB Functions
All of the functions in this library are designed to be used in systems with interrupts. Thus, it is not necessary to disable interrupts when calling any of these functions. The functions in the library disable interrupts as needed to protect the execution of code in tight loops. Library functions fall into three categories:
- Fully-interruptible: These functions do not disable interrupts.
PAGE 19
Chapter 3 DSPLIB Function Tables
This chapter provides tables containing all DSPLIB functions, a brief description of each, and a page reference for more detailed information.
Topic (Page)
3.1 Arguments and Conventions Used (3-2)
3.2 DSPLIB Functions (3-3)
3.3 DSPLIB Function Tables
PAGE 20
3.1 Arguments and Conventions Used
The following convention has been used when describing the arguments for each individual function:
Table 3−1. Argument Conventions
Argument   Description
x,y        Argument reflecting input data vector
r          Argument reflecting output data vector
nx,ny,nr   Arguments reflecting the size of vectors x, y, and r, respectively. For functions where nx = ny = nr, only nx is used.
PAGE 21
DSPLIB Functions 3.2 DSPLIB Functions The routines included in the DSP library are organized into eight functional categories and listed below in alphabetical order.
PAGE 22
3.3 DSPLIB Function Tables
Table 3−2. Adaptive Filtering
Function: long DSP_firlms2(short *h, short *x, short b, int nh)
  Description: LMS FIR
  Page: 4-2
Table 3−3. Correlation
Function: void DSP_autocor(short *r, short *x, int nx, int nr)
  Description: Autocorrelation
  Page: 4-4
Function: void DSP_autocor_rA8(short *r, short *x, int nx, int nr)
  Description: Autocorrelation (r[] must be double-word aligned)
  Page: 4-4
Table 3−4.
PAGE 23
DSPLIB Function Tables
Table 3−4. FFT (Continued)
Function: void DSP_ifft16x16(short *w, int nx, short *x, short *y)
  Description: Complex out-of-place inverse FFT, mixed radix, with digit reversal. Input/output data in Re/Im order.
  Page: 4-28
Function: void DSP_ifft16x16_imre(short *w, int nx, short *x, short *y)
  Description: Complex out-of-place inverse FFT, mixed radix, with digit reversal. Input/output data in Im/Re order.
PAGE 24
DSPLIB Function Tables Table 3−5.
PAGE 25
DSPLIB Function Tables Table 3−8.
PAGE 26
Differences Between the C64x and C64x+ DSPLIBs 3.4 Differences Between the C64x and C64x+ DSPLIBs The C64x+ DSPLIB was developed by optimizing some of the functions of the C64x DSPLIB to take advantage of the C64x+ architecture. Table 3−10 shows the optimized functions for the C64x+ DSPLIB. There are two optimization types: - SPLOOP conversion: Optimized code uses SPLOOP to provide interruptibility and decrease power consumption.
PAGE 27
Differences Between the C64x and C64x+ DSPLIBs
Table 3−10. Functions Optimized in the C64x+ DSPLIB (Continued)
Function                 C64x+ Optimized   Optimization Type
DSP_fir_cplx_hM4X4       Yes               Kernel re-design, SPLOOP. Optimization resulted in new requirements. New name is used.
DSP_fir_gen              No
DSP_fir_gen_hM17_rA8X8   Yes               Kernel re-design, SPLOOP. Optimization resulted in new requirements. New name is used.
PAGE 28
Differences Between the C64x and C64x+ DSPLIBs Table 3−10.
PAGE 29
Chapter 4 DSPLIB Reference
This chapter provides a list of the functions within the DSP library (DSPLIB) organized into functional categories. The functions within each category are listed in alphabetical order and include arguments, descriptions, algorithms, benchmarks, and special requirements.
Topic (Page)
4.1 Adaptive Filtering (4-2)
4.2 Correlation
PAGE 30
DSP_firlms2 4.1 Adaptive Filtering DSP_firlms2 LMS FIR Function long DSP_firlms2(short * restrict h, const short * restrict x, short b, int nh) Arguments h[nh] Coefficient Array x[nh+1] Input Array b Error from previous FIR nh Number of coefficients. Must be multiple of 4. return long Return value Description The Least Mean Square Adaptive Filter computes an update of all nh coefficients by adding the weighted error times the inputs to the original coefficients.
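The coefficient update described above can be sketched in portable C. This is a reference-style model under stated assumptions, not the optimized library routine: the name firlms2_ref is hypothetical, the Q.15 weighting (right shift by 15) is assumed, and the return value is assumed to be the FIR output computed with the updated coefficients over x[1] through x[nh].

```c
#include <assert.h>

/* Hypothetical portable model of an LMS coefficient update.
 * Assumes Q.15 weighting and an arithmetic right shift for negative values. */
static long firlms2_ref(short *h, const short *x, short b, int nh)
{
    long r = 0;
    int i;
    assert(nh > 0 && nh % 4 == 0);          /* documented: multiple of 4 */
    for (i = 0; i < nh; i++) {
        /* add the weighted error times the input to the old coefficient */
        h[i] = (short)(h[i] + ((x[i] * b) >> 15));
        /* accumulate the filter output using the updated coefficient */
        r += (long)x[i + 1] * h[i];
    }
    return r;
}
```

Note that x[] is read up to index nh, which is why the argument list documents it as x[nh+1].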
PAGE 31
DSP_firlms2 Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible. - The loop is unrolled 4 times.
PAGE 32
DSP_autocor 4.2 Correlation DSP_autocor AutoCorrelation Function void DSP_autocor(short * restrict r, const short * restrict x, int nx, int nr) Arguments r[nr] Output array x[nx+nr] Input array. Must be double-word aligned. nx Length of autocorrelation. Must be a multiple of 8. nr Number of lags. Must be a multiple of 4. Description This routine accepts an input array of length nx + nr and performs nr autocorrelations each of length nx producing nr output results.
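As a rough illustration of "nr autocorrelations each of length nx", the computation can be modeled as follows. This is a sketch only: autocor_ref is a hypothetical name, the choice of summing over x[nr] through x[nx+nr−1] and the Q.15 output scaling (right shift by 15) are assumptions, and the alignment requirement of the optimized routine does not apply to this portable version.

```c
#include <assert.h>

/* Hypothetical portable model: r[i] is the lag-i autocorrelation of x[]. */
static void autocor_ref(short *r, const short *x, int nx, int nr)
{
    int i, k;
    assert(nx % 8 == 0 && nr % 4 == 0);     /* documented size constraints */
    for (i = 0; i < nr; i++) {
        int sum = 0;
        for (k = nr; k < nx + nr; k++)
            sum += x[k] * x[k - i];         /* nx products per lag */
        r[i] = (short)(sum >> 15);          /* assumed Q.15 output scaling */
    }
}
```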
PAGE 33
DSP_autocor Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible. - The inner loop is unrolled 8 times. - The outer loop is unrolled 4 times. - The outer loop is conditionally executed in parallel with the inner loop. This allows for a zero overhead outer loop.
PAGE 34
DSP_autocor_rA8  AutoCorrelation
Function
void DSP_autocor_rA8(short * restrict r, const short * restrict x, int nx, int nr)
Arguments
r[nr]      Output array. Must be double-word aligned.
x[nx+nr]   Input array. Must be double-word aligned.
nx         Length of autocorrelation. Must be a multiple of 8.
nr         Number of lags. Must be a multiple of 4.
Description
This routine accepts an input array of length nx + nr and performs nr autocorrelations, each of length nx, producing nr output results.
PAGE 35
DSP_autocor_rA8
Benchmarks
Cycles     nx < 40:  6*nr + 20
           nx >= 40: nx*nr/8 + 2*nr + 20
Codesize   304 bytes
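The two-branch cycle formula above is easy to evaluate when budgeting processing time. The helper below is a small sketch; the name autocor_rA8_cycles is hypothetical, and integer division is assumed for the nx*nr/8 term.

```c
#include <assert.h>

/* Hypothetical helper: evaluate the benchmark cycle formula for DSP_autocor_rA8. */
static int autocor_rA8_cycles(int nx, int nr)
{
    assert(nx > 0 && nr > 0);
    if (nx < 40)
        return 6 * nr + 20;
    return nx * nr / 8 + 2 * nr + 20;
}
```

For example, nx = 64 and nr = 8 gives 64 + 16 + 20 = 100 cycles.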
PAGE 36
DSP_fft16x16 4.3 FFT DSP_fft16x16 Complex Forward Mixed Radix 16 x 16-bit FFT Function void DSP_fft16x16(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] Pointer to complex Q.15 FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4 , and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex 16-bit data input. y[2*nx] Pointer to complex 16-bit data output.
PAGE 37
DSP_fft16x16 Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interruptible. The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform, otherwise it is a radix-2 transform. The conventional Cooley Tukey FFT is written using three loops. The outermost loop “k” cycles through the stages.
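The stage structure described above can be checked with a short calculation. The sketch below uses hypothetical helper names and assumes that log4(nx) is rounded up when nx is not a power of 4, so that the radix-4 stages followed by the final stage multiply out to exactly nx.

```c
#include <assert.h>

/* Hypothetical helper: log2 of a power-of-two FFT length. */
static int fft_log2(int nx)
{
    int bits = 0;
    assert(nx >= 16 && (nx & (nx - 1)) == 0);
    while ((1 << bits) < nx)
        bits++;
    return bits;
}

/* Radix-4 stages before the last stage: ceil(log4(nx)) - 1 (ceiling assumed). */
static int fft_leading_radix4_stages(int nx)
{
    return (fft_log2(nx) + 1) / 2 - 1;
}

/* Radix of the last stage: 4 if nx is a power of 4, otherwise 2. */
static int fft_last_stage_radix(int nx)
{
    return (fft_log2(nx) % 2 == 0) ? 4 : 2;
}
```

For nx = 32 this gives two radix-4 stages followed by a radix-2 stage (4 * 4 * 2 = 32); for nx = 64 it gives two radix-4 stages followed by a radix-4 stage (4 * 4 * 4 = 64).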
PAGE 38
DSP_fft16x16 To vectorize the FFT, it is desirable to access the twiddle factor array using double word wide loads and fetch the twiddle factors needed. To do this, a modified twiddle factor array is created, in which the factors WN/4, WN/2, W3N/4 are arranged to be contiguous. This eliminates the separation between twiddle factors within a butterfly. However, this implies that we maintain a redundant version of the twiddle factor array as the loop is traversed from one stage to another.
PAGE 39
DSP_fft16x16_imre DSP_fft16x16_imre Complex Forward Mixed Radix 16 x 16-bit FFT, With Im/Re Order Function void DSP_fft16x16_imre(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] Pointer to complex Q.15 FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4 , and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex 16-bit data input. y[2*nx] Pointer to complex 16-bit data output.
PAGE 40
DSP_fft16x16_imre
The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform. The conventional Cooley-Tukey FFT is written using three loops. The outermost loop “k” cycles through the stages. There are log4(N) stages in all.
PAGE 41
DSP_fft16x16_imre To vectorize the FFT, it is desirable to access twiddle factor array using double word wide loads and fetch the twiddle factors needed. To do this, a modified twiddle factor array is created, in which the factors WN/4, WN/2, W3N/4 are arranged to be contiguous. This eliminates the separation between twiddle factors within a butterfly. However, this implies that we maintain a redundant version of the twiddle factor array as the loop is traversed from one stage to another.
PAGE 42
DSP_fft16x16r DSP_fft16x16r Complex Forward Mixed Radix 16 x 16-bit FFT With Rounding Function void DSP_fft16x16r(int nx, short * restrict x, const short * restrict w, const unsigned char * restrict brev, short * restrict y, int radix, int offset, int nmax) Arguments nx Length of FFT in complex samples. Must be power of 2 or 4, and ≤16384 x[2*nx] Pointer to complex 16-bit data input w[2*nx] Pointer to complex FFT coefficients brev[64] Pointer to bit reverse table containing 64 entries.
PAGE 43
DSP_fft16x16r void dft(int n, short x[], short y[]) { int k,i, index; const double PI = 3.
PAGE 44
DSP_fft16x16r
The function takes the twiddle factors and input data, and calculates the FFT, producing the frequency domain data in the y[ ] array. Because the FFT allows every input point to affect every output point, it causes cache thrashing in a cache-based system. This is mitigated by allowing the main FFT of size N to be divided into several steps, allowing as much data reuse as possible.
PAGE 45
DSP_fft16x16r
    DSP_fft16x16r(N,   &x[0],       &w[0],       brev, y, N/4, 0,     N)
    DSP_fft16x16r(N/4, &x[0],       &w[2*3*N/4], brev, y, rad, 0,     N)
    DSP_fft16x16r(N/4, &x[2*N/4],   &w[2*3*N/4], brev, y, rad, N/4,   N)
    DSP_fft16x16r(N/4, &x[2*N/2],   &w[2*3*N/4], brev, y, rad, N/2,   N)
    DSP_fft16x16r(N/4, &x[2*3*N/4], &w[2*3*N/4], brev, y, rad, 3*N/4, N)
As discussed previously, N can be either a power of 4 or 2. If N is a power of 4, then rad = 4, and if N is a power of 2 and not a power of 4, then rad = 2.
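The argument pattern of the five calls above can be tabulated programmatically, which helps when checking indices for a particular N. The sketch below only computes the arguments; it does not call DSP_fft16x16r. The struct fft_call_plan and the helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical record of the numeric arguments of one DSP_fft16x16r call. */
struct fft_call_plan {
    int nx;        /* FFT length passed as the first argument        */
    int x_index;   /* index into x[] where the input pointer starts  */
    int w_index;   /* index into w[] where the twiddle pointer starts */
    int radix;     /* radix argument                                 */
    int offset;    /* offset argument                                */
    int nmax;      /* nmax argument (size of the full FFT)           */
};

/* rad = 4 if n is a power of 4, rad = 2 if n is only a power of 2. */
static int split_fft_rad(int n)
{
    int bits = 0;
    while ((1 << bits) < n)
        bits++;
    return (bits % 2 == 0) ? 4 : 2;
}

/* Fill plan[0] with the first pass and plan[1..4] with the four quarter passes. */
static void plan_split_fft(int N, struct fft_call_plan plan[5])
{
    int q, rad;
    assert(N >= 16 && (N & (N - 1)) == 0);
    rad = split_fft_rad(N);
    plan[0] = (struct fft_call_plan){ N, 0, 0, N / 4, 0, N };
    for (q = 0; q < 4; q++)
        plan[q + 1] = (struct fft_call_plan){
            N / 4, 2 * q * N / 4, 2 * 3 * N / 4, rad, q * N / 4, N };
}
```

For N = 64, the third call in the sequence starts at x[32] and w[96] with radix 4 and offset 16.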
PAGE 46
DSP_fft16x16r { int i, l0, l1, l2, h2, predj; int l1p1,l2p1,h2p1, tw_offset, stride, fft_jmp; short xt0, yt0, xt1, yt1, xt2, yt2; short si1,si2,si3,co1,co2,co3; short xh0,xh1,xh20,xh21,xl0,xl1,xl20,xl21; short x_0, x_1, x_l1, x_l1p1, x_h2 , x_h2p1, x_l2, x_l2p1; short *x,*w; short *ptr_x0, *ptr_x2, *y0; unsigned int j, k, j0, j1, k0, k1; short x0, x1, x2, x3, x4, x5, x6, x7; short xh0_0, xh1_0, xh0_1, xh1_1; short xl0_0, xl1_0, xl0_1, xl1_1; short yt3, yt4, yt5, yt6, yt7; unsigned a, num; stride = n;
PAGE 47
DSP_fft16x16r x_1 = x[1]; x_h2 = x[h2]; x_h2p1 = x[h2+1]; x_l1 = x[l1]; x_l1p1 = x[l1+1]; x_l2 = x[l2]; x_l2p1 = x[l2+1]; xh0 = x_0 + x_l1; xh1 = x_1 + x_l1p1; xl0 = x_0 − x_l1; xl1 = x_1 − x_l1p1; xh20 = x_h2 + x_l2; xh21 = x_h2p1 + x_l2p1; xl20 = x_h2 − x_l2; xl21 = x_h2p1 − x_l2p1; ptr_x0 = x; ptr_x0[0] = ((short)(xh0 + xh20))>>1; ptr_x0[1] = ((short)(xh1 + xh21))>>1; ptr_x2 = ptr_x0; x += 2; predj = (j − fft_jmp); if (!predj) x += fft_jmp; if (!predj) j = 0; xt0 = xh0 − xh20;
PAGE 48
DSP_fft16x16r ptr_x2[h2p1] = (yt0 * co2 − xt0 * si2 + 0x00008000) >> 16; ptr_x2[l2 ] = (xt2 * co3 + yt2 * si3 + 0x00008000) >> 16; ptr_x2[l2p1] = (yt2 * co3 − xt2 * si3 + 0x00008000) >> 16; } tw_offset += fft_jmp; stride = stride>>2; } /* end while */ j = offset>>2; ptr_x0 = ptr_x; y0 = y; /* determine _norm(nmax) − 17 */ l0 = 31; if (((nmax>>31)&1)==1) num = ~nmax; else num = nmax; if (!num) l0 = 32; else { a=num&0xFFFF0000; if (a) { l0−=16; num=a; } a=num&0xFF00FF00; if (a) { l0−= 8; num=a; } a=num&0xF
PAGE 49
DSP_fft16x16r k = (k0 << 6) | k1; if (l0 < 0) k = k << −l0; else k = k >> l0; j++; /* multiple of 4 index */ x0 = ptr_x0[0]; x1 = ptr_x0[1]; x2 = ptr_x0[2]; x3 = ptr_x0[3]; x4 = ptr_x0[4]; x5 = ptr_x0[5]; x6 = ptr_x0[6]; x7 = ptr_x0[7]; ptr_x0 += 8; xh0_0 = x0 + x4; xh1_0 = x1 + x5; xh0_1 = x2 + x6; xh1_1 = x3 + x7; if (radix == 2) { xh0_0 = x0; xh1_0 = x1; xh0_1 = x2; xh1_1 = x3; } yt0 = xh0_0 + xh0_1; yt1 = xh1_0 + xh1_1; yt4 = xh0_0 − xh0_1; yt5 = xh1_0 − xh1_1; xl0_0
PAGE 50
DSP_fft16x16r
            xl1_1 = x6;
            xl0_1 = x7;
        }
        yt2 = xl0_0 + xl1_1;
        yt3 = xl1_0 - xl0_1;
        yt6 = xl0_0 - xl1_1;
        yt7 = xl1_0 + xl0_1;
        if (radix == 2) {
            yt7 = xl1_0 - xl0_1;
            yt3 = xl1_0 + xl0_1;
        }
        y0[k] = yt0; y0[k+1] = yt1; k += n>>1;
        y0[k] = yt2; y0[k+1] = yt3; k += n>>1;
        y0[k] = yt4; y0[k+1] = yt5; k += n>>1;
        y0[k] = yt6; y0[k+1] = yt7;
    }
}
Special Requirements
- In-place computation is not allowed.
- nx must be a power of 2 or 4.
PAGE 51
DSP_fft16x16r Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interruptible. - The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform, otherwise it is a radix-2 transform. - A special sequence of coefficients used as generated above produces the FFT.
PAGE 52
DSP_fft16x32 DSP_fft16x32 Complex Forward Mixed Radix 16 x 32-bit FFT With Rounding Function void DSP_fft16x32(const short * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] Pointer to complex Q.15 FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4, and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex 32-bit data input. y[2*nx] Pointer to complex 32-bit data output.
PAGE 53
DSP_fft16x32
Implementation Notes
- Bank Conflicts: No bank conflicts occur.
- Interruptibility: The code is interruptible.
- The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform.
- See the DSP_fft16x16 implementation notes, as similar ideas are used.
Benchmarks Cycles Codesize (10.
PAGE 54
DSP_fft32x32 DSP_fft32x32 Complex Forward Mixed Radix 32 x 32-bit FFT With Rounding Function void DSP_fft32x32(const int * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] Pointer to complex 32-bit FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4, and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex 32-bit data input. y[2*nx] Pointer to complex 32-bit data output.
PAGE 55
DSP_fft32x32
Implementation Notes
- Bank Conflicts: No bank conflicts occur.
- Interruptibility: The code is interruptible.
- The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform.
- See the DSP_fft16x16 implementation notes, as similar ideas are used.
PAGE 56
DSP_fft32x32s DSP_fft32x32s Complex Forward Mixed Radix 32 x 32-bit FFT With Scaling Function void DSP_fft32x32s(const int * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] Pointer to complex 32-bit FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4, and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex 32-bit data input. y[2*nx] Pointer to complex 32-bit data output.
PAGE 57
DSP_fft32x32s
- The FFT coefficients (twiddle factors) are generated using the program tw_fft32x32 provided in the directory ‘support\fft’. The scale factor must be 1073741823.5. The input data must be scaled by 2^(log2(nx) − ceil[log4(nx) − 1]) to completely prevent overflow.
Implementation Notes
- Bank Conflicts: No bank conflicts occur.
- Interruptibility: The code is interruptible.
- Scaling is performed at each stage by shifting the results right by 1, preventing overflow.
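The exponent in the input scaling requirement for DSP_fft32x32s can be computed directly from nx. The helper below is a sketch with a hypothetical name; it returns only the exponent log2(nx) − (ceil(log4(nx)) − 1) and leaves open how the scaling is applied to the data (for example as a right shift), since that detail is not shown here.

```c
#include <assert.h>

/* Hypothetical helper: exponent k such that the input must be scaled by 2^k. */
static int fft32x32s_prescale_exponent(int nx)
{
    int log2n = 0;
    int ceil_log4;
    assert(nx >= 16 && (nx & (nx - 1)) == 0);   /* power of 2, at least 16 */
    while ((1 << log2n) < nx)
        log2n++;
    ceil_log4 = (log2n + 1) / 2;                /* ceil(log4(nx)) */
    return log2n - (ceil_log4 - 1);
}
```

For nx = 1024 the exponent is 10 − (5 − 1) = 6.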
PAGE 58
DSP_ifft16x16 DSP_ifft16x16 Complex Inverse Mixed Radix 16 x 16-bit FFT With Rounding Function void DSP_ifft16x16(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] Pointer to complex Q.15 FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4, and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex 16-bit data input. y[2*nx] Pointer to complex 16-bit data output.
PAGE 59
DSP_ifft16x16 Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interruptible. - The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform, otherwise it is a radix-2 transform. - See the fft16x16 implementation notes, as similar ideas are used.
PAGE 60
DSP_ifft16x16_imre DSP_ifft16x16_imre Complex Inverse Mixed Radix 16 x 16-bit FFT With Im/Re Order Function void DSP_ifft16x16_imre(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] Pointer to complex Q.15 FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4, and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex data input. y[2*nx] Pointer to complex data output.
PAGE 61
DSP_ifft16x16_imre Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interruptible. - The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform, otherwise it is a radix-2 transform. - See the fft16x16 implementation notes, as similar ideas are used.
PAGE 62
DSP_ifft16x32 DSP_ifft16x32 Complex Inverse Mixed Radix 16 x 32-bit FFT With Rounding Function void DSP_ifft16x32(const short * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] Pointer to complex Q.15 FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4, and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex 32-bit data input. y[2*nx] Pointer to complex 32-bit data output.
PAGE 63
DSP_ifft16x32
- The FFT coefficients (twiddle factors) are generated using the program tw_fft16x32 provided in the directory ‘support\fft’. The scale factor must be 32767.5. No scaling is done with the function; thus the input data must be scaled by 2^(log2(nx)) to completely prevent overflow.
Implementation Notes
- Bank Conflicts: No bank conflicts occur.
- Interruptibility: The code is interruptible.
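To illustrate what a scale factor of 32767.5 means for a 16-bit twiddle factor, the sketch below quantizes a single sine or cosine value in the range −1 to 1. It is not the tw_fft16x32 program: the name quantize_twiddle_q15 is hypothetical, and the round-half-up step and the saturation limits are assumptions. The ordering of the values inside the twiddle array is not covered.

```c
#include <assert.h>

/* Hypothetical helper: scale a value in [-1, 1] by 32767.5 and round to short. */
static short quantize_twiddle_q15(double value)
{
    double d;
    long n;
    assert(value >= -1.0 && value <= 1.0);
    d = 32767.5 * value + 0.5;       /* scale, then bias for rounding */
    n = (long)d;                     /* the cast truncates toward zero */
    if ((double)n > d)
        n--;                         /* adjust truncation to a true floor */
    if (n >= 32767)  return 32767;   /* assumed saturation */
    if (n <= -32768) return -32768;
    return (short)n;
}
```

With this rule, 1.0 maps to 32767 and −1.0 maps to −32767, so the quantized table is symmetric about zero.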
PAGE 64
DSP_ifft32x32 DSP_ifft32x32 Complex Inverse Mixed Radix 32 x 32-bit FFT With Rounding Function void DSP_ifft32x32(const int * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] Pointer to complex 32-bit FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4, and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex 32-bit data input. y[2*nx] Pointer to complex 32-bit data output.
PAGE 65
DSP_ifft32x32
- The FFT coefficients (twiddle factors) are generated using the program tw_fft32x32 provided in the directory ‘support\fft’. The scale factor must be 2147483647.5. No scaling is done with the function; thus the input data must be scaled by 2^(log2(nx)) to completely prevent overflow.
Implementation Notes
- Bank Conflicts: No bank conflicts occur.
- Interruptibility: The code is interruptible.
PAGE 66
DSP_fir_cplx 4.4 Filtering and Convolution DSP_fir_cplx Complex FIR Filter Function void DSP_fir_cplx (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr) Arguments x[2*(nr+nh−1)] Complex input data. x must point to x[2*(nh−1)]. h[2*nh] Complex coefficients (in normal order). r[2*nr] Complex output data. nh Number of complex coefficients. Must be a multiple of 2. nr Number of complex output samples. Must be a multiple of 4.
PAGE 67
DSP_fir_cplx Special Requirements - The number of coefficients nh must be a multiple of 2. - The number of output samples nr must be a multiple of 4. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible. - The outer loop is unrolled 4 times while the inner loop is not unrolled. - Both inner and outer loops are collapsed in one loop.
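The requirement that x point to x[2*(nh−1)] is easier to see in a portable model, because the filter reads backwards from the pointer it is given. The following is a sketch under assumptions, not the optimized routine: fir_cplx_ref is a hypothetical name, samples are assumed to be interleaved as real part followed by imaginary part, and Q.15 output scaling (right shift by 15) is assumed.

```c
#include <assert.h>

/* Hypothetical portable model of a complex FIR filter.
 * x must point at element 2*(nh-1) of the underlying input buffer. */
static void fir_cplx_ref(const short *x, const short *h, short *r, int nh, int nr)
{
    int i, j;
    assert(nh % 2 == 0 && nr % 4 == 0);        /* documented size constraints */
    for (i = 0; i < 2 * nr; i += 2) {
        int real = 0, imag = 0;
        for (j = 0; j < 2 * nh; j += 2) {
            /* complex multiply-accumulate of h[j/2] with the sample j/2 steps back */
            real += h[j] * x[i - j]     - h[j + 1] * x[i + 1 - j];
            imag += h[j] * x[i + 1 - j] + h[j + 1] * x[i - j];
        }
        r[i]     = (short)(real >> 15);        /* assumed Q.15 output scaling */
        r[i + 1] = (short)(imag >> 15);
    }
}
```

A caller holding a buffer buf of 2*(nr+nh−1) shorts would pass buf + 2*(nh−1) as x.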
PAGE 68
DSP_fir_cplx_hM4X4  Complex FIR Filter
Function
void DSP_fir_cplx_hM4X4(const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr)
Arguments
x[2*(nr+nh−1)]   Complex input data. x must point to x[2*(nh−1)].
h[2*nh]          Complex coefficients (in normal order).
r[2*nr]          Complex output data.
nh               Number of complex coefficients. Must be a multiple of 4.
nr               Number of complex output samples. Must be a multiple of 4.
PAGE 69
DSP_fir_cplx_hM4X4 Special Requirements - The number of coefficients nh must be larger or equal to 4 and a multiple of 4. - The number of output samples nr must be a multiple of 4. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is fully interruptible. - The outer loop is unrolled 4 times while the inner loop is not unrolled. - Both inner and outer loops are collapsed in one loop.
PAGE 70
DSP_fir_gen DSP_fir_gen FIR Filter Function void DSP_fir_gen (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr) Arguments x[nr+nh−1] Pointer to input array of size nr + nh − 1. h[nh] Pointer to coefficient array of size nh (coefficients must be in reverse order). r[nr] Pointer to output array of size nr. Must be word aligned. nh Number of coefficients. Must be ≥5. nr Number of samples to calculate. Must be a multiple of 4.
PAGE 71
DSP_fir_gen Special Requirements - The number of coefficients, nh, must be greater than or equal to 5. Coefficients must be in reverse order. - The number of outputs computed, nr, must be a multiple of 4 and greater than or equal to 4. - Array r[ ] must be word aligned. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible. - Load double-word instruction is used to simultaneously load four values in a single clock cycle.
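Because the coefficients are stored in reverse order, the filter reduces to a plain sliding dot product. The model below is a sketch: fir_gen_ref is a hypothetical name, Q.15 output scaling (right shift by 15) is assumed, and the alignment requirement on r[ ] applies only to the optimized routine.

```c
#include <assert.h>

/* Hypothetical portable model of a general FIR filter with reversed coefficients. */
static void fir_gen_ref(const short *x, const short *h, short *r, int nh, int nr)
{
    int i, j;
    assert(nh >= 5 && nr >= 4 && nr % 4 == 0);  /* documented constraints */
    for (j = 0; j < nr; j++) {
        int sum = 0;
        for (i = 0; i < nh; i++)
            sum += x[i + j] * h[i];             /* h[] already time-reversed */
        r[j] = (short)(sum >> 15);              /* assumed Q.15 output scaling */
    }
}
```

Each output r[j] consumes x[j] through x[j+nh−1], which is why the input array needs nr + nh − 1 elements.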
PAGE 72
DSP_fir_gen_hM17_rA8X8 DSP_fir_gen_hM17_rA8X8 FIR Filter Function void DSP_fir_gen_hM17_rA8X8 (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr) Arguments x[nr+nh−1] Pointer to input array of size nr + nh − 1. h[nh] Pointer to coefficient array of size nh (coefficients must be in reverse order). r[nr] Pointer to output array of size nr. Must be double word aligned. nh Number of coefficients. Must be ≥17. nr Number of samples to calculate.
PAGE 73
DSP_fir_gen_hM17_rA8X8
Special Requirements
- The number of coefficients, nh, must be greater than or equal to 17. Coefficients must be in reverse order.
- The number of outputs computed, nr, must be a multiple of 8 and greater than or equal to 8.
- Array r[ ] must be double-word aligned.
Implementation Notes
- Bank Conflicts: No bank conflicts occur.
- Interruptibility: The code is fully interruptible.
- Load double-word instruction is used to simultaneously load four values in a single clock cycle.
PAGE 74
DSP_fir_r4 DSP_fir_r4 FIR Filter (when the number of coefficients is a multiple of 4) Function void DSP_fir_r4 (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr) Arguments x[nr+nh−1] Pointer to input array of size nr + nh – 1. h[nh] Pointer to coefficient array of size nh (coefficients must be in reverse order). r[nr] Pointer to output array of size nr. nh Number of coefficients. Must be multiple of 4 and ≥8. nr Number of samples to calculate.
PAGE 75
DSP_fir_r4 Special Requirements - The number of coefficients, nh, must be a multiple of 4 and greater than or equal to 8. Coefficients must be in reverse order. - The number of outputs computed, nr, must be a multiple of 4 and greater than or equal to 4. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible. - The load double-word instruction is used to simultaneously load four values in a single clock cycle.
PAGE 76
DSP_fir_r8  FIR Filter (when the number of coefficients is a multiple of 8)
Function
void DSP_fir_r8 (short *x, short *h, short *r, int nh, int nr)
Arguments
x[nr+nh−1]   Pointer to input array of size nr + nh − 1.
h[nh]        Pointer to coefficient array of size nh (coefficients must be in reverse order).
r[nr]        Pointer to output array of size nr. Must be word aligned.
nh           Number of coefficients. Must be a multiple of 8 and ≥ 8.
nr           Number of samples to calculate.
PAGE 77
DSP_fir_r8 Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interruptible. - The load double-word instruction is used to simultaneously load four values in a single clock cycle. - The inner loop is unrolled 4 times and will always compute a multiple of 4 output samples. - The outer loop is conditionally executed in parallel with the inner loop. This allows for a zero overhead outer loop.
PAGE 78
DSP_fir_r8_hM16_rM8A8X8 DSP_fir_r8_hM16_rM8A8X8 FIR Filter (the number of coefficients is a multiple of 8) Function void DSP_fir_r8_hM16_rM8A8X8 (short *x, short *h, short *r, int nh, int nr) Arguments x[nr+nh−1] Pointer to input array of size nr + nh – 1. h[nh] Pointer to coefficient array of size nh (coefficients must be in reverse order). r[nr] Pointer to output array of size nr. Must be double word aligned. nh Number of coefficients. Must be multiple of 8, ≥ 16.
PAGE 79
DSP_fir_r8_hM16_rM8A8X8 Special Requirements - The number of coefficients, nh, must be a multiple of 8 and greater than or equal to 16. Coefficients must be in reverse order. - The number of outputs computed, nr, must be a multiple of 8 and greater than or equal to 8. - Array r[ ] must be double word aligned. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interruptible.
PAGE 80
DSP_fir_sym DSP_fir_sym Symmetric FIR Filter Function void DSP_fir_sym (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr, int s) Arguments x[nr+2*nh] Pointer to input array of size nr + 2*nh. Must be double-word aligned. h[nh+1] Pointer to coefficient array of size nh + 1. Coefficients are in normal order and only half (nh+1 out of 2*nh+1) are required. Must be double-word aligned. r[nr] Pointer to output array of size nr. Must be word aligned.
PAGE 81
DSP_fir_sym y0 += (short) (x[j + i] + x[j + 2 * nh − i]) * h[i]; y0 += x[j + nh] * h[nh]; r[j] = (int) (y0 >> s); } } Special Requirements - nh must be a multiple of 8. The number of original symmetric coefficients is 2*nh+1. Only half (nh+1) are required. - nr must be a multiple of 4. - x[ ] and h[ ] must be double-word aligned. - r[ ] must be word aligned. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interruptible.
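The fragment above shows only the inner statements of the symmetric filter. A self-contained sketch built around them might look as follows; the name `fir_sym_ref` is hypothetical, and starting the accumulator at zero (no rounding term) is an assumption, because the initialization is not visible in the fragment.

```c
/* Hypothetical reference model for a symmetric FIR filter.
 * x: nr + 2*nh samples, h: nh + 1 coefficients (half of the 2*nh + 1
 * symmetric taps, normal order), r: nr outputs, s: right-shift amount.
 * Assumption: the accumulator starts at zero (no rounding term). */
void fir_sym_ref(const short *x, const short *h, short *r,
                 int nh, int nr, int s)
{
    for (int j = 0; j < nr; j++) {
        long long y0 = 0;
        for (int i = 0; i < nh; i++)
            y0 += (short)(x[j + i] + x[j + 2 * nh - i]) * h[i];
        y0 += x[j + nh] * h[nh];          /* center tap, used once */
        r[j] = (short)(y0 >> s);
    }
}
```

Each pair of samples that shares a coefficient is added before the multiply, which is why only nh + 1 of the 2*nh + 1 taps need to be stored.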
PAGE 82
DSP_iir DSP_iir IIR With 5 Coefficients Function void DSP_iir (short * restrict r1, const short * restrict x, short * restrict r2, const short * restrict h2, const short * restrict h1, int nr) Arguments r1[nr+4] Output array (used in actual computation; the first four elements hold the previous outputs). x[nr+4] Input array r2[nr] Output array (stored) h2[5] Moving-average filter coefficients h1[5] Auto-regressive filter coefficients. h1[0] is not used. nr Number of output samples.
PAGE 83
DSP_iir Special Requirements - nr is greater than or equal to 8. - Input data array x[ ] contains nr + 4 input samples to produce nr output samples. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible. - Output array r1[ ] contains nr + 4 locations, r2[ ] contains nr locations for storing nr output samples. The output samples are stored with an offset of 4 into the r1[ ] array.
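The buffer layout described above (four history samples at the front of x[ ] and r1[ ], outputs written at an offset of 4) can be sketched in plain C. The function name `iir_ref` is hypothetical, and the difference equation and `>> 15` scaling are assumptions consistent with the argument descriptions, not a copy of the library's algorithm.

```c
/* Hypothetical reference model of the 5-coefficient IIR.
 * r1[0..3] hold the previous four outputs, x[0..3] the previous four
 * inputs. Assumptions: Q.15 coefficients, >> 15 scaling, and this
 * difference equation; none of these is spelled out in the text above. */
void iir_ref(short *r1, const short *x, short *r2,
             const short *h2, const short *h1, int nr)
{
    for (int i = 0; i < nr; i++) {
        int sum = h2[0] * x[4 + i];
        for (int j = 1; j <= 4; j++)
            sum += h2[j] * x[4 + i - j] - h1[j] * r1[4 + i - j];
        r1[4 + i] = (short)(sum >> 15);   /* stored with an offset of 4 */
        r2[i] = r1[4 + i];                /* copy without the history */
    }
}
```

Note that h1[0] is never read, matching the argument description.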
PAGE 84
DSP_iirlat DSP_iirlat All-Pole IIR Lattice Filter Function void DSP_iirlat(const short * restrict x, int nx, const short * restrict k, int nk, int * restrict b, short * restrict r) Arguments x[nx] Input vector (16-bit). nx Length of input vector. k[nk] Reflection coefficients in Q.15 format. nk Number of reflection coefficients/lattice stages. Must be >=4. Make multiple of 2 to avoid bank conflicts. b[nk+1] Delay line elements from previous call.
PAGE 85
DSP_iirlat rt = rt − (short)(b[i] >> 15) * k[i]; b[i + 1] = b[i] + (short)(rt >> 15) * k[i]; } b[0] = rt; r[j] = rt >> 15; } } Special Requirements - nk must be >= 4. - No special alignment requirements - See Bank Conflicts for avoiding bank conflicts Implementation Notes - Bank Conflicts: nk should be a multiple of 2, otherwise bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible.
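The fragment above shows the lattice stage update but not the surrounding loops. The following is a hedged completion: the name `iirlat_ref`, the outer loop over input samples, the initial scaling of each sample by 2^15, and the stage order (nk − 1 down to 0) are assumptions.

```c
/* Hypothetical completion of the lattice fragment above.
 * Assumptions: each input sample is scaled up by 2^15 before the stages
 * run, stages are visited from nk-1 down to 0, and >> on negative
 * values is an arithmetic shift (true on common compilers). */
void iirlat_ref(const short *x, int nx, const short *k, int nk,
                int *b, short *r)
{
    for (int j = 0; j < nx; j++) {
        int rt = x[j] * 32768;
        for (int i = nk - 1; i >= 0; i--) {
            rt = rt - (short)(b[i] >> 15) * k[i];
            b[i + 1] = b[i] + (short)(rt >> 15) * k[i];
        }
        b[0] = rt;                        /* newest delay-line element */
        r[j] = (short)(rt >> 15);
    }
}
```

With all reflection coefficients set to zero the filter passes the input through unchanged, which makes a convenient sanity check.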
PAGE 86
DSP_dotp_sqr 4.5 Math DSP_dotp_sqr Vector Dot Product and Square Function int DSP_dotp_sqr(int G, const short * restrict x, const short * restrict y, int * restrict r, int nx) Arguments G Calculated value of G (used in the VSELP coder). x[nx] First vector array y[nx] Second vector array r Result of vector dot product of x and y. nx Number of elements. Must be multiple of 4, and ≥12. return int New value of G.
PAGE 87
DSP_dotp_sqr Special Requirements nx must be a multiple of 4 and greater than or equal to 12. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible.
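As a rough illustration of what the two results mean, the sketch below accumulates the dot product of x and y into *r and the sum of squares of y into G. The name `dotp_sqr_ref` is hypothetical, and the choice to accumulate into *r (rather than overwrite it) is an assumption.

```c
/* Hypothetical reference model: adds x.y to *r and y.y to G.
 * Assumption: *r is accumulated into, so the caller initializes it. */
int dotp_sqr_ref(int G, const short *x, const short *y, int *r, int nx)
{
    for (int i = 0; i < nx; i++) {
        *r += x[i] * y[i];
        G += y[i] * y[i];
    }
    return G;
}
```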
PAGE 88
DSP_dotprod DSP_dotprod Vector Dot Product Function int DSP_dotprod(const short * restrict x, const short * restrict y, int nx) Arguments x[nx] First vector array. Must be double-word aligned. y[nx] Second vector array. Must be double-word aligned. nx Number of elements of vector. Must be multiple of 4. return int Dot product of x and y. Description This routine takes two vectors and calculates their dot product. The inputs are 16-bit short data and the output is a 32-bit number.
PAGE 89
DSP_dotprod Implementation Notes - Bank Conflicts: No bank conflicts occur if the input arrays x[ ] and y[ ] are offset by 4 half-words (8 bytes). - Interruptibility: The code is fully interruptible. - The code is unrolled 4 times to enable full memory and multiplier bandwidth to be utilized. - Interrupts are masked by branch delay slots only. - Prolog collapsing has been performed to reduce codesize.
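A plain C model of the dot product, together with a helper for checking the double-word (8-byte) alignment that several routines in this chapter require, might look like the sketch below. Both `dotprod_ref` and `is_dword_aligned` are hypothetical names, not part of DSPLIB.

```c
#include <stdint.h>

/* Hypothetical reference model: 16-bit inputs, 32-bit result. */
int dotprod_ref(const short *x, const short *y, int nx)
{
    int sum = 0;
    for (int i = 0; i < nx; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Returns 1 if p sits on an 8-byte (double-word) boundary. */
int is_dword_aligned(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}
```

An alignment check of this kind is useful in debug builds, because the optimized code loads four 16-bit values at a time.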
PAGE 90
DSP_maxval DSP_maxval Maximum Value of Vector Function short DSP_maxval (const short *x, int nx) Arguments x[nx] Pointer to input vector of size nx. nx Length of input data vector. Must be multiple of 8 and ≥32. return short Maximum value of a vector. Description This routine finds the element with maximum value in the input vector and returns that value. Algorithm This is the C equivalent of the assembly code without restrictions.
PAGE 91
DSP_maxidx DSP_maxidx Index of Maximum Element of Vector Function int DSP_maxidx (const short *x, int nx) Arguments x[nx] Pointer to input vector of size nx. Must be double-word aligned. nx Length of input data vector. Must be multiple of 16 and ≥ 48. return int Index for vector element with maximum value. Description This routine finds the max value of a vector and returns the index of that value. The input array is treated as 16 separate columns that are interleaved throughout the array.
PAGE 92
DSP_maxidx Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible. - The code is unrolled 16 times to enable the full bandwidth of LDDW and MAX2 instructions to be utilized. This splits the search into 16 sub-ranges. The global maximum is then found from the list of maximums of the sub-ranges. Then, using this offset from the sub-ranges, the global maximum and the index of it are found using a simple match.
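The search strategy described above (16 interleaved sub-ranges, then a global maximum, then a match to recover the index) can be sketched in portable C. The name `maxidx_sketch` is hypothetical, and resolving ties to the lowest index is an assumption; the tie-breaking rule of the library routine is not stated here.

```c
/* Hypothetical sketch of the 16-column search described above.
 * Assumptions: nx is a multiple of 16, and ties resolve to the lowest
 * index (the library's tie-breaking rule is not stated here). */
int maxidx_sketch(const short *x, int nx)
{
    short col_max[16];

    /* Maximum of each of the 16 interleaved columns. */
    for (int c = 0; c < 16; c++)
        col_max[c] = x[c];
    for (int i = 16; i < nx; i += 16)
        for (int c = 0; c < 16; c++)
            if (x[i + c] > col_max[c])
                col_max[c] = x[i + c];

    /* Global maximum from the list of column maxima. */
    short global = col_max[0];
    for (int c = 1; c < 16; c++)
        if (col_max[c] > global)
            global = col_max[c];

    /* Simple match to recover the index. */
    for (int i = 0; i < nx; i++)
        if (x[i] == global)
            return i;
    return -1;
}
```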
PAGE 93
DSP_minval DSP_minval Minimum Value of Vector Function short DSP_minval (const short *x, int nx) Arguments x [nx] Pointer to input vector of size nx. nx Length of input data vector. Must be multiple of 4 and ≥20. return short Minimum value of a vector. Description This routine finds the minimum value of a vector and returns the value. Algorithm This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply.
PAGE 94
DSP_mul32 DSP_mul32 32-Bit Vector Multiply Function void DSP_mul32(const int * restrict x, const int * restrict y, int * restrict r, short nx) Arguments x[nx] Pointer to input data vector 1 of size nx. Must be double-word aligned. y[nx] Pointer to input data vector 2 of size nx. Must be double-word aligned. r[nx] Pointer to output data vector of size nx. Must be double-word aligned. nx Number of elements in input and output vectors. Must be multiple of 8 and ≥16.
PAGE 95
DSP_mul32 e+=d; /* Xhigh*Yhigh + */ /* (Xhigh*Ylow+Xlow*Yhigh)>>16 */ *(r++)=e; } } Special Requirements - nx must be a multiple of 8 and greater than or equal to 16. - Input and output vectors must be double-word aligned. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible. - The MPYHI instruction is used to perform 16 x 32 multiplies to form 48-bit intermediate results.
PAGE 96
DSP_neg32 DSP_neg32 32-Bit Vector Negate Function void DSP_neg32(int *x, int *r, short nx) Arguments x[nx] Pointer to input data vector 1 of size nx with 32-bit elements. Must be double-word aligned. r[nx] Pointer to output data vector of size nx with 32-bit elements. Must be double-word aligned. nx Number of elements of input and output vectors. Must be a multiple of 4 and ≥8. Description This function negates the elements of a vector (32-bit elements).
PAGE 97
DSP_recip16 DSP_recip16 16-Bit Reciprocal Function void DSP_recip16 (short *x, short *rfrac, short *rexp, short nx) Arguments x[nx] Pointer to Q.15 input data vector of size nx. rfrac[nx] Pointer to Q.15 output data vector for fractional values. rexp[nx] Pointer to output data vector for exponent values. nx Number of elements of input and output vectors. Description This routine returns the fractional and exponential portion of the reciprocal of an array x[ ] of Q.15 numbers.
PAGE 98
DSP_recip16 *(rexp++)=normal−15; /* store exponent */ b=0x80000000; /* dividend = 1 */ for(j=15;j>0;j−−) b=_subc(b,a); /* divide */ b=b&0x7FFF; /* clear remainder (clear upper half) */ if(neg) b=−b; /* if originally negative, negate */ *(rfrac++)=b; /* store fraction */ } } Special Requirements None Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interruptible. - The conditional subtract instruction, SUBC, is used for division.
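To see how the loop body shown above produces a fraction and an exponent, the sketch below reproduces it for a single element using portable stand-ins for the intrinsics. The helpers `norm_pos` and `subc_step` are hypothetical emulations of _norm and _subc, and the normalization steps before the visible fragment are assumptions.

```c
#include <stdint.h>

/* Portable stand-in for _norm on a non-negative value: the number of
 * redundant sign bits (31 for zero). */
static int norm_pos(uint32_t a)
{
    int n = 0;
    while (n < 31 && !(a & 0x40000000u)) {
        a <<= 1;
        n++;
    }
    return n;
}

/* Portable stand-in for one SUBC conditional-subtract step. */
static uint32_t subc_step(uint32_t a, uint32_t b)
{
    return (a >= b) ? (((a - b) << 1) + 1u) : (a << 1);
}

/* Hypothetical single-element version of the reciprocal computation. */
void recip16_one(short x, short *rfrac, short *rexp)
{
    int neg = x < 0;
    uint32_t a = neg ? (uint32_t)(-(int)x) : (uint32_t)x;
    int normal = norm_pos(a);
    a <<= normal;                          /* normalize the divisor */
    *rexp = (short)(normal - 15);          /* store exponent */
    uint32_t b = 0x80000000u;              /* dividend = 1 */
    for (int j = 15; j > 0; j--)
        b = subc_step(b, a);               /* divide */
    int frac = (int)(b & 0x7FFFu);         /* clear remainder */
    *rfrac = (short)(neg ? -frac : frac);  /* restore the sign */
}
```

For an input of 0.5 (4000h in Q.15) the sketch yields a fraction of 7FFFh and an exponent of 1, that is, approximately 1.0 × 2^1 = 2.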
PAGE 99
DSP_vecsumsq DSP_vecsumsq Sum of Squares Function int DSP_vecsumsq (const short *x, int nx) Arguments x[nx] Input vector nx Number of elements in x. Must be multiple of 4 and ≥8. return int Sum of the squares Description This routine returns the sum of squares of the elements contained in the vector x[ ]. Algorithm This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply.
PAGE 100
DSP_w_vec DSP_w_vec Weighted Vector Sum Function void DSP_w_vec(const short * restrict x, const short * restrict y, short m, short * restrict r, short nr) Arguments x[nr] Vector being weighted. Must be double-word aligned. y[nr] Summation vector. Must be double-word aligned. m Weighting factor r[nr] Output vector nr Dimensions of the vectors. Must be multiple of 8 and ≥8. Description This routine is used to obtain the weighted vector sum. Both the inputs and output are 16-bit numbers.
PAGE 101
DSP_mat_mul 4.6 Matrix DSP_mat_mul Matrix Multiplication Function void DSP_mat_mul(const short * restrict x, int r1, int c1, const short * restrict y, int c2, short * restrict r, int qs) Arguments x [r1*c1] Pointer to input matrix of size r1*c1. r1 Number of rows in matrix x. c1 Number of columns in matrix x. Also number of rows in y. y [c1*c2] Pointer to input matrix of size c1*c2. c2 Number of columns in matrix y. r [r1*c2] Pointer to output matrix of size r1*c2.
PAGE 102
DSP_mat_mul for (i = 0; i < r1; i++) for (j = 0; j < c2; j++) { sum = 0; for (k = 0; k < c1; k++) sum += x[k + i*c1] * y[j + k*c2]; r[j + i*c2] = sum >> qs; } } Special Requirements - The arrays x[], y[], and r[] are stored in distinct arrays. That is, in-place processing is not allowed. - The input matrices have minimum dimensions of at least 1 row and 1 column, and maximum dimensions of 32767 rows and 32767 columns. Implementation Notes - Bank Conflicts: No bank conflicts occur.
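The loop nest above can be wrapped into a self-contained function for experimentation. The name `mat_mul_ref` is hypothetical; the body follows the fragment, with matrices stored row by row.

```c
/* Hypothetical wrapper around the loop nest shown above.
 * x is r1 x c1, y is c1 x c2, r is r1 x c2, all stored row by row.
 * qs is the right shift applied to each accumulated sum. */
void mat_mul_ref(const short *x, int r1, int c1,
                 const short *y, int c2, short *r, int qs)
{
    for (int i = 0; i < r1; i++)
        for (int j = 0; j < c2; j++) {
            int sum = 0;
            for (int k = 0; k < c1; k++)
                sum += x[k + i * c1] * y[j + k * c2];
            r[j + i * c2] = (short)(sum >> qs);
        }
}
```

Multiplying the 2 × 2 matrices {1, 2, 3, 4} and {5, 6, 7, 8} with qs = 0 gives {19, 22, 43, 50}.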
PAGE 103
DSP_mat_trans DSP_mat_trans Matrix Transpose Function void DSP_mat_trans (const short *x, short rows, short columns, short *r) Arguments x[rows*columns] Pointer to input matrix. rows Number of rows in the input matrix. Must be a multiple of 4. columns Number of columns in the input matrix. Must be a multiple of 4. r[columns*rows] Pointer to output data vector of size rows*columns. Description This function transposes the input matrix x[ ] and writes the result to matrix r[ ].
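A straightforward C model of the transpose is sketched below, assuming row-major storage for both matrices. The name `mat_trans_ref` is hypothetical, and this unrestricted version does not need the multiple-of-4 dimensions that the optimized routine requires.

```c
/* Hypothetical reference model of a matrix transpose.
 * x is rows x columns, r is columns x rows; row-major storage is an
 * assumption. */
void mat_trans_ref(const short *x, short rows, short columns, short *r)
{
    for (int i = 0; i < columns; i++)
        for (int j = 0; j < rows; j++)
            r[i * rows + j] = x[i + columns * j];
}
```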
PAGE 104
DSP_bexp 4.7 Miscellaneous DSP_bexp Block Exponent Implementation Function short DSP_bexp(const int *x, short nx) Arguments x[nx] Pointer to input vector of size nx. Must be double-word aligned. nx Number of elements in input vector. Must be multiple of 8. return short Return value is the maximum exponent that may be used in scaling. Description Computes the exponents (number of extra sign bits) of all values in the input vector x[ ] and returns the minimum exponent.
PAGE 105
DSP_bexp Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible.
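The block exponent is the smallest number of redundant sign bits found in the vector. A portable sketch follows; `norm32` is a hypothetical stand-in for the _norm intrinsic and `bexp_ref` is a hypothetical name.

```c
#include <stdint.h>

/* Portable stand-in for _norm: redundant sign bits of a 32-bit value. */
static int norm32(int32_t v)
{
    uint32_t u = (uint32_t)v;
    if (v < 0)
        u = ~u;            /* fold negatives onto the non-negative case */
    int n = 0;
    while (n < 31 && !(u & 0x40000000u)) {
        u <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical reference model: minimum exponent over the vector. */
short bexp_ref(const int32_t *x, short nx)
{
    int min_val = norm32(x[0]);
    for (int i = 1; i < nx; i++) {
        int n = norm32(x[i]);
        if (n < min_val)
            min_val = n;
    }
    return (short)min_val;
}
```

Every element can be shifted left by the returned amount without overflow, which is why the minimum exponent is also the maximum usable scaling.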
PAGE 106
DSP_blk_eswap16 DSP_blk_eswap16 Endian-Swap a Block of 16-Bit Values Function void DSP_blk_eswap16(void * restrict x, void * restrict r, int nx) Arguments x [nx] Source data. Must be double-word aligned. r [nx] Destination array. Must be double-word aligned. nx Number of 16-bit values to swap. Must be multiple of 8. Description The data in the x[] array is endian swapped, meaning that the byte-order of the bytes within each half-word of the r[] array is reversed.
PAGE 107
DSP_blk_eswap16 Special Requirements - Input and output arrays do not overlap, except when “r == NULL” so that the operation occurs in-place. - The input array and output array are expected to be double-word aligned, and a multiple of 8 half-words must be processed. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible.
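The in-place convention described above (passing r == NULL) can be sketched in portable C as follows. The name `blk_eswap16_ref` is hypothetical, and this byte-wise version has none of the alignment or length restrictions of the optimized routine.

```c
#include <stddef.h>

/* Hypothetical reference model: swap the two bytes of each half-word.
 * If r is NULL the swap is done in place on x. */
void blk_eswap16_ref(void *x, void *r, int nx)
{
    unsigned char *src = (unsigned char *)x;
    unsigned char *dst = (r != NULL) ? (unsigned char *)r : src;

    for (int i = 0; i < nx; i++) {
        unsigned char t0 = src[2 * i + 1];
        unsigned char t1 = src[2 * i + 0];
        dst[2 * i + 0] = t0;
        dst[2 * i + 1] = t1;
    }
}
```

Both bytes are read into temporaries before either is written, so the in-place case is safe.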
PAGE 108
DSP_blk_eswap32 DSP_blk_eswap32 Endian-Swap a Block of 32-Bit Values Function void DSP_blk_eswap32(void * restrict x, void * restrict r, int nx) Arguments x [nx] Source data. Must be double-word aligned. r [nx] Destination array. Must be double-word aligned. nx Number of 32-bit values to swap. Must be multiple of 4. Description The data in the x[] array is endian swapped, meaning that the byte-order of the bytes within each word of the r[] array is reversed.
PAGE 109
DSP_blk_eswap32 t2 = _x[i*4 + 1]; t3 = _x[i*4 + 0]; _r[i*4 + 0] = t0; _r[i*4 + 1] = t1; _r[i*4 + 2] = t2; _r[i*4 + 3] = t3; } } Special Requirements - Input and output arrays do not overlap, except where “r == NULL” so that the operation occurs in-place. - The input array and output array are expected to be double-word aligned, and a multiple of 4 words must be processed. Implementation Notes - Bank Conflicts: No bank conflicts occur.
PAGE 110
DSP_blk_eswap64 DSP_blk_eswap64 Endian-Swap a Block of 64-Bit Values Function void DSP_blk_eswap64(void * restrict x, void * restrict r, int nx) Arguments x[nx] Source data. Must be double-word aligned. r[nx] Destination array. Must be double-word aligned. nx Number of 64-bit values to swap. Must be multiple of 2. Description The data in the x[] array is endian swapped, meaning that the byte-order of the bytes within each double-word of the r[] array is reversed.
PAGE 111
DSP_blk_eswap64 t2 = _x[i*8 + 5]; t3 = _x[i*8 + 4]; t4 = _x[i*8 + 3]; t5 = _x[i*8 + 2]; t6 = _x[i*8 + 1]; t7 = _x[i*8 + 0]; _r[i*8 + 0] = t0; _r[i*8 + 1] = t1; _r[i*8 + 2] = t2; _r[i*8 + 3] = t3; _r[i*8 + 4] = t4; _r[i*8 + 5] = t5; _r[i*8 + 6] = t6; _r[i*8 + 7] = t7; } } Special Requirements - Input and output arrays do not overlap, except when “r == NULL” so that the operation occurs in-place.
PAGE 112
DSP_blk_move DSP_blk_move Block Move (Overlapping) Function void DSP_blk_move(short * x, short * r, int nx) Arguments x [nx] Block of data to be moved. r [nx] Destination of block of data. nx Number of elements in block. Must be multiple of 8 and ≥32. Description This routine moves nx 16-bit elements from one memory location pointed to by x to another pointed to by r. The source and destination blocks can be overlapped.
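Because the source and destination blocks may overlap, the behavior corresponds to memmove rather than memcpy in standard C. The sketch below uses memmove as a stand-in; the name `blk_move_ref` is hypothetical and this is not how the optimized routine is implemented.

```c
#include <string.h>

/* Hypothetical reference model: overlap-safe move of nx 16-bit
 * elements, using the standard library's memmove as a stand-in. */
void blk_move_ref(const short *x, short *r, int nx)
{
    memmove(r, x, (size_t)nx * sizeof(short));
}
```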
PAGE 113
DSP_fltoq15 DSP_fltoq15 Float to Q15 Conversion Function void DSP_fltoq15 (float *x, short *r, short nx) Arguments x[nx] Pointer to floating-point input vector of size nx. x should contain the numbers normalized between [−1,1). r[nx] Pointer to output data vector of size nx containing the Q.15 equivalent of vector x. nx Length of input and output data vectors. Must be multiple of 2. Description Convert the IEEE floating point numbers stored in vector x[ ] into Q.
PAGE 114
DSP_fltoq15 Implementation Notes - Loop is unrolled twice. - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible.
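The float-to-Q.15 conversion amounts to scaling by 2^15 and saturating to the 16-bit range. The sketch below is a hedged model: the name `fltoq15_ref` is hypothetical, and truncation toward zero is an assumption, since the rounding behavior of the library routine is not shown here.

```c
/* Hypothetical reference model: scale by 2^15, saturate, convert.
 * Assumption: the fractional part is truncated toward zero. */
void fltoq15_ref(const float *x, short *r, int nx)
{
    for (int i = 0; i < nx; i++) {
        float a = 32768.0f * x[i];
        if (a > 32767.0f)
            a = 32767.0f;      /* saturate on the positive side */
        if (a < -32768.0f)
            a = -32768.0f;     /* saturate on the negative side */
        r[i] = (short)a;
    }
}
```

Saturation matters at the top of the range: an input of exactly 1.0 would scale to 32768, which does not fit in 16 bits.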
PAGE 115
DSP_minerror DSP_minerror Minimum Energy Error Search Function int DSP_minerror (const short * restrict GSP0_TABLE, const short * restrict errCoefs, int * restrict max_index) Arguments GSP0_TABLE[9*256] GSP0 terms array. Must be double-word aligned. errCoefs[9] Array of error coefficients. max_index Pointer to GSP0_TABLE[max_index] found. return int Maximum dot product result. Algorithm This is the C equivalent of the assembly code without restrictions.
PAGE 116
DSP_minerror Special Requirements Array GSP0_TABLE[] must be double-word aligned. Implementation Notes - Bank Conflicts: No bank conflicts occur. - Interruptibility: The code is interrupt-tolerant but not interruptible. - The load double-word instruction is used to simultaneously load four values in a single clock cycle. - The inner loop is completely unrolled. - The outer loop is 4 times unrolled.
PAGE 117
DSP_q15tofl DSP_q15tofl Q15 to Float Conversion Function void DSP_q15tofl (short *x, float *r, int nx) Arguments x[nx] Pointer to Q.15 input vector of size nx. r[nx] Pointer to floating-point output data vector of size nx containing the floating-point equivalent of vector x. nx Length of input and output data vectors. Must be multiple of 2. Description Converts the values stored in vector x[ ] in Q.15 format to IEEE floating point numbers in output vector r[ ].
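The reverse conversion divides each Q.15 value by 2^15. A minimal sketch, with the hypothetical name `q15tofl_ref`:

```c
/* Hypothetical reference model: Q.15 to floating point. */
void q15tofl_ref(const short *x, float *r, int nx)
{
    for (int i = 0; i < nx; i++)
        r[i] = (float)x[i] / 32768.0f;
}
```

Every 16-bit integer and the divisor 2^15 are exactly representable in single precision, so this conversion is lossless.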
PAGE 118
DSP_bitrev_cplx 4.8 Obsolete Functions 4.8.1 FFT DSP_bitrev_cplx Complex Bit-Reverse NOTE: This function is provided for backward compatibility with the C62x DSPLIB. It has not been optimized for the C64x architecture. You are advised to use one of the newly added FFT functions which have been optimized for the C64x.
PAGE 119
DSP_bitrev_cplx int nbits, nbot, ntop, ndiff, n2, halfn; short *xs = (short *) x; nbits = 0; i = nx; while (i > 1){ i = i >> 1; nbits++;} nbot = nbits >> 1; ndiff = nbits & 1; ntop = nbot + ndiff; n2 = 1 << ntop; mask = n2 − 1; halfn = nx >> 1; for (i0 = 0; i0 < halfn; i0 += 2) { b = i0 & mask; a = i0 >> nbot; if (!b) ia = index[a]; ib = index[b]; ibs = ib << nbot; j0 = ibs + ia; t = i0 < j0; xi0 = x[i0]; xj0 = x[j0]; if (t){x[i0] = xj0; x[j0] = xi0;} i1 = i0 + 1; j1 = j0 + halfn; xi1
PAGE 120
DSP_bitrev_cplx if (t){x[i3] = xj3; x[j3] = xi3;} } } Special Requirements - nx must be a power of 2. - The array index[] is generated by the routine bitrev_index provided in the directory ‘support\fft’. - If nx ≤ 4K, you can use the char (8-bit) data type for the “index” variable. This requires changing the LDH when loading index values in the assembly routine to LDB. This further reduces the size of the Index Table by half.
PAGE 121
DSP_radix2 DSP_radix2 Complex Forward FFT (radix 2) NOTE: This function is provided for backward compatibility with the C62x DSPLIB. It has not been optimized for the C64x architecture. You are advised to use one of the newly added FFT functions which have been optimized for the C64x. Function void DSP_radix2 (int nx, short * restrict x, const short * restrict w) Arguments nx Number of complex elements in vector x. Must be a power of 2 such that 4 ≤ nx ≤ 65536.
PAGE 122
DSP_radix2 xt = x[2*l] − x[2*i]; x[2*i] = x[2*i] + x[2*l]; yt = x[2*l+1] − x[2*i+1]; x[2*i+1] = x[2*i+1] + x[2*l+1]; x[2*l] = (c*xt + s*yt)>>15; x[2*l+1] = (c*yt − s*xt)>>15; } } ie = ie<<1; } } Special Requirements - 2 ≤ nx ≤ 32768 (nx is a power of 2) - Input x and coefficients w should be in different data sections or memory spaces to eliminate memory bank hits. If this is not possible, they should be aligned on different word boundaries to minimize memory bank hits.
PAGE 123
DSP_r4fft DSP_r4fft Complex Forward FFT (radix 4) NOTE: This function is provided for backward compatibility with the C62x DSPLIB. It has not been optimized for the C64x architecture. You are advised to use one of the newly added FFT functions which have been optimized for the C64x. Function void DSP_r4fft (int nx, short * restrict x, const short * restrict w) Arguments nx Number of complex elements in vector x. Must be a power of 4 such that 4 ≤ nx ≤ 65536.
PAGE 124
DSP_r4fft si1 = w[ia1 * 2]; co2 = w[ia2 * 2 + 1]; si2 = w[ia2 * 2]; co3 = w[ia3 * 2 + 1]; si3 = w[ia3 * 2]; ia1 = ia1 + ie; for (i0 = j; i0 < nx; i0 += n1) { i1 = i0 + n2; i2 = i1 + n2; i3 = i2 + n2; r1 = x[2 * i0] + x[2 * i2]; r2 = x[2 * i0] − x[2 * i2]; t = x[2 * i1] + x[2 * i3]; x[2 * i0] = r1 + t; r1 = r1 − t; s1 = x[2 * i0 + 1] + x[2 * i2 + 1]; s2 = x[2 * i0 + 1] − x[2 * i2 + 1]; t = x[2 * i1 + 1] + x[2 * i3 + 1]; x[2 * i0 + 1] = s1 + t; s1 = s1 − t; x[2 * i2] = (r1 * co2 + s1 * si2) >> 15; x[2 * i2 +
PAGE 125
DSP_r4fft >>15; x[2 * i3 + 1] = (s2 * co3−r2 * si3)>>15; } } ie <<= 2; } } Special Requirements - 4 ≤ nx ≤ 65536 (nx a power of 4) - x is aligned on a 4*nx byte boundary for circular buffering - Input x and coefficients w should be in different data sections or memory spaces to eliminate memory bank hits. If this is not possible, w should be aligned on an odd word boundary to minimize memory bank hits - x data is stored in the order real[0], imag[0], real[1], ...
PAGE 126
DSP_fft DSP_fft Complex Forward FFT With Digital Reversal Function void DSP_fft (const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] Pointer to vector of Q.15 FFT coefficients of size 2 * nx elements. Must be double-word aligned. nx Number of complex elements in vector x. Must be a power of 4 and 4 ≤ nx ≤ 65536. x[2*nx] Pointer to input sequence of size 2 * nx elements. Must be double-word aligned.
PAGE 127
DSP_fft #include #include
PAGE 128
DSP_fft _nassert((int)x % 8 == 0); _nassert((int)y % 8 == 0); _nassert((int)w % 8 == 0); _nassert(n >= 16); _nassert(n < 32768); #endif /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ /* Perform initial stages of FFT in place w/out digit reversal.
PAGE 129
DSP_fft { #ifndef NOASSUME _nassert(i % 4 == 0); _nassert(s >= 4); #pragma MUST_ITERATE(2,,2); #endif for (j = 0; j < s; j += 2) { for (k = 0; k < 2; k++) { short w1c, w1s, w2c, w2s, w3c, w3s; short x0r, x0i, x1r, x1i, x2r, x2i, x3r, x3i; short y0r, y0i, y1r, y1i, y2r, y2i, y3r, y3i; /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ /* Read the four samples that are the input to this */ /* particular butterfly.
PAGE 130
DSP_fft /* the stride between the elements as follows: */ /* x(n), x(n + s), x(n + 2*s), x(n + 3*s).
PAGE 131
DSP_fft xl1 = x0i − x2i; xl20 = x1r − x3r; xl21 = x1i − x3i; xt0 = xh0 + xh20; yt0 = xh1 + xh21; xt1 = xl0 + xl21; yt1 = xl1 − xl20; xt2 = xh0 − xh20; yt2 = xh1 − xh21; xt3 = xl0 − xl21; yt3 = xl1 + xl20; /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Perform twiddle factor multiplies of three terms,top */ /* term does not have any multiplies. Note the twiddle */ /* factors for a normal FFT are C + j (−S).
PAGE 132
DSP_fft /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ /* Offset to next subtable of twiddle factors. With each iteration */ /* of the above block, six twiddle factors get read, s times, */ /* hence the offset into the twiddle factor array is advanced by */ /* this amount.
PAGE 133
DSP_fft x0r = x[2*(i + 0) + 0]; x0i = x[2*(i + 0) + 1]; x1r = x[2*(i + 1) + 0]; x1i = x[2*(i + 1) + 1]; x2r = x[2*(i + 2) + 0]; x2i = x[2*(i + 2) + 1]; x3r = x[2*(i + 3) + 0]; x3i = x[2*(i + 3) + 1]; /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ /* Calculate the final FFT result from this butterfly.
PAGE 134
DSP_fft Special Requirements - In-place computation is not allowed. - nx must be a power of 4 and 4 ≤ nx ≤ 65536. - Input x[ ] and output y[ ] are stored on double-word aligned boundaries. - Input data x[ ] is stored in the order real0, img0, real1, img1, ... - The FFT coefficients (twiddle factors) must be double-word aligned and are generated using the program tw_fft16x16 provided in the directory ‘support\fft’. Implementation Notes - Bank Conflicts: No bank conflicts occur.
PAGE 135
DSP_fft16x16t DSP_fft16x16t Complex Forward Mixed Radix 16- x 16-Bit FFT With Truncation Function void DSP_fft16x16t(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] Pointer to complex Q.15 FFT coefficients. nx Length of FFT in complex samples. Must be power of 2 or 4, and 16 ≤ nx ≤ 32768. x[2*nx] Pointer to complex 16-bit data input. y[2*nx] Pointer to complex 16-bit data output.
PAGE 136
DSP_fft16x16t # define DIG_REV(i, m, j) ((j) = (_shfl(_rotl(_bitr(_deal(i)), 16)) >> (m))) #else # define DIG_REV(i, m, j) \ do { \ unsigned _ = (i); \ _ = ((_ & 0x33333333) << 2) | ((_ & ~0x33333333) >> 2); \ _ = ((_ & 0x0F0F0F0F) << 4) | ((_ & ~0x0F0F0F0F) >> 4); \ _ = ((_ & 0x00FF00FF) << 8) | ((_ & ~0x00FF00FF) >> 8); \ _ = ((_ & 0x0000FFFF) << 16) | ((_ & ~0x0000FFFF) >> 16); \ (j) = _ >> (m); \ } while (0) #endif void DSP_fft16x16t_cn(const short *restrict ptr_w, int npoints,
PAGE 137
DSP_fft16x16t /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Determine the magnitude of the number of points to be transformed. */ /* Check whether we can use a radix4 decomposition or a mixed radix */ /* transformation, by determining modulo 2.
PAGE 138
DSP_fft16x16t /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Set up offsets to access ”N/4”, ”N/2”, ”3N/4” complex point or */ /* ”N/2”, ”N”, ”3N/2” half word */ /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ h2 = stride>>1; l1 = stride; l2 = stride + (stride >> 1); /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Reset ”x” to point to the start of the input data array.
PAGE 139
DSP_fft16x16t /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ co10 = w[j+1]; si10 = w[j+0]; co11 = w[j+3]; si11 = w[j+2]; co20 = w[j+5]; si20 = w[j+4]; co21 = w[j+7]; si21 = w[j+6]; co30 = w[j+9]; si30 = w[j+8]; co31 = w[j+11]; si31 = w[j+10]; /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Read in the first complex input for the butterflies.
PAGE 140
DSP_fft16x16t xl0_1 = x_2 − x_l1_2; xl1_1 = x_3 − x_l1_3; xh20_0 = x_h2_0 + x_l2_0; xh21_0 = x_h2_1 + x_l2_1; xh20_1 = x_h2_2 + x_l2_2; xh21_1 = x_h2_3 + x_l2_3; xl20_0 = x_h2_0 − x_l2_0; xl21_0 = x_h2_1 − x_l2_1; xl20_1 = x_h2_2 − x_l2_2; xl21_1 = x_h2_3 − x_l2_3; /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Derive output pointers using the input pointer ”x” */ /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ x0 = x; x2 = x0; /*−−−−−−−−−−−−−−−−−−−−−−−−−
PAGE 141
DSP_fft16x16t /* y0i = x0i + x2i + x1i + x3i = xh1 + xh21 */ /* y1r = x0r − x2r + (x1i − x3i) = xl0 + xl21 */ /* y1i = x0i − x2i − (x1r − x3r) = xl1 − xl20 */ /* y2r = x0r + x2r − (x1r + x3r) = xh0 − xh20 */ /* y2i = x0i + x2i − (x1i + x3i) = xh1 − xh21 */ /* y3r = x0r − x2r − (x1i − x3i) = xl0 − xl21 */ /* y3i = x0i − x2i + (x1r − x3r) = xl1 + xl20 */ /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ y0r = xh0_0 + xh20_0; y0i = xh1_0 + xh21_0; y4r = xh0_
PAGE 142
DSP_fft16x16t x2[h2+1] = (co10 * yt1_0 − si10 * xt1_0) >> 15; x2[h2+2] = (si11 * yt1_1 + co11 * xt1_1) >> 15; x2[h2+3] = (co11 * yt1_1 − si11 * xt1_1) >> 15; x2[l1 ] = (si20 * yt0_0 + co20 * xt0_0) >> 15; x2[l1+1] = (co20 * yt0_0 − si20 * xt0_0) >> 15; x2[l1+2] = (si21 * yt0_1 + co21 * xt0_1) >> 15; x2[l1+3] = (co21 * yt0_1 − si21 * xt0_1) >> 15; x2[l2 ] = (si30 * yt2_0 + co30 * xt2_0) >> 15; x2[l2+1] = (co30 * yt2_0 − si30 * xt2_0) >> 15; x2[l2+2] = (si31 * yt2_1 + co31 * xt2_1) >> 15; x2[l2+3] = (co3
PAGE 143
DSP_fft16x16t } else { y1 = y0 + (int) (npoints >> 1); y3 = y2 + (int) (npoints >> 1); l1 = norm + 2; j0 = 4; n0 = npoints >> 2; } /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* The following code reads data identically for either a radix 4 */ /* or a radix 2 style decomposition. It writes out at different */ /* locations though. It checks if either half the points, or a */ /* quarter of the complex points have been exhausted to jump to */ /* prevent double reversal.
PAGE 144
DSP_fft16x16t xl0_1 = x_2 − x_6; xl1_1 = x_3 − x_7; n00 = xh0_0 + xh0_1; n01 = xh1_0 + xh1_1; n10 = xl0_0 + xl1_1; n11 = xl1_0 − xl0_1; n20 = xh0_0 − xh0_1; n21 = xh1_0 − xh1_1; n30 = xl0_0 − xl1_1; n31 = xl1_0 + xl0_1; if (radix == 2) { /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Perform DSP_radix2 style decomposition.
PAGE 145
DSP_fft16x16t if (radix == 2) { n02 = x_8 + x_a; n03 = x_9 + x_b; n22 = x_8 − x_a; n23 = x_9 − x_b; n12 = x_c + x_e; n13 = x_d + x_f; n32 = x_c − x_e; n33 = x_d − x_f; } /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Points that are read from successive locations map to y, y[N/4] */ /* y[N/2], y[3N/4] in a radix4 scheme, y, y[N/8], y[N/2],y[5N/8] */ /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ y0[2*h2+2] = n02; y0[2*h2+3] = n03; y1[2*h2+2] = n
PAGE 146
DSP_fft16x16t Special Requirements - In-place computation is not allowed. - The size of the FFT, nx, must be power of 2 or 4, and 16 ≤ nx ≤ 32768. - The arrays for the complex input data x[ ], complex output data y[ ], and twiddle factors w[ ] must be double-word aligned. - The input and output data are complex, with the real/imaginary components stored in adjacent locations in the array. The real components are stored at even array indices, and the imaginary components are stored at odd array indices.
PAGE 147
DSP_fft16x16t The following statements can be made based on above observations: 1) Inner loop “i0” iterates a variable number of times. In particular, the number of iterations quadruples every time from 1..N/4. Hence, software pipelining a loop that iterates a variable number of times is not profitable. 2) Outer loop “j” iterates a variable number of times as well. However, the number of iterations is quartered every time from N/4 ..1. Hence, the behaviors in (1) and (2) are exactly opposite to each other.
PAGE 148
DSP_fft16x16t There is one slight break in the flow of packed processing. The real part of the complex number is in the lower half, and the imaginary part is in the upper half. The flow breaks for “xl0” and “xl1” because in this case the real part needs to be combined with the imaginary part because of the multiplication by “j”. This requires a packed quantity like “xl21xl20” to be rotated as “xl20xl21” so that it can be combined using ADD2s and SUB2s.
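The packed-processing idea above (two 16-bit values per 32-bit register, combined with ADD2/SUB2, with an occasional swap of the halves) can be emulated in portable C. The functions below are hypothetical stand-ins for the instructions, written to illustrate the semantics rather than to mirror the library code.

```c
#include <stdint.h>

/* Stand-in for ADD2: two independent 16-bit adds, no carry between
 * the lower and upper halves. */
uint32_t add2_emul(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;
    return hi | lo;
}

/* Stand-in for SUB2: two independent 16-bit subtracts. */
uint32_t sub2_emul(uint32_t a, uint32_t b)
{
    uint32_t lo = (a - b) & 0x0000FFFFu;
    uint32_t hi = ((a >> 16) - (b >> 16)) << 16;
    return hi | lo;
}

/* Rotate by 16 bits: turns a packed pair "xl21 xl20" into "xl20 xl21". */
uint32_t swap_halves(uint32_t v)
{
    return (v << 16) | (v >> 16);
}
```

With the real part in the lower half and the imaginary part in the upper half, `swap_halves` is what lets a real component be combined with an imaginary one, as needed for the multiplication by j.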
PAGE 149
Appendix A Performance/Fractional Q Formats This appendix describes performance considerations related to the C64x+ DSPLIB and provides information about the Q format used by DSPLIB functions. Topics: A.1 Performance Considerations (page A-2), A.2 Fractional Q Formats
PAGE 150
Performance Considerations A.1 Performance Considerations The ceil( ) function is used in some benchmark formulas to accurately describe the number of cycles. It returns its argument rounded up to the nearest integer. For example, ceil(1.1) returns 2. Although DSPLIB can be used as a first estimation of processor performance for a specific function, you should be aware that the generic nature of DSPLIB might add extra cycles not required for customer-specific usage.
PAGE 151
Fractional Q Formats A.2 Fractional Q Formats Unless specifically noted, DSPLIB functions use Q15 format, or to be more exact, Q0.15. In a Qm.n format, there are m bits used to represent the two’s complement integer portion of the number, and n bits used to represent the two’s complement fractional portion. m+n+1 bits are needed to store a general Qm.n number. The extra bit is needed to store the sign of the number in the most-significant bit position.
PAGE 152
Fractional Q Formats A.2.3 Q.31 Format Q.31 format spans two 16-bit memory words. The 16-bit word stored in the lower memory location contains the 16 least significant bits, and the higher memory location contains the most significant 15 bits and the sign bit. The approximate allowable range of numbers in Q.31 representation is (−1,1) and the finest fractional resolution is 2^−31 ≈ 4.66 × 10^−10. Table A−3. Q.
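The Q.31 storage layout described above (least significant 16 bits in the lower memory word, sign and upper 15 bits in the higher one) can be illustrated with a short sketch. The function names `q31_split` and `q31_to_double` are hypothetical.

```c
#include <stdint.h>

/* Split a Q.31 value into the two 16-bit words described above:
 * lo holds the 16 least significant bits, hi holds the sign bit and
 * the 15 most significant fraction bits. */
void q31_split(int32_t q, uint16_t *lo, uint16_t *hi)
{
    uint32_t u = (uint32_t)q;
    *lo = (uint16_t)(u & 0xFFFFu);
    *hi = (uint16_t)(u >> 16);
}

/* Interpret a Q.31 value as a real number in the range [-1, 1). */
double q31_to_double(int32_t q)
{
    return (double)q / 2147483648.0;   /* divide by 2^31 */
}
```

For example, 40000000h represents 0.5, and the smallest positive value, 00000001h, represents 2^−31.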
PAGE 153
Appendix B: Software Updates and Customer Support. This appendix provides information about software updates and customer support. Topics: B.1 DSPLIB Software Updates (page B-2); B.2 DSPLIB Customer Support
PAGE 154
B.1 DSPLIB Software Updates C64x DSPLIB software updates may be released periodically, incorporating product enhancements and fixes as they become available. You should read the README.TXT available in the root directory of every release. B.2 DSPLIB Customer Support If you have questions or want to report problems or suggestions regarding the C64x DSPLIB, contact Texas Instruments at dsph@ti.com.
PAGE 155
Appendix C: Glossary A address: The location of program code or data stored; an individually accessible memory location. A-law companding: See compress and expand (compand). API: See application programming interface. application programming interface (API): Used for proprietary application programs to interact with communications software or to conform to protocols from another vendor’s product.
PAGE 156
Glossary board support library (BSL): The BSL is a set of application programming interfaces (APIs) consisting of target side DSP code used to configure and control board level peripherals. boot: The process of loading a program into program memory. boot mode: The method of loading a program into program memory. The C6x DSP supports booting from external ROM or the host port interface (HPI). BSL: See board support library. byte: A sequence of eight adjacent bits operated upon as a unit.
PAGE 157
Glossary compress and expand (compand): A quantization scheme for audio signals in which the input signal is compressed and then, after processing, is reconstructed at the output by expansion. There are two distinct companding schemes: A-law (used in Europe) and μ-law (used in the United States). control register: A register that contains bit fields that define the way a device operates. control register file: A set of control registers. CSL: See chip support library.
PAGE 158
Glossary DSP_blk_move: Block move. DSP_dotp_sqr: Vector dot product and square. DSP_dotprod: Vector dot product. DSP_fft: Complex forward FFT with digit reversal. DSP_fft16x16r: Complex forward mixed radix 16- x 16-bit FFT with rounding. DSP_fft16x16t: Complex forward mixed radix 16- x 16-bit FFT with truncation. DSP_fft16x32: Complex forward mixed radix 16- x 32-bit FFT with rounding. DSP_fft32x32: Complex forward mixed radix 32- x 32-bit FFT with rounding.
PAGE 159
Glossary DSP_minval: Minimum value of a vector. DSP_mul32: 32-bit vector multiply. DSP_neg32: 32-bit vector negate. DSP_q15tofl: Q15 to float conversion. DSP_radix2: Complex forward FFT (radix 2). DSP_recip16: 16-bit reciprocal. DSP_r4fft: Complex forward FFT (radix 4). DSP_vecsumsq: Sum of squares. DSP_w_vec: Weighted vector sum. E evaluation module (EVM): Board and software tools that allow the user to evaluate a specific device.
PAGE 160
Glossary H HAL: Hardware abstraction layer of the CSL. The HAL underlies the service layer and provides it a set of macros and constants for manipulating the peripheral registers at the lowest level. It is a low-level symbolic interface into the hardware providing symbols that describe peripheral registers/bitfields, and macros for manipulating them. host: A device to which other devices (peripherals) are connected and that generally controls those devices.
PAGE 161
Glossary interrupt service table (IST): A table containing a corresponding entry for each of the 16 physical interrupts. Each entry is a single-fetch packet and has a label associated with it. internal peripherals: Devices connected to and controlled by a host device. The C6x internal peripherals include the direct memory access (DMA) controller, multichannel buffered serial ports (McBSPs), host port interface (HPI), external memory interface (EMIF), and runtime support timers.
PAGE 162
Glossary N nonmaskable interrupt (NMI): An interrupt that can be neither masked nor disabled. O object file: A file that has been assembled or linked and contains machine language object code. off chip: A state of being external to a device. on chip: A state of being internal to a device. P peripheral: A device connected to and usually controlled by a host device. program cache: A fast memory cache for storing program instructions allowing for quick execution.
PAGE 163
Glossary reset: A means of bringing the CPU to a known state by setting the registers and control bits to predetermined values and signaling execution to start at a specified address. RTOS: Real-time operating system. S service layer: The top layer of the 2-layer chip support library architecture providing high-level APIs into the CSL and BSL. The service layer is where the actual APIs are defined and is the interface layer.
PAGE 164
PAGE 165
Index A adaptive filtering functions 3-4 DSPLIB reference 4-2 address, defined C-1 A-law companding, defined C-1 API, defined C-1 application programming interface, defined argument conventions 3-2 arguments, DSPLIB 2-3 assembler, defined C-1 assert, defined C-1 B big endian, defined C-1 bit, defined C-1 block, defined C-1 board support library, defined boot, defined C-2 boot mode, defined C-2 BSL, defined C-2 byte, defined C-2 C-2 C cache, defined C-2 cache controller, defined C-2 CCS, defined C-2 centr
PAGE 166
Index DSP_dotprod defined C-4 DSPLIB reference DSP_fft defined C-4 DSPLIB reference 4-60 DSP_ifft32x32 defined C-4 DSPLIB reference 4-36 4-98 DSP_iir defined C-4 DSPLIB reference 4-54 DSP_iirlat, DSPLIB reference DSP_fft16x16r defined C-4 DSPLIB reference 4-14 DSP_fft16x16t defined C-4 DSPLIB reference DSP_mat_trans defined C-4 DSPLIB reference 4-75 4-8, 4-11, 4-107 DSP_maxidx defined C-4 DSPLIB reference 4-63 DSP_maxval defined C-4 DSPLIB reference 4-62 DSP_minerror defined C-4 DSPLIB re
PAGE 167
DSP_w_vec defined C-5 DSPLIB reference 4-72 DSPLIB argument conventions, table 3-2 arguments 2-3 arguments and data types 2-3 calling a function from Assembly 2-4 calling a function from C 2-4 customer support B-2 data types, table 2-3 features and benefits 1-4 fractional Q formats A-3 functional categories 1-2 functions 3-3 adaptive filtering 3-4 correlation 3-4 FFT (fast Fourier transform) 3-4 filtering and convolution 3-5 math 3-6 matrix 3-6 miscellaneous 3-7 how DSPLIB deals with overflow and scaling
PAGE 168
Index F L fetch packet, defined C-5 FFT (fast Fourier transform) defined C-5 functions 3-4 FFT (fast Fourier transform) functions, DSPLIB reference 4-8 filtering and convolution functions 3-5 DSPLIB reference 4-38 flag, defined C-5 fractional Q formats A-3 frame, defined C-5 function calling a DSPLIB function from Assembly calling a DSPLIB function from C 2-4 functions, DSPLIB 3-3 least significant bit (LSB), defined lib directory 2-2 linker, defined C-7 little endian, defined C-7 M 2-4 G GIE bit, de
PAGE 169
Q S Q.3.12 bit fields Q.3.12 format A-3 Q.3.15 bit fields A-3 Q.3.15 format Q.31 format service layer, defined C-9 software updates B-2 STDINC module, defined C-9 synchronous-burst static random-access memory (SBSRAM), defined C-9 synchronous dynamic random-access memory (SDRAM), defined C-9 syntax, defined C-9 system software, defined C-9 A-3 A-3 A-4 Q.31 high-memory location bit fields Q.