User's Manual

6-6 Optimizing DSP56300/DSP56600 Applications MOTOROLA
Pipeline Interlocks
Data ALU Pipeline Interlocks
;previous data
move x:(r0)+,b b,y:(r4)+ ;write destination memory,
;read next data
_end
move a,y:(r4)+ ;write last-1 word to
;destination memory
move b,y:(r4)+ ;write last word to destination
;memory
6.1.2.3 Saving Interlocks by Using the TFR Instruction.
The following C code adds a constant to two memory arrays, one in
X memory space and the other in Y memory space:
static int a[N],b[N];
int i,c;                /* c is the constant to add */
for (i=0;i<N;i++)
{
    b[i] = b[i]+c;
}
for (i=0;i<N;i++)
{
    a[i] = a[i]+c;
}
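The straightforward implementation executes in 8N cycles. Each of the
two loops takes four cycles per iteration: three for the instructions
plus one interlock cycle, because the MOVE that stores the accumulator
immediately follows the ADD that writes it: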
move var_a,r4 ;a array in Y:memory space
move var_b,r0 ;b array in X:memory space
move var_c,x0 ;constant to add
do #N,_1Loop ;handle Y array
move y:(r4),a ;read data word
add x0,a ;add constant
move a,y:(r4)+ ;store result and increment pointer
_1Loop
do #N,_2Loop ;handle X array
move x:(r0),a ;read data word
add x0,a ;add constant
move a,x:(r0)+ ;store result and increment
;pointer
_2Loop
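The 8N figure can be reproduced with simple arithmetic. The sketch
below is only an illustration: the function name and constants are
hypothetical, and it assumes single-cycle instructions plus one stall
cycle per iteration, which is what the 8N total implies. Loop setup
overhead is ignored.

```c
#include <assert.h>

/* Assumed cost model for the straightforward version (not from the manual):
 * two loops, three single-cycle instructions per iteration, and one
 * interlock stall per iteration between ADD and the following MOVE. */
enum { LOOPS = 2, INSTRS_PER_ITER = 3, INTERLOCK_STALL = 1 };

/* Returns the estimated cycle count for arrays of n words each. */
static long naive_cycles(long n)
{
    assert(n >= 0);
    return LOOPS * n * (INSTRS_PER_ITER + INTERLOCK_STALL);
}
```

Under these assumptions, naive_cycles(N) evaluates to 8N.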
By combining the two loops into one and using the TFR instruction,
an optimized implementation needs only three cycles per main loop
iteration. Each iteration handles one word of each array, that is,
1.5 cycles per word, for a total of about 3N cycles for the whole
task:
move var_a,r4 ;a array in Y memory
move var_b,r0 ;b array in X memory
lua (r4)+,r5 ;r5 = r4 + 1
lua (r0)+,r1 ;r1 = r0 + 1
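move var_c,x1 ;constant to add
move x:(r0),b ;read first word of b array
add x1,b x:(r1)+,x0 y:(r4),a ;add constant to first b word,
;read next b word and first a word
do #N,_3Loop
add x1,a b,x:(r0)+ x0,b ;add constant to a word, store b
;result, move next b word to b
add x1,b y:(r5)+,y1 ;add constant to b word, read next a word
tfr y1,a x:(r1)+,x0 a,y:(r4)+ ;store a result, move next a word
;to a, read ahead in b array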
_3Loop
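The data flow of the combined loop can be checked with a small C
model. This is a hedged sketch, not part of the original manual: the
function names are hypothetical, plain int arithmetic stands in for
the DSP's accumulators, and each C variable is named after the
register it models. Because the loop reads ahead, the model assumes
that a has at least n + 1 readable words and b at least n + 2.

```c
#include <assert.h>
#include <stddef.h>

/* Reference behaviour: the two separate loops of the C fragment. */
static void add_const_two_loops(int *a, int *b, int c, size_t n)
{
    for (size_t i = 0; i < n; i++) b[i] += c;
    for (size_t i = 0; i < n; i++) a[i] += c;
}

/* Hypothetical model of the combined loop. Parallel moves read their
 * source registers before the ALU result of the same instruction lands,
 * so each line performs the moves first and the ALU operation last.
 * Assumes n >= 1, n + 1 readable words in a, and n + 2 in b. */
static void add_const_fused(int *a, int *b, int c, size_t n)
{
    int *r4 = a, *r5 = a + 1;   /* lua (r4)+,r5 */
    int *r0 = b, *r1 = b + 1;   /* lua (r0)+,r1 */
    int x1 = c, x0, y1, acc_a, acc_b;

    assert(n >= 1);
    acc_b = *r0;                                 /* move x:(r0),b */
    x0 = *r1++; acc_a = *r4; acc_b += x1;        /* add x1,b x:(r1)+,x0 y:(r4),a */
    for (size_t k = 0; k < n; k++) {             /* do #N,_3Loop */
        *r0++ = acc_b; acc_b = x0; acc_a += x1;  /* add x1,a b,x:(r0)+ x0,b */
        y1 = *r5++; acc_b += x1;                 /* add x1,b y:(r5)+,y1 */
        *r4++ = acc_a; x0 = *r1++; acc_a = y1;   /* tfr y1,a x:(r1)+,x0 a,y:(r4)+ */
    }
}
```

Calling both functions on identical copies of padded arrays should
leave the first n words of each array equal, with the padding words
unchanged, since the read-ahead values fetched in the last iteration
are never stored.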