
Extending Program Memory Size From 64K to 2M


7 Conclusions

This application note investigated the process of porting an application developed for the DSP56800 to the DSP56800E, and methods for optimizing the ported code using the new DSP56800E features. It also analyzed methods for optimizing selected ported functions and compared them with redesigning and rewriting those functions.

Porting existing DSP56800 code to the DSP56800E is an almost direct process because the assembly code is compatible. The code must meet certain requirements, but in typical applications these are rarely an issue. The only exception is the “MAC Output Limiter,” and that can also be corrected.

The ported code runs on the DSP56800E in almost half the number of cycles and uses nearly the same program memory. Some pipeline effects can occur in the ported code; however, they do not affect the correctness of the results, only the execution time, which is slightly more than half the cycle count of the original DSP56800 code. The program memory size of the ported application is also slightly larger than the original. For the selected application, the cycle count decreased from 61,918,898 to 31,694,501, which corresponds to a decrease in processing load from 6.73 MCPS to 3.43 MCPS. Because the DSP56800E also runs at a higher clock frequency, the actual execution time is reduced even further.

Further speed improvements can be achieved by optimizing the ported code to use the new DSP56800E features. Most of these optimizations can be applied easily, without a deep understanding of the algorithm or the overall code. The new DSP56800E features most useful in this process are the additional registers, the extended set of data ALU operations, the increased flexibility of the instruction set, AGU arithmetic, and hardware support for nested looping. In the example presented in this note, all of these features were used in different selected functions. The overall processing load improved from 3.43 MCPS to 3.13 MCPS, a reduction of about 9 percent. An improvement of this order is realistic for general applications. The optimized code was also slightly smaller (by about 2 percent). However, these optimizations preserved the original code structure, as designed for the DSP56800. If code is written from scratch specifically for the DSP56800E, some of the new features (for example, the extended register set, the more flexible instruction set, new data types, and AGU arithmetic) can be exploited on a larger scale. On selected examples, this yielded total reductions of 22 to 30 percent in cycle count.

Regarding the new pipeline structure, the original DSP56800 code runs directly and gives correct results. However, code that did not violate pipeline restrictions on the DSP56800 can sometimes create dependencies on the DSP56800E. The core resolves these dependencies by introducing stalls (as in the case of data ALU dependencies). If the assembler flags these situations, the programmer can rearrange the code to eliminate the stalls and increase speed even further.

In certain cases, it might be necessary to extend the application to make full use of the DSP56800E addressing capabilities. The process of extending a ported application beyond the 16-bit boundary for program and data memory was analyzed. This is usually not a straightforward process. However, if a new DSP56800E application is designed from scratch for this purpose, there are no problems at all in using the whole address space.

This application note showed that using the new DSP56800E in existing DSP56800 applications is quite straightforward and brings performance improvements: these applications run in about half the number of cycles.

In summary, the following rules of thumb are presented:

• Unmodified DSP56800 code ported to the DSP56800E generally takes about half the number of clock cycles.

• Modification can further improve performance:

– Local optimizations result in roughly 10 percent fewer clock cycles.

– A code rewrite may result in 20-30 percent fewer clock cycles.


Freescale Semiconductor, Inc.

For More Information On This Product,

Go to: www.freescale.com
