Technical information

7 Conclusions
This application note investigated the process of porting an application developed for DSP56800 to the
DSP56800E and the methods to optimize the ported code using the new features of DSP56800E. Also, the
methods to optimize selected ported functions were analyzed and compared to redesigning and rewriting
the functions.
Porting existing DSP56800 code to DSP56800E is almost a direct process because the assembly code is
compatible. The code must meet certain requirements to comply, but in normal applications these are
not usually an issue. The only exception is the MAC Output Limiter, and that can also be corrected.
The ported code runs on DSP56800E in about half the number of cycles and uses nearly the same
program size. Some pipeline effects can occur in the ported code, but these do not influence the
correctness of the results; they only make the execution time (in cycles) slightly longer than half the
cycle count of the original DSP56800 code. The program memory size of the ported application is also
slightly larger than the original. For the selected application, the number of cycles decreased from
61,918,898 to 31,694,501, which corresponds to a decrease in processing load from 6.73 MCPS to
3.43 MCPS. Because the DSP56800E processor also runs at a higher clock frequency, the actual
execution time is much shorter.
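As a quick check of the figures quoted above, the following sketch computes how close the ported code comes to "half the cycles". The numbers are taken from the text; the variable names are illustrative only.

```python
# Figures quoted in the text for the selected application.
cycles_56800 = 61_918_898    # original DSP56800 code
cycles_56800e = 31_694_501   # same code ported to DSP56800E
mcps_56800 = 6.73            # processing load before porting
mcps_56800e = 3.43           # processing load after porting

# Ratio of ported to original cost; 0.5 would be exactly half.
cycle_ratio = cycles_56800e / cycles_56800
mcps_ratio = mcps_56800e / mcps_56800

# Cycles spent above an exact halving (pipeline effects, per the text).
excess_over_half = cycles_56800e - cycles_56800 / 2

print(f"cycle ratio: {cycle_ratio:.3f}")
print(f"MCPS ratio:  {mcps_ratio:.3f}")
print(f"cycles above exactly half: {excess_over_half:,.0f}")
```

Both ratios come out slightly above 0.5, which matches the statement that the ported code needs slightly more than half the original cycle count.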
Additional speed improvement can be achieved by optimizing the ported code to make use of the
new DSP56800E features. Most of these optimizations can be done easily without a deep
understanding of the algorithm and the overall code. The new features introduced by DSP56800E, which
are most useful in this process, are additional registers, the extended set of data ALU operations, increased
flexibility of the instruction set, AGU arithmetic, and hardware support for nested looping. In the example
presented in this note, all of these features were used in different selected functions. The overall processing
load improvement was from 3.43 MCPS to 3.13 MCPS, that is, about 10 percent. Achieving this
improvement is realistic for general applications. The code of the optimized version was slightly smaller
(about 2 percent). However, these methods of optimizing preserved the original code structure, as designed
for DSP56800. If code is written from scratch, designed directly for DSP56800E, some of the new features
can be exploited on a larger scale (for example, extended register set, more flexibility of the instruction set,
new data types and AGU arithmetic). On selected examples, total reductions of between 22 percent and
30 percent in cycle count were obtained.
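The local-optimization gain can be worked out directly from the MCPS figures in the text. The sketch below is only arithmetic on those quoted values; the variable names are illustrative.

```python
# Processing load figures quoted in the text (MCPS).
mcps_original = 6.73    # DSP56800 code
mcps_ported = 3.43      # unmodified code on DSP56800E
mcps_optimized = 3.13   # ported code after local optimizations

# Relative gain of the local optimizations over the plain port.
local_gain = (mcps_ported - mcps_optimized) / mcps_ported

# Load of the optimized port relative to the original DSP56800 code.
overall_ratio = mcps_optimized / mcps_original

print(f"local optimization gain: {local_gain:.1%}")
print(f"optimized load vs. original: {overall_ratio:.1%}")
```

The computed gain is a little under 9 percent, which the text rounds to about 10 percent.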
Regarding the new pipeline structure, the original DSP56800 code runs directly, giving correct results.
However, there are situations when code that did not violate pipeline restrictions on DSP56800 creates
dependencies on DSP56800E. The core resolves these dependencies by introducing stalls (as in the case of
data ALU dependencies). If the assembler signals these situations, the programmer can rearrange the code
and eliminate the stalls, increasing speed even more.
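The effect of rearranging code to remove stalls can be illustrated with a toy model. This sketch assumes a single stall cycle whenever an instruction reads a register written by the instruction immediately before it; that rule, the register names, and the three-instruction sequence are all hypothetical and do not reproduce the actual DSP56800E interlock rules.

```python
# Toy model only: one stall cycle when an instruction reads the register
# written by the immediately preceding instruction. This is an assumption
# for illustration, not the real DSP56800E pipeline behavior.

def count_cycles(program):
    """program: list of (dest, sources) tuples. Returns cycles including stalls."""
    cycles = 0
    prev_dest = None
    for dest, sources in program:
        cycles += 1                      # one cycle to execute the instruction
        if prev_dest is not None and prev_dest in sources:
            cycles += 1                  # hypothetical one-cycle stall
        prev_dest = dest
    return cycles

# Hypothetical sequence: the second instruction depends on the first.
original = [
    ("a", ("x0", "y0")),   # a  = x0 * y0
    ("b", ("a", "y1")),    # b  = a + y1  (reads a right after it is written)
    ("r0", ("r1",)),       # r0 = r1 + 1  (independent address update)
]

# Same instructions, with the independent one moved between the dependent pair.
rearranged = [original[0], original[2], original[1]]

original_cycles = count_cycles(original)
rearranged_cycles = count_cycles(rearranged)
print(original_cycles, rearranged_cycles)
```

Under this model the rearranged sequence produces the same results in one cycle less, which is the kind of gain the paragraph above describes.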
In certain cases it might be necessary to extend the application making full use of DSP56800E addressing
capabilities. The process of extending a ported application beyond the 16-bit boundary for program and
data was analyzed. Usually this is not a straightforward process. However, if a new DSP56800E
application is designed from scratch for this purpose, the whole addressing space can be used
without difficulty.
This application note showed that using the new DSP56800E for existing DSP56800 applications is
quite direct and brings performance improvements. These applications run in about half the number of
cycles.
In summary, the following rules of thumb are presented:
- Unmodified DSP56800 code ported to DSP56800E generally takes half the number of clock cycles.
- Modification can further improve performance:
    - Local optimizations result in a 10 percent clock cycle improvement.
    - A code rewrite may result in a 20-30 percent clock cycle improvement.
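The rules of thumb above can be turned into a rough estimator. This is a minimal sketch, not a sizing tool: the function name and dictionary keys are hypothetical, and it assumes that the 10 percent and 20-30 percent figures apply to the whole application and are measured relative to the ported cycle count, which the text only reports for selected functions.

```python
def estimate_cycles(cycles_56800):
    """Rough DSP56800E cycle estimates from a DSP56800 cycle count.

    Assumption: every percentage is applied to the ported cycle count.
    """
    ported = cycles_56800 * 0.5               # unmodified port: about half
    return {
        "ported": ported,
        "local_optimizations": ported * 0.9,  # about 10 percent fewer cycles
        "rewrite_best": ported * 0.7,         # about 30 percent fewer cycles
        "rewrite_worst": ported * 0.8,        # about 20 percent fewer cycles
    }

# Example with a made-up budget of 1,000,000 DSP56800 cycles.
estimate = estimate_cycles(1_000_000)
for name, value in estimate.items():
    print(f"{name:>20}: {value:,.0f} cycles")
```

Measured results will differ from these estimates; for the application in this note the unmodified port alone took slightly more than half the original cycles.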
Freescale Semiconductor, Inc.
For More Information On This Product,
Go to: www.freescale.com