User manual

Tutorial: Handel-C code optimization
The behaviour and timing of the code is as follows:
After the first clock cycle:
  - New values for the additions are calculated and stored in sum1 and sum2.
  - The value in a will be undefined, as it depends on sum1 and sum2 for its inputs, and they
    were undefined at the start of the cycle.
After the second clock cycle:
  - Another set of new values for the additions is calculated and stored in sum1 and sum2.
  - The multiplication has been performed, using the values of sum1 and sum2 generated in the
    previous clock cycle, and the result is stored in a.
The behaviour in the second cycle is then repeated in all following cycles, provided that the data in
b, c, d and e is valid on every cycle.
The result is that the block of code will be capable of running at a higher clock rate when implemented in
hardware, at the expense of results being delayed by one cycle. As long as new inputs are presented
every cycle, there will be a new result every cycle, after the initial one cycle delay. Hence the pipeline
has a latency of one cycle and a throughput of one result per cycle.
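The timing described above can be checked with a small software model. The sketch below is plain C, not Handel-C, and rests on assumptions: the Pipeline struct, the clock_cycle function and the two valid flags are hypothetical names introduced here, and the pairing sum1 = b + c, sum2 = d + e is assumed, since the text only states that two additions feed one multiplication. Each call models one clock cycle in which every register is updated from the values held at the start of that cycle.

```c
#include <stdio.h>

/* Hypothetical model of the pipeline registers. The *_valid flags stand in
   for "undefined" register contents; they are not part of the hardware. */
typedef struct {
    unsigned sum1;
    unsigned sum2;
    unsigned a;
    int sums_valid;
    int a_valid;
} Pipeline;

/* Model one clock cycle. Every right-hand side reads the OLD state p, and
   all registers take their new values together, as statements in a
   parallel block would.
   Assumed pairing: sum1 = b + c, sum2 = d + e. */
Pipeline clock_cycle(Pipeline p, unsigned b, unsigned c, unsigned d, unsigned e)
{
    Pipeline next;

    next.sum1 = b + c;            /* stage 1: first addition      */
    next.sum2 = d + e;            /* stage 1: second addition     */
    next.a = p.sum1 * p.sum2;     /* stage 2: uses last cycle's sums */

    next.sums_valid = 1;
    next.a_valid = p.sums_valid;  /* a is only meaningful one cycle later */
    return next;
}

/* Print the state of a after a given cycle. */
void report(int cycle, Pipeline p)
{
    if (p.a_valid)
        printf("after cycle %d: a = %u\n", cycle, p.a);
    else
        printf("after cycle %d: a is undefined\n", cycle);
}
```

Starting from an all-zero Pipeline and calling clock_cycle with inputs (1, 2, 3, 4) and then (5, 6, 7, 8) leaves a undefined after the first call and gives a = 21 after the second, which is the result for the first set of inputs. That matches a latency of one cycle with one new result per cycle thereafter.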
7.2 Pipelines and replicators
Parallel and sequential replicators can be used in Handel-C to build complex program structures quickly
and allow them to be parameterized. Replicators are used in the same way as for() loops, except that
during compilation they are expanded so that all iterations are implemented individually. They can be
executed sequentially or in parallel. So, the following code:
    par (i = 0; i < 3; i++)
    {
        a[i] = b[i];
    }

expands to:

    par
    {
        a[0] = b[0];
        a[1] = b[1];
        a[2] = b[2];
    }

If a seq had been used instead of the par, the expanded code would have been executed sequentially
instead of in parallel.
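The choice between par and seq matters most when the replicated statements depend on one another. The sketch below is a plain C model, not Handel-C, of a hypothetical replicator body r[i + 1] = r[i] that does not appear in the original text; the names shift_par, shift_seq and N are introduced here for illustration. It assumes the usual Handel-C timing rule that each assignment takes one clock cycle and that statements in a par block all read the values held at the start of that cycle.

```c
#include <string.h>

#define N 3   /* number of replicated statements (illustrative) */

/* Models: par (i = 0; i < N; i++) { r[i + 1] = r[i]; }
   All assignments read the old register values and complete together,
   so the array behaves as a shift register. Returns cycles taken. */
int shift_par(unsigned r[N + 1])
{
    unsigned old[N + 1];
    int i;

    memcpy(old, r, sizeof old);   /* snapshot of start-of-cycle values */
    for (i = 0; i < N; i++)
        r[i + 1] = old[i];
    return 1;                     /* whole block takes one clock cycle */
}

/* Models: seq (i = 0; i < N; i++) { r[i + 1] = r[i]; }
   Each assignment sees the result of the one before it, so r[0] is
   copied down the whole array. Returns cycles taken. */
int shift_seq(unsigned r[N + 1])
{
    int cycles = 0;
    int i;

    for (i = 0; i < N; i++) {
        r[i + 1] = r[i];
        cycles++;                 /* one clock cycle per assignment */
    }
    return cycles;
}
```

With r initialised to {1, 2, 3, 4}, shift_par leaves {1, 1, 2, 3} after one modelled cycle, while shift_seq leaves {1, 1, 1, 1} after three. For the independent assignments a[i] = b[i] shown above, both forms produce the same values and differ only in the number of cycles.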
Replicators are useful for implementing algorithms which iterate over an array or operate bitwise across
several variables. A good example is a pipelined multiplier where the number of pipeline stages is equal
to the data width. The input data and a running sum are passed through each stage, with the inputs being
shifted and added to the sum as required. The code below implements a pipelined multiplier with a
user-defined data width.