User guide

114 www.xilinx.com System Generator for DSP User Guide

UG640 (v 12.2) July 23, 2010

Chapter 1: Hardware Design Using System Generator

Browse Through and Understand the Xilinx Filter Block

The following block diagram shows how the MAC-based FIR filter has been
implemented for this tutorial.

At this point, the MAC filter is set up for 10-bit signed input data (Fix_10_8), 12-bit
signed coefficients (Fix_12_12), and 43 taps. All of these parameters can be modified directly
from the MAC block GUI. The coefficients and data need to be stored in a memory system.
For the tutorial, you use a dual-port memory to store both the data and the coefficients,
with the data being captured and read out using a circular RAM buffer. The RAM is used
in a mixed-mode configuration: data values are written to and read from port A (RAM mode),
and the coefficients are only read from port B (ROM mode).
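
The circular-buffer addressing described above can be illustrated with a small behavioral model. The Python sketch below is not taken from the tutorial design: the class name `CircularSampleBuffer`, its methods, the newest-first read order, and the three-tap coefficient list are all assumptions made for illustration. The coefficient ROM on port B is modeled as a plain read-only list.

```python
class CircularSampleBuffer:
    """Hypothetical model of the data side (port A) of the dual-port memory."""

    def __init__(self, n_taps):
        self.n_taps = n_taps
        self.mem = [0] * n_taps   # one slot per filter tap
        self.wr_addr = 0          # next address to be overwritten

    def write(self, sample):
        # The newest sample overwrites the oldest one, then the address wraps.
        self.mem[self.wr_addr] = sample
        self.wr_addr = (self.wr_addr + 1) % self.n_taps

    def read_newest_first(self):
        # Walk backwards from the most recently written slot, wrapping at 0.
        return [self.mem[(self.wr_addr - 1 - k) % self.n_taps]
                for k in range(self.n_taps)]


coef_rom = [1, 2, 3]                      # illustrative coefficients (port B, read only)
buf = CircularSampleBuffer(n_taps=len(coef_rom))
for sample in [10, 20, 30, 40]:           # the fourth write overwrites the first sample
    buf.write(sample)

window = buf.read_newest_first()          # [40, 30, 20]
y = sum(c * x for c, x in zip(coef_rom, window))   # 1*40 + 2*30 + 3*20 = 160
```

In hardware the same effect is obtained with address counters rather than list indexing; the model only shows that no data has to be moved when a new sample arrives.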

The multiplier is set up to use the embedded multiplier resource available in Xilinx
Virtex® devices, with three levels of latency, in order to achieve the fastest possible
performance. The precision required for the multiplier and the accumulator is a function of the
filter taps (coefficients) and the number of taps. Since these are fixed at design time, it is
possible to tailor the hardware resources to the filter specification. The accumulator need
only have sufficient precision to accumulate the maximal input against the filter taps, which is
calculated as follows:

acc_nbits = ceil(log2(sum(abs(coef*2^coef_width_bp)))) + data_width + 1;
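
As a worked example of the accumulator-width expression, the following Python sketch mirrors it term by term. The function name and the three coefficient values are illustrative assumptions, not values from the tutorial; the coefficients were chosen to be exactly representable with 12 fractional bits so that no quantization step is needed.

```python
import math


def acc_nbits(coef, coef_width_bp, data_width):
    """Python transcription of the accumulator-width expression above."""
    # Scale each coefficient to an integer weight and sum the magnitudes:
    # this is the worst-case gain applied to a full-scale input.
    scaled_sum = sum(abs(c * 2 ** coef_width_bp) for c in coef)
    if scaled_sum == 0:
        raise ValueError("all coefficients are zero; log2 is undefined")
    return math.ceil(math.log2(scaled_sum)) + data_width + 1


# Illustrative values only: three coefficients, 12 fractional coefficient
# bits (as in Fix_12_12) and 10-bit input data (as in Fix_10_8).
coef = [0.25, -0.125, 0.0625]
nbits = acc_nbits(coef, coef_width_bp=12, data_width=10)
# scaled_sum = 1024 + 512 + 256 = 1792, ceil(log2(1792)) = 11, so nbits = 22
```

For a real filter, the coefficients would normally be quantized to the coefficient format first, so that the sum reflects the values actually stored in the ROM.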

Upon reset, the accumulator re-initializes to its current input value rather than to zero, which
allows the MAC engine to stream data without stalling. A capture register is required for
streaming operation because the MAC engine reloads its accumulator with an incoming
sample after computing the last partial product for an output sample; without the register,
the completed sum would be overwritten.
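
The interaction between the reloading accumulator and the capture register can be sketched as a cycle-by-cycle behavioral model. This is a simplified illustration under stated assumptions, not the tutorial's implementation: the function name `mac_stream` is hypothetical, pipeline latency is ignored, and the reset is assumed to coincide with the first partial product of each output sample.

```python
def mac_stream(products, n_taps):
    """Accumulate a continuous stream of partial products, n_taps per output.

    Returns the capture-register value seen on every cycle.
    """
    acc = 0        # accumulator register
    capture = 0    # capture register
    trace = []
    for i, p in enumerate(products):
        reset = (i % n_taps == 0)   # first product of a new output sample
        if reset:
            capture = acc           # save the finished sum before it is lost
            acc = p                 # reload with the input instead of clearing
        else:
            acc += p
        trace.append(capture)
    return trace


# Three taps, three output samples' worth of partial products.
trace = mac_stream([1, 2, 3, 4, 5, 6, 7, 8, 9], n_taps=3)
# trace == [0, 0, 0, 6, 6, 6, 15, 15, 15]
```

Because the reset cycle does useful work (it loads the first product), no cycle is spent clearing the accumulator, and every input cycle contributes to an output. The capture register holds each finished sum (6, then 15) for a full group of cycles.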

Finally, a downsampler reduces the capture register sample rate to the output sample
rate. The block is configured with latency to obtain the most efficient hardware
implementation. The downsampling rate is equal to the coefficient array length (43 for this
filter).
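
The rate change can be illustrated with a minimal Python sketch. The helper `downsample`, its `phase` argument, and the example capture-register values are assumptions for illustration; which sample of each frame the actual block keeps is a block setting, and its latency is not modeled here.

```python
def downsample(signal, rate, phase=0):
    """Keep one sample out of every `rate`, starting at index `phase`."""
    return signal[phase::rate]


# Hypothetical capture-register values for a 3-tap filter: each finished
# sum is held for 3 cycles, so only one sample per group carries new data.
capture_trace = [0, 0, 0, 6, 6, 6, 15, 15, 15]
outputs = downsample(capture_trace, rate=3)        # [0, 6, 15]

# With 43 taps, 43 capture-register samples collapse into one output.
n_taps = 43
n_outputs = len(downsample(list(range(n_taps * 4)), rate=n_taps))   # 4
```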