Technical information

Before You Begin: Important Concepts

XAPP1206 v1.1 June 12, 2014 www.xilinx.com 6

cores, and one private timer (32-bit) for each core. The timers have a pre-scaler to lower the

timer clock rate and to gain longer overflow time, at the cost of timer resolution. The actual timer

clock can be calculated with the following equation:

On the Zynq-7000 AP SoC, the PERIPHCLK is half of CPU clock frequency. This means that

the highest timer resolution is 2 nanoseconds (a 500 MHz PERIPHCLK with the pre-scaler set to 0). To make software development easier, Xilinx

also provides APIs for these timers.
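The relationship between PERIPHCLK, the pre-scaler, and timer resolution can be checked with a few lines of C. This is a minimal sketch, not the Xilinx timer API: the function names are hypothetical, the 8-bit pre-scaler width is an assumption, and the clock frequency is passed in rather than read from hardware. It applies the timer clock relation given in this section, PERIPHCLK / (PRESCALER value + 1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are not Xilinx APIs. */

/* Timer clock in Hz: PERIPHCLK / (PRESCALER value + 1).
 * The pre-scaler is assumed to be an 8-bit field (0..255). */
double timer_clock_hz(double periphclk_hz, uint8_t prescaler)
{
    assert(periphclk_hz > 0.0);
    return periphclk_hz / ((double)prescaler + 1.0);
}

/* Duration of one timer tick in nanoseconds (the timer resolution). */
double timer_resolution_ns(double periphclk_hz, uint8_t prescaler)
{
    return 1e9 / timer_clock_hz(periphclk_hz, prescaler);
}

/* Seconds until a 32-bit counter has counted 2^32 ticks. */
double timer_overflow_s(double periphclk_hz, uint8_t prescaler)
{
    return 4294967296.0 / timer_clock_hz(periphclk_hz, prescaler);
}
```

With a 500 MHz PERIPHCLK and a pre-scaler value of 0, timer_resolution_ns() returns 2.0 and a 32-bit timer wraps after roughly 8.6 seconds. Raising the pre-scaler to 255 lengthens the overflow time by a factor of 256 and coarsens the resolution to 512 ns, which is the trade-off described above.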

NEON Basics

This section examines why and how NEON can be used to improve software performance.

From a software perspective, NEON technology is based on single instruction, multiple data

(SIMD) operations in ARMv7 processors, which implement the Advanced SIMD architecture

extensions. This demands a new set of instructions with new functions, and also a new

development methodology. From a hardware perspective, NEON is a separate hardware unit

on Cortex-A series processors, together with a vector floating point (VFP) unit. If an algorithm

can be designed to exploit dedicated hardware, performance can be maximized.

SIMD Introduction

SIMD is a computational technique for processing many data values (generally in powers of

two) in parallel using a single instruction, with the data for the operands packed into special,

wide registers. Therefore, one instruction can do the work of many separate instructions on

single instruction, single data (SISD) architectures. For code that can be parallelized, large

performance improvements can be achieved.

Many software programs operate on large data sets. Each element in a data set can be less

than 32 bits. 8-bit data is common in video, graphics, and image processing, and 16-bit data in

audio codecs. In these contexts, the operations to be performed are simple, repeated many

times, and have little need for control code. SIMD can offer considerable performance

improvements for this type of data processing. It is particularly useful for digital signal

processing or multimedia algorithms, such as:

• Block-based data processing, such as FFTs, matrix multiplication, etc.

• Audio, video, and image processing codecs, such as MPEG-4, H.264, On2 VP6/7/8, AVS, etc.

• 2D graphics based on rectangular blocks of pixels

• 3D graphics

• Color-space conversion

• Physics simulations

• Error correction, such as Reed-Solomon codecs, CRCs, elliptic curve cryptography, etc.
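The workloads listed above share the same shape: a small operation applied to every element of a large array of narrow data. The following scalar C loop is one illustrative example of that shape, assuming an 8-bit grayscale image stored as a flat array. The function name brighten_u8 is hypothetical and the loop is plain C, not NEON code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds a constant offset to every 8-bit pixel, clamping at 255.
 * Each iteration is independent of the others and contains no
 * data-dependent branching beyond the clamp, which is the kind of
 * loop that SIMD hardware can process many elements at a time. */
void brighten_u8(uint8_t *pixels, size_t count, uint8_t offset)
{
    assert(pixels != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        /* Widen before adding so the 8-bit sum cannot wrap around. */
        unsigned sum = (unsigned)pixels[i] + offset;
        pixels[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Calling brighten_u8() on the pixels {0, 100, 200, 255} with an offset of 60 leaves {60, 160, 255, 255} in the array. The widening and clamping steps are the per-element overhead that a scalar 32-bit datapath pays for each 8-bit value.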

On 32-bit microprocessors, such as the Cortex-A series processors, it is relatively inefficient to

run large numbers of 8-bit or 16-bit operations. The processor ALU, registers, and datapath are

designed for 32-bit calculations. If they are used for 8/16-bit operations, additional instructions

are needed to handle overflow. SIMD enables a single instruction to treat a register value as

multiple data elements and to perform multiple, identical operations on those elements.
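The idea of treating one register value as several data elements can be imitated in portable C. The sketch below packs four 8-bit lanes into a 32-bit word and adds them lane by lane with wraparound. It is an emulation for illustration only, with hypothetical function names; it uses no NEON instructions. The masking steps are the extra work a 32-bit ALU needs to stop a carry from spilling into the neighboring lane, which SIMD hardware handles within a single vector add.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: unpack each 8-bit lane, add, wrap to 8 bits, repack. */
uint32_t add_4x_u8_scalar(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        result |= ((x + y) & 0xFFu) << (8 * lane);
    }
    return result;
}

/* Packed version: all four lanes are added with one 32-bit addition. */
uint32_t add_4x_u8(uint32_t a, uint32_t b)
{
    /* Add only the low 7 bits of each lane, so a carry can reach at most
     * bit 7 of its own lane and never crosses into the next lane. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold in the top bit of each lane with XOR, discarding its carry-out. */
    return low ^ ((a ^ b) & 0x80808080u);
}

/* Small self-check on one sample input. */
void add_4x_u8_self_check(void)
{
    uint32_t a = 0x01FF7F10u;
    uint32_t b = 0x01010110u;
    assert(add_4x_u8(a, b) == add_4x_u8_scalar(a, b));
}
```

For the sample input, the lanes 0x01, 0xFF, 0x7F, 0x10 plus 0x01, 0x01, 0x01, 0x10 give 0x02, 0x00, 0x80, 0x20. The 0xFF lane wraps to 0x00 without disturbing the lane above it. NEON registers are 64 or 128 bits wide, so a single vector add covers 8 or 16 such 8-bit lanes rather than the four emulated here.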

Timer clock equation:

\[ \text{timer clock} = \frac{\text{PERIPHCLK}}{\text{PRESCALER}_{\text{value}} + 1} \]