Technical information
Before You Begin: Important Concepts
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 6
cores, and one private timer (32-bit) for each core. The timers have a pre-scaler to lower the
timer clock rate and to gain longer overflow time, at the cost of timer resolution. The actual timer
clock can be calculated with the following equation:
On the Zynq-7000 AP SoC, PERIPHCLK is half the CPU clock frequency. This means that
the highest timer resolution is 2 nanoseconds, which corresponds to a 500 MHz PERIPHCLK
(a 1 GHz CPU clock) with the pre-scaler set to 0. To make software development easier, Xilinx
also provides APIs for these timers.
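The relationship between PERIPHCLK, the pre-scaler, and timer resolution can be sketched in a few lines of C. This is a minimal illustration, not the Xilinx timer API: the function names are hypothetical, and the pre-scaler is assumed to be an 8-bit field (0 to 255), as in the Cortex-A9 private timer control register.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of any Xilinx API. */

/* Timer clock in Hz: PERIPHCLK / (prescaler + 1). */
double timer_clock_hz(double periphclk_hz, uint8_t prescaler)
{
    return periphclk_hz / ((double)prescaler + 1.0);
}

/* Duration of one timer tick in nanoseconds (the timer resolution). */
double timer_tick_ns(double periphclk_hz, uint8_t prescaler)
{
    return 1e9 / timer_clock_hz(periphclk_hz, prescaler);
}

/* Approximate time in seconds for a 32-bit timer to run through 2^32 ticks. */
double timer_overflow_s(double periphclk_hz, uint8_t prescaler)
{
    return 4294967296.0 / timer_clock_hz(periphclk_hz, prescaler);
}
```

For example, with an assumed PERIPHCLK of 500 MHz, a pre-scaler of 0 gives a 2 ns tick and an overflow time of roughly 8.6 seconds, while a pre-scaler of 1 doubles both values. This is the trade-off between overflow time and resolution described above.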
NEON Basics
This section examines why and how NEON can be used to improve software performance.
From a software perspective, NEON technology is based on single instruction, multiple data
(SIMD) operations in ARMv7 processors, which implement the advanced SIMD architecture
extensions. This demands a new set of instructions with new functions, and also a new
development methodology. From a hardware perspective, NEON is a separate hardware unit
on Cortex-A series processors, together with a vector floating point (VFP) unit. If an algorithm
can be designed to exploit dedicated hardware, performance can be maximized.
SIMD Introduction
SIMD is a computational technique for processing many data values (generally in powers of
two) in parallel using a single instruction, with the data for the operands packed into special,
wide registers. Therefore, one instruction can do the work of many separate instructions on
single instruction, single data (SISD) architectures. For code that can be parallelized, large
performance improvements can be achieved.
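As a rough illustration of one instruction operating on many packed values, the sketch below adds two arrays of 8-bit data. When compiled for a NEON-capable target, it uses the standard arm_neon.h intrinsics vld1q_u8, vaddq_u8, and vst1q_u8 to process 16 elements per add. On other targets it falls back to the scalar loop. The function name add_u8 is hypothetical, and this is only one possible way to write such a loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Element-wise add of 8-bit values; each result wraps modulo 256. */
void add_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON)
    /* SIMD path: one vector add handles 16 packed 8-bit elements. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vaddq_u8(va, vb));
    }
#endif

    /* Scalar path: leftover elements, or all elements on non-NEON targets. */
    for (; i < n; i++) {
        dst[i] = (uint8_t)(a[i] + b[i]);
    }
}
```

The scalar loop issues one add per element, whereas the vector loop issues one add per 16 elements. Any actual speedup depends on the compiler, memory behavior, and data alignment.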
Many software programs operate on large data sets. Each element in a data set can be less
than 32 bits. 8-bit data is common in video, graphics, and image processing, and 16-bit data in
audio codecs. In these contexts, the operations to be performed are simple, repeated many
times, and have little need for control code. SIMD can offer considerable performance
improvements for this type of data processing. It is particularly useful for digital signal
processing or multimedia algorithms, such as:
• Block-based data processing, such as FFTs and matrix multiplication
• Audio, video, and image processing codecs, such as MPEG-4, H.264, On2 VP6/7/8, and AVS
• 2D graphics based on rectangular blocks of pixels
• 3D graphics
• Color-space conversion
• Physics simulations
• Error correction and cryptography, such as Reed-Solomon codecs, CRCs, and elliptic curve cryptography
On 32-bit microprocessors, such as the Cortex-A series processors, it is relatively inefficient to
run large numbers of 8-bit or 16-bit operations. The processor ALU, registers, and datapath are
designed for 32-bit calculations. If they are used for 8/16-bit operations, additional instructions
are needed to handle overflow. SIMD enables a single instruction to treat a register value as
multiple data elements and to perform multiple, identical operations on those elements.
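To see why packed 8-bit arithmetic costs extra instructions on a plain 32-bit ALU, consider the following sketch, which adds four 8-bit lanes held in one 32-bit word using ordinary integer operations. The masking steps exist only to stop a carry in one lane from spilling into the next. SIMD hardware performs this lane isolation itself. The function name swar_add_u8x4 is hypothetical, and the technique shown is generic "SIMD within a register" arithmetic, not something taken from the NEON instruction set.

```c
#include <stdint.h>

/* Adds four packed 8-bit lanes; each lane wraps modulo 256 independently. */
uint32_t swar_add_u8x4(uint32_t x, uint32_t y)
{
    /* Add the low 7 bits of every lane; carries cannot cross a lane boundary. */
    uint32_t low = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);

    /* Restore the top bit of each lane: x7 XOR y7 XOR carry-in from bit 6. */
    return low ^ ((x ^ y) & 0x80808080u);
}
```

For instance, swar_add_u8x4(0x01FF7F80, 0x01010180) yields 0x02008000: the lanes 0xFF + 0x01 and 0x80 + 0x80 each wrap to 0x00 without disturbing their neighbors. A single add has become two masks, an add, and two XOR operations, which is the overhead that a dedicated SIMD instruction removes.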
Timer clock equation:
$$\text{Timer clock} = \frac{\text{PERIPHCLK}}{\text{PRESCALER}_{\text{value}} + 1}$$