
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 1
© Copyright 2014 Xilinx, Inc. Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Xilinx in the
United States and other countries. AMBA, AMBA Designer, ARM, ARM1176JZ-S, CoreSight, Cortex, and PrimeCell are trademarks of ARM in the EU and other countries. All
other trademarks are the property of their respective owners.
Summary
Xilinx® Zynq®-7000 All Programmable SoC is an architecture that integrates a dual-core ARM®
Cortex™-A9 processor, which is widely used in embedded products. Both ARM Cortex-A9
cores have an advanced single instruction, multiple data (SIMD) engine, also known as NEON.
It is specialized for parallel data computation on large data sets. This document explains how to
use NEON to improve software performance and cache efficiency, thus improving NEON
performance generally.
Introduction
Generally speaking, a CPU executes instructions and processes data one-by-one. Typically,
high performance is achieved using high clock frequencies, but semiconductor technology
imposes limits on this. Parallel computation is the next strategy typically employed to improve
CPU data processing capability. The SIMD technique allows multiple data to be processed in
one or just a few CPU cycles. NEON is the SIMD implementation in ARMv7-A processors.
Effective use of NEON can produce significant software performance improvements.
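To make the one-by-one versus SIMD contrast concrete, the following is a minimal sketch rather than code from this application note. The function names `add_scalar` and `add_simd` are hypothetical. The SIMD path uses the standard NEON intrinsics from `arm_neon.h` and is compiled only when the compiler defines `__ARM_NEON`; on other targets the function falls back to the scalar loop, so the results are identical either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar version: one 32-bit addition per loop iteration. */
void add_scalar(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    assert(dst && a && b);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* SIMD version (hypothetical helper): when NEON is available, each
 * vaddq_s32 adds four 32-bit lanes at once. */
void add_simd(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    assert(dst && a && b);
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);        /* load 4 elements of a */
        int32x4_t vb = vld1q_s32(b + i);        /* load 4 elements of b */
        vst1q_s32(dst + i, vaddq_s32(va, vb));  /* add and store 4 sums */
    }
#endif
    /* Remaining elements (or all of them without NEON) are handled one by one. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The tail loop matters in practice: array lengths are rarely an exact multiple of the vector width, so a scalar cleanup pass keeps the function correct for any `n`.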
Document Content Overview
Technical information provided in this document includes:
Before You Begin: Important Concepts
This document provides the following information you need to be more effective when
optimizing your code:
Software Optimization Basics
NEON Basics
Software Performance Optimization Methods
This document describes four ways to optimize software performance with NEON:
Using NEON Optimized Libraries
Because the Cortex-A9 is prevalent in embedded designs, many software libraries have been
optimized for NEON and deliver improved performance. This document lists the libraries that
are frequently used by the community.
Using Compiler Automatic Vectorization
GCC, the popular open source compiler, can generate NEON instructions with proper
compilation options. However, the C language does not excel at expressing parallel
computations. You might need to modify your C code to add compiler hints. Lab 1 provides
a hands-on example.
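As an illustration of the kind of compiler hint mentioned above, here is a minimal sketch that is not taken from Lab 1. The function name `scale_add` is hypothetical. The C99 `restrict` qualifier promises the compiler that the arrays do not overlap, which removes one common obstacle to vectorizing the loop. The GCC options in the comment are one plausible set for a Cortex-A9 target and should be treated as an assumption to check against your toolchain.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed build options for an ARM GCC cross compiler (verify for your setup):
 *   -O3 -mcpu=cortex-a9 -mfpu=neon -ftree-vectorize -ffast-math
 * On ARMv7, GCC vectorizes floating-point loops with NEON only when relaxed
 * floating-point math (for example -ffast-math) is allowed.
 */

/* dst[i] = a[i] * k + b[i]; restrict tells the compiler that dst, a, and b
 * never alias, so iterations can safely be grouped into vector operations. */
void scale_add(float *restrict dst,
               const float *restrict a,
               const float *restrict b,
               float k,
               size_t n)
{
    assert(dst && a && b);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * k + b[i];
}
```

The loop body is deliberately simple: a countable loop with unit-stride accesses and no function calls or data-dependent branches is the shape an auto-vectorizer handles most reliably.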
Using NEON Intrinsics
Usually, the compiler handles simple optimizations well (optimizations such as register
allocation, instruction scheduling, etc.). However, you might need to use NEON intrinsics
when the compiler fails to analyze and optimize more complex algorithms. Moreover, some
NEON instructions have no equivalent C expressions, and intrinsics or assembly are the
Application Note: Zynq-7000 AP SoC
XAPP1206 v1.1 June 12, 2014
Boost Software Performance on Zynq-7000 AP SoC with NEON
Author: Haoliang Qin
