Technical information

XAPP1206 v1.1 June 12, 2014 www.xilinx.com 27
ii < 8; ii++, rresult += N, ra += N)
    for (ki = 0, rb = &b[ko][jo]; ki < 8; ki++, rb += N)
        for (ji = 0; ji < 8; ji++)
            rresult[ji] += ra[ki] * rb[ji];
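The fragment above shows only the inner loops of the tiled multiply. As a minimal sketch of how the complete loop nest might look, the following self-contained C code wraps those loops in three outer tile loops and checks the result against a plain triple loop. The outer index io, the array names a, result, and reference, the TILE macro, the flat one-dimensional storage, the small value of N, and all helper function names are assumptions made for illustration; they are not taken from the lab sources.

```c
#include <string.h>

#define N    64   /* matrix dimension (assumed); must be a multiple of TILE */
#define TILE 8    /* tile edge, matching the constant 8 in the fragment */

/* Flat row-major storage: element (r, c) lives at index r * N + c. */
float a[N * N], b[N * N], result[N * N], reference[N * N];

/* Fill the inputs with small integers so every sum is exact in float. */
void fill_inputs(void)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) {
            a[r * N + c] = (float)((r + c) % 7);
            b[r * N + c] = (float)((r * 3 + c) % 5);
        }
}

/* Non-tiled triple loop, kept as a correctness reference. */
void matmul_naive(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            reference[i * N + j] = sum;
        }
}

/* Tiled version: the three inner loops mirror the fragment above, but the
 * row pointers are recomputed per iteration instead of being incremented
 * in the loop header. */
void matmul_tiled(void)
{
    memset(result, 0, sizeof result);
    for (int io = 0; io < N; io += TILE)
        for (int jo = 0; jo < N; jo += TILE)
            for (int ko = 0; ko < N; ko += TILE)
                for (int ii = 0; ii < TILE; ii++) {
                    float *rresult = &result[(io + ii) * N + jo];
                    const float *ra = &a[(io + ii) * N + ko];
                    for (int ki = 0; ki < TILE; ki++) {
                        const float *rb = &b[(ko + ki) * N + jo];
                        for (int ji = 0; ji < TILE; ji++)
                            rresult[ji] += ra[ki] * rb[ji];
                    }
                }
}

/* Number of elements where the tiled and reference results differ. */
int count_mismatches(void)
{
    int bad = 0;
    for (int i = 0; i < N * N; i++)
        if (result[i] != reference[i])
            bad++;
    return bad;
}
```

Calling fill_inputs(), matmul_naive(), and matmul_tiled(), and then checking that count_mismatches() returns 0, confirms that tiling changes only the order in which elements are visited, not the computed product.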
Lab 3
1. Create a new project and import the Lab 3 source files.
2. Run the standalone application on the board.
3. In the code provided, set the matrix size to 512*512 so that the execution does not take too
long.
Note:
Because the focus is on how tiling can improve performance, compiler automatic
vectorization for NEON is not enabled in this lab.
The optimization level is set to -O2, and -std=c99 is passed as an additional compiler flag
(it selects the C99 language standard and is not itself an optimization).
4. After executing Lab 3, you can see that:
- The non-tiling implementation takes approximately 7.9 seconds.
- The tiling implementation takes only about 2.1 seconds.
- Tiling is therefore about 3.8 times faster (7.9 s / 2.1 s).
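As a rough sketch of how such timings could be collected and compared, the following C helpers time one call to a function and compute the resulting speedup. Using clock() from the standard C library is an assumption; a standalone (bare-metal) application may need a hardware timer instead. The names time_call, speedup, and sample_workload are hypothetical and do not come from the lab sources.

```c
#include <time.h>

/* Returns the processor time, in seconds, consumed by one call to fn(). */
double time_call(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Speedup of the optimized run relative to the baseline run. */
double speedup(double baseline_s, double optimized_s)
{
    return baseline_s / optimized_s;
}

/* Placeholder workload (hypothetical); substitute the multiply under test. */
void sample_workload(void)
{
    volatile double sink = 0.0;
    for (int i = 0; i < 100000; i++)
        sink += i * 0.5;
}
```

With the times reported above, speedup(7.9, 2.1) evaluates to roughly 3.8.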
Because NEON is often used to process large data sets, restructuring an algorithm with
the tiling technique can produce higher cache hit rates and therefore much better performance.
You can also try compiler automatic vectorization in your projects to achieve additional (more
modest) improvements. However, as demonstrated in Lab 1, compilers are not good at
automatically vectorizing complex loops. If more performance gain is needed, you must
manually optimize the computations within tiles.
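One possible form of manual optimization within a tile is to replace the innermost loop with NEON intrinsics. The sketch below updates one 8-element row of a result tile, that is, it computes rresult[0..7] += ra_ki * rb[0..7]. It assumes single-precision float elements and an 8-wide tile; the function name tile_row_update is hypothetical, and a scalar fallback is included so that the code also builds where NEON is not available.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

#define TILE 8   /* tile width (assumed), two 4-lane NEON vectors */

/* Adds ra_ki * rb[ji] to rresult[ji] for ji = 0..TILE-1. */
void tile_row_update(float *rresult, float ra_ki, const float *rb)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Load the current 8 results as two quad-word vectors. */
    float32x4_t r0 = vld1q_f32(rresult);
    float32x4_t r1 = vld1q_f32(rresult + 4);
    /* Multiply-accumulate: r = r + rb * ra_ki, four lanes at a time. */
    r0 = vmlaq_n_f32(r0, vld1q_f32(rb), ra_ki);
    r1 = vmlaq_n_f32(r1, vld1q_f32(rb + 4), ra_ki);
    /* Store the updated results back to the tile row. */
    vst1q_f32(rresult, r0);
    vst1q_f32(rresult + 4, r1);
#else
    /* Scalar fallback with the same arithmetic. */
    for (int ji = 0; ji < TILE; ji++)
        rresult[ji] += ra_ki * rb[ji];
#endif
}
```

In a tiled multiply, this function would be called once per ki iteration with ra[ki] as the scalar. With GCC on a Cortex-A9 target, the NEON path typically requires the -mfpu=neon option so that arm_neon.h is usable.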
Summary
This application note introduces four methods for improving software performance using NEON
on a Cortex-A9 processor core. Because NEON is typically used on large data sets, cache
performance is critical to system performance. Also discussed are three ways to improve data
exchanges between the CPU, cache, and main memory. Software optimization is a complex
topic. To realize optimal performance from hardware, you must apply all these techniques
together and properly balance them.
References
This application note uses the following references:
1. Zynq-7000 All Programmable SoC: Concepts, Tools, and Techniques (UG873)
www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/ug873-zynq-ctt.pdf
2. ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition
silver.arm.com/download/download.tm?pv=1299246
3. NEON Programmer’s Guide
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0018a/index.html
4. Cortex™-A9 NEON™ Media Processing Engine Technical Reference Manual
infocenter.arm.com/help/topic/com.arm.doc.ddi0409g/DDI0409G_cortex_a9_neon_mpe_r3p0_trm.pdf
5. Cortex™-A9 Technical Reference Manual
infocenter.arm.com/help/topic/com.arm.doc.ddi0388g/DDI0388G_cortex_a9_r3p0_trm.pdf
6. Cortex™-A9 MPCore® Technical Reference Manual
infocenter.arm.com/help/topic/com.arm.doc.ddi0407g/DDI0407G_cortex_a9_mpcore_r3p0_trm.pdf