Technical information

Software Performance Optimization Methods

XAPP1206 v1.1 June 12, 2014 www.xilinx.com 12

code from standard C or C++. In practice, however, this optimization level does not always produce faster binaries than -O2, so check software performance case by case.

• -Os. Selects optimizations that attempt to minimize the size of the image, even at the

expense of speed. (This is not a point of focus in this document.)

• -Ofast. Disregards strict standards compliance. '-Ofast' enables all '-O3' optimizations, plus optimizations that are not valid for all standards-compliant programs. It also turns on '-ffast-math'.
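
As a rough way to confirm which optimization class a source file was actually built with, the sketch below inspects macros that GCC predefines (`__OPTIMIZE__`, `__OPTIMIZE_SIZE__`, and `__FAST_MATH__`). The function name `optimization_class` is hypothetical and is not part of this application note. GCC does not predefine a macro that separates -O1, -O2, and -O3, so those levels are reported together.

```c
/* Hypothetical helper: describes the optimization class of this
 * translation unit using GCC's predefined macros.
 * __FAST_MATH__ is checked first because -Ofast also defines __OPTIMIZE__,
 * and __OPTIMIZE_SIZE__ is checked next because -Os also defines it. */
const char *optimization_class(void)
{
#if defined(__FAST_MATH__)
    return "fast-math enabled (-Ofast or -ffast-math)";
#elif defined(__OPTIMIZE_SIZE__)
    return "optimizing for size (-Os)";
#elif defined(__OPTIMIZE__)
    return "optimizing for speed (-O1 or higher)";
#else
    return "not optimizing (-O0)";
#endif
}
```

Printing the returned string once at start-up (for example with `puts`) is one simple way to catch a build configuration that silently fell back to a different level.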

In addition to the optimization levels, you must set other compiler options to tell the compiler to

generate NEON instructions:

• -std=c99. The C99 standard introduces some new features that can be used for NEON optimization (for example, the restrict qualifier, which tells the compiler that pointers do not alias).

• -mcpu=cortex-a9. Specifies the name of the target ARM processor. GCC uses this

name to determine what kind of instructions it can issue when generating assembly code.

• -mfpu=neon. Specifies which floating-point hardware (or hardware emulation) is

available on the target. Because the Zynq-7000 device has an integrated NEON hardware

unit, and because you plan to use it to accelerate software, you must specify your intention

to the compiler clearly, using the name 'neon'.

• -ftree-vectorize. Performs loop vectorization on trees. By default, this is enabled at

'-O3'.

• -mvectorize-with-neon-quad. By default, GCC 4.4 vectorizes for double-word only. In most cases, using quad-word can improve code performance and density, at the cost of fewer usable registers.

• -mfloat-abi=name. Specifies which floating-point ABI is used. Permitted values are:

'soft', 'softfp', and 'hard'.

    • 'soft' causes GCC to generate output containing library calls for floating-point operations. This is used when there are no hardware floating-point units in the system.

    • 'softfp' allows the generation of instructions using a hardware floating-point unit, but still uses the soft-float calling conventions. This keeps the code link-compatible with libraries built for the soft-float ABI.

    • 'hard' allows generation of floating-point instructions and uses FPU-specific calling conventions. If using the option 'hard', you must compile and link the entire source code with the same setting.

• -ffast-math. This option is not turned on by any '-O' option except '-Ofast' because it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It might, however, yield faster code for programs that do not require the guarantees of these specifications.
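
One of the liberties '-ffast-math' gives the compiler is to reassociate floating-point operations. The sketch below illustrates why that can change results: the same three values summed in two different orders give two different answers in single precision. The function names are hypothetical, and `volatile` is used here only to pin the evaluation order so that the two variants stay distinct whatever flags the file is built with.

```c
/* Hypothetical helper: adds left to right, (a + b) + c. */
float sum_left_first(float a, float b, float c)
{
    volatile float t = a + b;   /* intermediate is rounded to float here */
    return t + c;
}

/* Hypothetical helper: adds right to left, a + (b + c). */
float sum_right_first(float a, float b, float c)
{
    volatile float t = b + c;   /* a small c can be absorbed by a large b */
    return a + t;
}
```

With a = 1e20f, b = -1e20f, and c = 1.0f, the first order yields 1.0f while the second yields 0.0f, because 1.0f is lost when it is added to -1e20f. Code whose correctness depends on one particular order is the kind of program for which '-ffast-math' is risky.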

In practice, you can set the optimization level to '-O2' or '-O3' and use the options in the

other optimization flags field, as shown in Figure 4. In Xilinx SDK, you can right-click the

project and click C/C++ Build Settings > Optimization to display the optimization-related

fields.

-mcpu=cortex-a9 -mfpu=neon -ftree-vectorize -mvectorize-with-neon-quad

-mfloat-abi=softfp -ffast-math
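
To make the effect of these options more concrete, the following is a minimal sketch of a loop that GCC's auto-vectorizer can typically handle when built with the flags above plus '-std=c99' and '-O3'. The names `scale_add` and `built_with_neon` are hypothetical and do not come from this application note. The second function relies on the `__ARM_NEON__` macro (spelled `__ARM_NEON` in newer compilers), which GCC predefines when NEON is enabled. Note also that, according to the GCC documentation, the auto-vectorizer emits NEON floating-point operations only when '-funsafe-math-optimizations' is in effect (it is implied by '-ffast-math'), because NEON does not fully implement IEEE 754.

```c
#include <stddef.h>

/* Hypothetical example kernel: dst[i] = a[i] * scale + b[i].
 * The C99 'restrict' qualifier promises that the three arrays do not
 * overlap, which removes a common obstacle to auto-vectorization. */
void scale_add(float *restrict dst, const float *restrict a,
               const float *restrict b, float scale, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        dst[i] = a[i] * scale + b[i];
    }
}

/* Hypothetical helper: returns 1 if this file was compiled with NEON
 * enabled (for example via -mfpu=neon on an ARM target), otherwise 0. */
int built_with_neon(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}
```

Whether the loop is actually vectorized still depends on the compiler version and options, so inspecting the generated assembly remains the reliable check.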