Software Performance Optimization Methods
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 12
code from standard C or C++. However, in practice, this optimization level does not always
produce binaries that are faster than those built with -O2. Check the software performance case by case.
-Os. Selects optimizations that attempt to minimize the size of the image, even at the
expense of speed. (This is not a point of focus in this document.)
-Ofast. Disregards strict standards compliance. '-Ofast' enables all '-O3'
optimizations. It also enables optimizations that are not valid for all standard-compliant
programs. It turns on '-ffast-math'.
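One way to carry out the case-by-case check is to time the same routine in builds that differ only in the optimization level. The following is a minimal sketch, not a method prescribed by this document: saxpy_sum and time_kernel are hypothetical names, and clock() from the standard C library stands in for whatever timer is available on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel used only for illustration: sum of a*x[i] + y[i]. */
static float saxpy_sum(const float *x, const float *y, size_t n, float a)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a * x[i] + y[i];
    }
    return acc;
}

/* Runs the kernel `reps` times and returns the elapsed CPU time in seconds.
 * The value computed by the last repetition is stored in *result. */
double time_kernel(const float *x, const float *y, size_t n,
                   int reps, float *result)
{
    assert(reps > 0 && result != NULL);

    /* The volatile store and the changing scale factor are meant to make it
     * harder for the optimizer to drop or merge repetitions. */
    volatile float sink = 0.0f;

    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        sink = saxpy_sum(x, y, n, (float)(r + 1));
    }
    clock_t stop = clock();

    *result = sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Build the file once with -O2 and once with -O3, keep all other options identical, and compare the returned times for a data size that is representative of the application.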
In addition to the optimization levels, you must set other compiler options to tell the compiler to
generate NEON instructions:
-std=c99. The C99 standard introduces some new features that can be used for NEON
optimization.
-mcpu=cortex-a9. Specifies the name of the target ARM processor. GCC uses this
name to determine what kind of instructions it can issue when generating assembly code.
-mfpu=neon. Specifies which floating-point hardware (or hardware emulation) is
available on the target. Because the Zynq-7000 device has an integrated NEON hardware
unit, and because you plan to use it to accelerate software, you must specify your intention
to the compiler clearly, using the name 'neon'.
-ftree-vectorize. Performs loop vectorization on trees. By default, this is enabled at
'-O3'.
-mvectorize-with-neon-quad. By default, GCC 4.4 vectorizes for double-word only.
In most cases, using quad-word can improve code performance and density, at the cost of
fewer usable registers.
-mfloat-abi=name. Specifies which floating-point ABI is used. Permitted values are:
'soft', 'softfp', and 'hard'.
'soft' causes GCC to generate output containing library calls for floating-point
operations. This is used when there are no hardware floating-point units in the system.
'softfp' allows the generation of instructions using a hardware floating-point unit,
but still uses the soft-float calling conventions. This results in better compatibility,
because the resulting objects can be linked with code compiled for 'soft'.
'hard' allows generation of floating-point instructions and uses FPU-specific calling
conventions. If using the option 'hard', you must compile and link the entire source
code with the same setting.
-ffast-math. This option is not turned on by any '-O' option except
'-Ofast' because it can result in incorrect output for programs that depend on an exact
implementation of IEEE or ISO rules/specifications for math functions. It might, however,
yield faster code for programs that do not require the guarantees of these specifications.
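To illustrate the kind of program that depends on exact floating-point rules, consider compensated (Kahan) summation. The sketch below is an illustration and is not taken from this document; kahan_sum and built_with_fast_math are hypothetical names. It relies on GCC defining the __FAST_MATH__ preprocessor macro when '-ffast-math' is in effect. Because the correction term is algebraically zero, a compiler that is allowed to reassociate floating-point operations might simplify it away, which would leave an ordinary sum.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if this file was compiled with -ffast-math
 * (GCC defines __FAST_MATH__ in that case), otherwise 0. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Kahan (compensated) summation of n floats.
 * The accuracy gain depends on the operations being evaluated exactly in
 * the order written, which -ffast-math does not guarantee. */
float kahan_sum(const float *v, size_t n)
{
    assert(v != NULL || n == 0);

    float sum = 0.0f;
    float c = 0.0f;              /* running compensation for lost low-order bits */

    for (size_t i = 0; i < n; ++i) {
        float y = v[i] - c;      /* apply the correction from the previous step */
        float t = sum + y;       /* low-order bits of y may be lost here */
        c = (t - sum) - y;       /* recover what was lost (zero in exact math) */
        sum = t;
    }
    return sum;
}
```

A routine of this kind is a candidate for being compiled without '-ffast-math', or for a check of built_with_fast_math() during testing, while the rest of the project uses the faster setting.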
In Xilinx SDK, you can right-click the project and click C/C++ Build Settings >
Optimization to display the optimization-related fields. In practice, you can set the
optimization level to '-O2' or '-O3' and enter the following options in the other
optimization flags field, as shown in Figure 4:
-mcpu=cortex-a9 -mfpu=neon -ftree-vectorize -mvectorize-with-neon-quad
-mfloat-abi=softfp -ffast-math
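The following sketch shows the kind of loop these options are aimed at. It is an illustration under stated assumptions, not code from this document: mul_add and neon_enabled_at_compile_time are hypothetical names. The C99 'restrict' qualifier is one example of a C99 feature that can help the vectorizer, because it tells the compiler that the arrays do not overlap. The check relies on GCC defining __ARM_NEON__ when NEON code generation is enabled. Note also that, according to the GCC documentation, the auto-vectorizer does not generate NEON floating-point operations for this class of processor unless '-funsafe-math-optimizations' (which '-ffast-math' includes) is specified.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the compiler was told that NEON is available
 * (for example through -mfpu=neon), otherwise 0. */
int neon_enabled_at_compile_time(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* Computes dst[i] = a[i] * b[i] + c for i in [0, n).
 * 'restrict' promises that dst, a, and b do not alias, which removes one
 * common obstacle to auto-vectorization. The loop has a simple counted
 * form with no early exit and no function calls in its body. */
void mul_add(float *restrict dst, const float *restrict a,
             const float *restrict b, float c, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * b[i] + c;
    }
}
```

Whether the loop is actually vectorized depends on the compiler version and the options in effect, so it is worth inspecting the generated assembly or the compiler's vectorization report rather than assuming it.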