Technical information
Software Performance Optimization Methods
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 13
Figure 4 shows how to set the optimization flags.
The compiler might not always vectorize C language code as expected, so you must verify
that it generates the appropriate instructions:
• Read the disassembly. This is the most straightforward method, but it requires a full
understanding of NEON instructions.
• Use the compiler option -ftree-vectorizer-verbose=n. This option controls the
amount of debugging output the vectorizer prints. The output is written to standard
error unless -fdump-tree-all or -fdump-tree-vect is specified, in which
case it is written to the usual dump listing file, .vect. For n=0, no diagnostic information is
reported. For n=9, all the information the vectorizer generated during its analysis and
transformation is reported. This is the same verbosity level that
-fdump-tree-vect-details uses.
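Both checks are easier to try on a small test kernel. The following is a minimal sketch and is not taken from this application note: the file name vec_check.c, the function name add_arrays, the toolchain prefix, and the exact flag set in the comment are assumptions for illustration. Integer data is used so that the result does not depend on floating-point options. Later GCC releases provide -fopt-info-vec for the same kind of report.

```c
/* vec_check.c: hypothetical test kernel for checking auto-vectorization.
 *
 * Assumed build command (toolchain prefix and flags are illustrative):
 *   arm-linux-gnueabi-gcc -std=c99 -O3 -mcpu=cortex-a9 -mfpu=neon \
 *       -mfloat-abi=softfp -ftree-vectorize \
 *       -ftree-vectorizer-verbose=2 -c vec_check.c
 *
 * Assumed disassembly command:
 *   arm-linux-gnueabi-objdump -d vec_check.o
 * In the listing, NEON loads, adds, and stores (for example vld1, vadd,
 * and vst1) inside the loop body suggest that the loop was vectorized.
 */
#include <assert.h>
#include <stddef.h>

/* Element-wise sum of two integer arrays of length n. */
void add_arrays(int *dst, const int *a, const int *b, size_t n)
{
    size_t i;

    assert(dst != NULL && a != NULL && b != NULL);

    /* Each iteration touches only index i, so the loop is a
     * straightforward candidate for the vectorizer. */
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Because the pointers are not qualified here, the vectorizer report might mention a run-time aliasing check before the vector loop.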
C Code Modifications
Because the C and C++ standards do not provide syntax that specifies parallel behavior, it is
difficult for compilers to determine when it is safe to generate NEON code. Unless the compiler
can prove that vectorization is safe, it does not vectorize the code, so you must modify the code
to give the compiler additional hints. Such source code modifications are within the standard
language specifications, so they do not affect code portability across platforms and architectures.
The following are recommended techniques for modifying code for NEON:
• Indicate the number of loop iterations
• Avoid loop-carried dependencies
• Avoid conditions inside loops
• Use the restrict keyword
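The four techniques listed above can be combined in a single loop. The following is a minimal sketch under stated assumptions, not code from this application note: the function names threshold_add_plain and threshold_add_hinted and the parameter limit are hypothetical, the caller is assumed to pass a length that is a multiple of four, and the code is assumed to be built as C99 so that restrict is available. Whether a given compiler actually vectorizes either version still has to be confirmed with the checks described earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Plain version: unknown trip count, a branch in the loop body, and
 * pointers that the compiler must assume can overlap. */
void threshold_add_plain(int *dst, const int *a, const int *b,
                         int limit, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (a[i] > limit)
            dst[i] = a[i] + b[i];
        else
            dst[i] = b[i];
    }
}

/* Hinted version of the same computation.
 * Assumption: n is a multiple of 4 and the three arrays do not overlap. */
void threshold_add_hinted(int *restrict dst, const int *restrict a,
                          const int *restrict b, int limit, size_t n)
{
    size_t i;
    size_t count;

    assert(n % 4 == 0);

    /* Loop count: masking the low two bits makes it visible to the
     * compiler that the trip count is a multiple of 4. */
    count = n & ~(size_t)3;

    for (i = 0; i < count; i++) {
        /* No condition: the comparison yields 0 or 1, which replaces
         * the if/else with arithmetic. */
        int keep = a[i] > limit;

        /* No loop-carried dependency: iteration i reads and writes
         * only element i. */
        dst[i] = b[i] + keep * a[i];
    }
}
```

The restrict qualifiers are a promise made by the caller: if dst overlaps a or b, the behavior of the hinted version is undefined, whereas the plain version remains well defined.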
Figure 4: Setting Optimization Flags