Technical information
Before You Begin: Important Concepts
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 8
Cortex-A9 processor. For floating-point operations, VFP supports both single precision and
double precision, whereas NEON supports only single precision. VFP also supports more
complex operations, such as square root and division, which NEON does not.
NEON and VFP share the same thirty-two 64-bit registers in hardware. This means that VFP is
present in VFPv3-D32 form, which has 32 double-precision floating-point registers. Sharing the
registers also simplifies context switching: code that saves and restores the VFP context also
saves and restores the NEON context.
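The register sharing described above can be pictured with a small C model. This is only an illustrative sketch, not how the hardware or an operating system stores the context. The names `QReg`, `NeonVfpBankModel`, and `neon_vfp_bank_bytes` are hypothetical. It assumes the architectural mapping in which quadword register Qn overlays D(2n) and D(2n+1), and it omits status and control registers such as FPSCR.

```c
#include <stddef.h>
#include <stdint.h>

/* One 128-bit NEON quadword register, seen as two 64-bit halves. */
typedef struct {
    uint64_t lo;  /* overlays D(2n)   */
    uint64_t hi;  /* overlays D(2n+1) */
} QReg;

/* Hypothetical model of the shared register bank: the same storage
 * is visible as D0-D31 (VFPv3-D32 / NEON doubleword view) or as
 * Q0-Q15 (NEON quadword view). */
typedef union {
    uint64_t d[32];
    QReg     q[16];
} NeonVfpBankModel;

/* Size of the data registers that a context switch must save.
 * Saving this one bank covers both the VFP and the NEON view. */
size_t neon_vfp_bank_bytes(void)
{
    return sizeof(NeonVfpBankModel);
}
```

In this model, a value written through `d[2]` and `d[3]` is read back through `q[1].lo` and `q[1].hi`, which is why a single save/restore routine serves both units.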
Data Types
Data type specifiers in NEON instructions consist of a letter that indicates the type of data and
a number that indicates the width. They are separated from the instruction mnemonic by a
point. The following options are available:
• Unsigned integer: U8, U16, U32, U64
• Signed integer: S8, S16, S32, S64
• Integer of unspecified type: I8, I16, I32, I64
• Floating-point number: F16, F32
• Polynomial over {0,1}: P8
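As a rough illustration of how the type letter and width combine, the following C sketch validates a specifier against the list above and computes how many elements (lanes) of that width fit in a 64-bit or 128-bit register. The names `NeonDataType`, `neon_parse_dt`, and `neon_lanes` are hypothetical helpers, not part of any toolchain, and the parser accepts only uppercase specifiers as an assumption.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical representation of a NEON data type specifier. */
typedef struct {
    char kind;  /* 'U', 'S', 'I', 'F', or 'P' */
    int  bits;  /* element width in bits */
} NeonDataType;

/* Parse a specifier such as "S16". Returns 1 and fills *out when the
 * combination appears in the list of available options, else 0. */
int neon_parse_dt(const char *spec, NeonDataType *out)
{
    if (spec == NULL || out == NULL || spec[0] == '\0')
        return 0;
    if (!isdigit((unsigned char)spec[1]))
        return 0;

    char *end = NULL;
    long bits = strtol(spec + 1, &end, 10);
    if (*end != '\0')
        return 0;

    int ok;
    switch (spec[0]) {
    case 'U': case 'S': case 'I':
        ok = (bits == 8 || bits == 16 || bits == 32 || bits == 64);
        break;
    case 'F':
        ok = (bits == 16 || bits == 32);
        break;
    case 'P':
        ok = (bits == 8);
        break;
    default:
        ok = 0;
        break;
    }
    if (!ok)
        return 0;

    out->kind = spec[0];
    out->bits = (int)bits;
    return 1;
}

/* Number of elements of type dt in a register of reg_bits bits
 * (64 for a D register, 128 for a Q register). */
int neon_lanes(int reg_bits, NeonDataType dt)
{
    return reg_bits / dt.bits;
}
```

For example, `S16` parses to a signed 16-bit element, giving 4 lanes in a 64-bit register and 8 lanes in a 128-bit register, while `F64` is rejected because it is not in the list.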
NEON Instruction
All mnemonics for NEON instructions (as with VFP) begin with the letter V. This distinguishes
them from ARM/Thumb instructions. You can use this indicator to find NEON instructions in
disassembly code when checking the efficiency of a compiler. The example below shows the
general format of NEON instructions:
V{<mod>}<op>{<shape>}{<cond>}{.<dt>}{<dest>}, src1, src2
where each field is described in the field list at the end of this section.
For a detailed description of each NEON instruction, refer to the NEON Programmer's Guide
[Ref 3]. It is strongly recommended that you understand the NEON instruction set because:
• NEON instructions should be used to the maximum extent when designing algorithms.
Emulating functions with instruction sequences can lower performance significantly.
• Compilers might not be able to generate optimal code, so you might have to read the
disassembly and determine whether or not the generated code is optimal.
• Some operations, such as saturating arithmetic, interleaved memory access, table lookup,
and bitwise multiplex operations, are difficult to express in the C language. For these
operations, you might have to use intrinsics or assembler code.
• For time-critical applications, you might have to write NEON assembler code to realize the
best performance.
For more information, see the ARM Architecture Reference Manual [Ref 2] and the Cortex-A
Series Programmer's Guide [Ref 7].
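The point about saturating arithmetic in the list above can be sketched in C. Plain C has no saturating add, so the scalar version below needs a widening add plus two comparisons per element, whereas the NEON intrinsic `vqadd_s16` from `arm_neon.h` handles four 16-bit lanes at once. The function names `sat_add_s16` and `sat_add4_s16` are hypothetical, and the `__ARM_NEON` guard is an assumption about the build so that the same file also compiles on targets without NEON.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar emulation of a signed 16-bit saturating add:
 * widen, add, then clamp to the int16_t range. */
int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}

/* Saturating add of four 16-bit lanes. */
void sat_add4_s16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
#if defined(__ARM_NEON)
    /* One saturating vector add on a 64-bit register (4 x S16). */
    vst1_s16(out, vqadd_s16(vld1_s16(a), vld1_s16(b)));
#else
    /* Fallback: emulate lane by lane. */
    for (int i = 0; i < 4; ++i)
        out[i] = sat_add_s16(a[i], b[i]);
#endif
}
```

Both paths clamp instead of wrapping, so adding 1 to 32767 yields 32767. Whether a compiler turns the scalar loop into a saturating NEON instruction by itself depends on the toolchain, which is one reason to check the disassembly.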
The fields in the general NEON instruction format are:
<mod>    Modifier: Q (saturating), H (halving), D (doubling), or R (rounding)
<op>     Operation (for example, ADD, SUB, MUL)
<shape>  Shape: L (long), W (wide), or N (narrow) [Ref 4]
<cond>   Condition, used with the IT instruction
<.dt>    Data type
<dest>   Destination
<src1>   Source operand 1
<src2>   Source operand 2
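To make the field order concrete, the following C sketch assembles a mnemonic string from the individual fields in the order given by the general format. It is only a string-building illustration: the function name `neon_mnemonic` is hypothetical, and it does not check that the resulting combination is a valid instruction.

```c
#include <stdio.h>

/* Build "V{mod}op{shape}{cond}{.dt}" into buf.
 * Optional fields are passed as empty strings when unused.
 * Returns the value of snprintf (length of the full mnemonic). */
int neon_mnemonic(char *buf, size_t size,
                  const char *mod, const char *op, const char *shape,
                  const char *cond, const char *dt)
{
    return snprintf(buf, size, "V%s%s%s%s%s%s",
                    mod, op, shape, cond,
                    dt[0] != '\0' ? "." : "",  /* the point precedes the data type */
                    dt);
}
```

With these inputs, an empty modifier and shape with `ADD` and `I8` gives `VADD.I8`, the modifier `Q` with `ADD` and `S16` gives `VQADD.S16`, and the shape `L` with `MUL` and `S16` gives `VMULL.S16`.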