Technical information

Before You Begin: Important Concepts

XAPP1206 v1.1 June 12, 2014 www.xilinx.com 8

Cortex-A9 processor. For floating-point operation, VFP can support both single-precision and

double-precision, whereas NEON only supports single-precision. VFP can also support more

complex functions, such as square roots, division, and others, but NEON cannot.

NEON and VFP share the thirty-two 64-bit registers in hardware. This means that VFP is

present in VFPv3-D32 form, which has 32 double-precision floating-point registers. This makes

support for context switching simpler. Code that saves and restores VFP contexts also saves

and restores NEON contexts.

Data Types

Data type specifiers in NEON instructions consist of a letter that indicates the type of data and a number that indicates the width in bits. They are separated from the instruction mnemonic by a period, as in VADD.I16. The following options are available:

• Unsigned integer: U8, U16, U32, U64

• Signed integer: S8, S16, S32, S64

• Integer of unspecified type: I8, I16, I32, I64

• Floating-point number: F16, F32

• Polynomial over {0,1}: P8
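The difference between the I, U, and S specifiers can be illustrated with a small scalar model. The sketch below is not NEON code; it models a single 8-bit lane in portable C under the assumption that a plain add (VADD.I8) wraps around, so the resulting bit pattern does not depend on signedness, whereas a saturating add (VQADD.U8 or VQADD.S8) must know the signedness to pick its clamping limits. The helper names (clamp_int, lane_vadd_i8, lane_vqadd_u8, lane_vqadd_s8) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v to the range [lo, hi]; models the saturation step for one lane. */
static int clamp_int(int v, int lo, int hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Model of one lane of VADD.I8: modular (wrapping) addition.
 * The resulting bit pattern is the same for signed and unsigned inputs,
 * which is why the unspecified-type specifier I8 is sufficient. */
static uint8_t lane_vadd_i8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Model of one lane of VQADD.U8: unsigned saturating addition (0..255). */
static uint8_t lane_vqadd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)clamp_int((int)a + (int)b, 0, UINT8_MAX);
}

/* Model of one lane of VQADD.S8: signed saturating addition (-128..127). */
static int8_t lane_vqadd_s8(int8_t a, int8_t b)
{
    return (int8_t)clamp_int((int)a + (int)b, INT8_MIN, INT8_MAX);
}
```

With these helpers, adding 200 and 100 wraps to 44 in the I8 model and saturates to 255 in the U8 model, while adding 100 and 100 in the S8 model saturates to 127.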

NEON Instruction

All mnemonics for NEON instructions (as with VFP) begin with the letter V. This distinguishes

them from ARM/Thumb instructions. You can use this indicator to find NEON instructions in

disassembly code when checking the efficiency of a compiler. The example below shows the

general format of NEON instructions:

V{<mod>}<op>{<shape>}{<cond>}{.<dt>}{<dest>}, <src1>, <src2>

where:

<mod> One of the previously described modifiers (Q, H, D, R)

<op> Operation (for example, ADD, SUB, MUL)

<shape> Shape (L, W or N) [Ref 4]

<cond> Condition, used with IT instruction

<dt> Data type

<dest> Destination

<src1> Source operand 1

<src2> Source operand 2

For a detailed description of each NEON instruction, refer to the NEON Programmer's Guide [Ref 3]. It is strongly recommended that you understand the NEON instruction set because:

• NEON instructions should be used to the maximum extent when designing algorithms. Emulating functions with instruction sequences can lower performance significantly.

• Compilers might not be able to generate optimal code, so you might have to read the disassembly and determine whether or not the generated code is optimal.

• Certain operations, such as saturation mathematics, interleaved memory access, table lookup, bitwise multiplex operations, and others, are sometimes difficult to express in C. You might have to use intrinsics or assembler code for these instructions.

• For time-critical applications, you might have to write NEON assembler code to achieve the best performance.

For more information, see the ARM Architecture Reference Manual [Ref 2] and the Cortex-A Series Programmer's Guide [Ref 7].
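As an illustration of the interleaved memory access mentioned in the list above, the following is a minimal sketch of splitting packed RGB pixels into three separate planes. It assumes a compiler that defines __ARM_NEON and provides arm_neon.h when NEON is enabled; in that case the vld3_u8 intrinsic (which typically maps to a VLD3.8 instruction) loads eight pixels and de-interleaves them in one step. On other targets only the scalar loop is compiled. The function name deinterleave_rgb and the buffer layout are assumptions made for this example and do not come from the application note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Split n packed RGB pixels (r0 g0 b0 r1 g1 b1 ...) into three planes.
 * Hypothetical helper for illustration only. */
static void deinterleave_rgb(const uint8_t *rgb, size_t n,
                             uint8_t *r, uint8_t *g, uint8_t *b)
{
    size_t i = 0;

    assert(rgb && r && g && b);

#if defined(__ARM_NEON)
    /* NEON path: each iteration loads 8 pixels (24 bytes) and
     * de-interleaves them into three 64-bit vectors. */
    for (; i + 8 <= n; i += 8) {
        uint8x8x3_t px = vld3_u8(rgb + 3 * i);
        vst1_u8(r + i, px.val[0]);
        vst1_u8(g + i, px.val[1]);
        vst1_u8(b + i, px.val[2]);
    }
#endif

    /* Scalar path: handles the remaining pixels, or all pixels
     * when NEON is not available. */
    for (; i < n; ++i) {
        r[i] = rgb[3 * i + 0];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
    }
}
```

Expressing the same de-interleave in plain C requires a strided loop like the scalar path above, which a compiler might or might not vectorize; checking the disassembly for a VLD3 instruction is one way to find out.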