Technical information

Software Performance Optimization Methods

XAPP1206 v1.1 June 12, 2014 www.xilinx.com 17

Table 6 gives developers a basic overview of the NEON vector types.

There are also combination types, which include two, three, or four of each of the above in a

larger ‘struct’ type. These types are used to map the registers accessed by NEON

load/store operations, which can load/store up to four registers in a single instruction. For

example:

struct int16x4x2_t

{

int16x4_t val[2];

}<var_name>;

These types are only used by loads, stores, transpose, interleave, and de-interleave

instructions. To perform operations on the actual data, select the individual registers using the

syntax shown below:

<var_name>.val[0] and <var_name>.val[1]
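As a minimal sketch of how such a combination type is typically used, the function below de-interleaves four stereo samples with a single structure load and then operates on the two individual registers. The function name mix_stereo4, the HAVE_NEON macro, and the scalar fallback (included only so the sketch also builds on hosts without NEON) are illustrative assumptions, not part of the original text.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical helper macro for this sketch */
#endif

/* Add the left and right channels of four interleaved 16-bit stereo
 * samples (L0 R0 L1 R1 L2 R2 L3 R3) and write four mono samples. */
void mix_stereo4(const int16_t *interleaved, int16_t *mono)
{
#ifdef HAVE_NEON
    /* One de-interleaving load fills both D registers of the struct:
     * lr.val[0] = L0..L3, lr.val[1] = R0..R3. */
    int16x4x2_t lr = vld2_s16(interleaved);

    /* Arithmetic works on the individual registers, not on the struct. */
    int16x4_t sum = vadd_s16(lr.val[0], lr.val[1]);

    vst1_s16(mono, sum);
#else
    /* Scalar equivalent for non-NEON builds. */
    for (int i = 0; i < 4; ++i)
        mono[i] = (int16_t)(interleaved[2 * i] + interleaved[2 * i + 1]);
#endif
}
```

Note that the struct itself is passed only to the load intrinsic; the addition takes lr.val[0] and lr.val[1] as ordinary int16x4_t operands.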

Techniques Specific to NEON Intrinsics

Declaring a Variable

Example:

uint32x2_t vec64a, vec64b; // create two D-register variables

Using Constants

The following code replicates a constant into each element of a vector:

uint8x8_t start_value = vdup_n_u8(0);

To load a general 64-bit constant into a vector:

uint8x8_t start_value =

vreinterpret_u8_u64(vcreate_u64(0x123456789ABCDEFULL));
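The two ways of building a constant vector can be compared side by side in the sketch below, which stores both vectors to memory so the lane contents can be inspected. It assumes a little-endian target, where lane 0 of the reinterpreted vector holds the least significant byte of the 64-bit value. The function name fill_constants, the HAVE_NEON macro, and the constant 0x0807060504030201 are illustrative choices, and the scalar branch exists only so the sketch builds without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical helper macro for this sketch */
#endif

/* Write two 8-byte constant vectors to memory:
 *   splat   - the value 7 replicated into every lane
 *   pattern - the bytes of one 64-bit literal, one byte per lane */
void fill_constants(uint8_t splat[8], uint8_t pattern[8])
{
#ifdef HAVE_NEON
    /* Replicate a scalar into all eight lanes. */
    uint8x8_t s = vdup_n_u8(7);

    /* Build a vector from a 64-bit bit pattern, then view it as 8 x u8.
     * Assumption: little-endian, so lane 0 = 0x01 and lane 7 = 0x08. */
    uint8x8_t p = vreinterpret_u8_u64(vcreate_u64(0x0807060504030201ULL));

    vst1_u8(splat, s);
    vst1_u8(pattern, p);
#else
    /* Scalar equivalent for non-NEON builds. */
    const uint64_t bits = 0x0807060504030201ULL;
    for (int i = 0; i < 8; ++i) {
        splat[i] = 7;
        pattern[i] = (uint8_t)(bits >> (8 * i));
    }
#endif
}
```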

Moving Results Back to Normal C Variables

To access a result held in a NEON register, you can either store it to memory using a VST

instruction or move it back to an ARM core register using a get-lane operation:

result = vget_lane_u32(vec64a, 0); // extract lane 0
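Both routes back to ordinary C variables are shown together in the following sketch: the whole vector is stored to an array, and a single lane is returned as a scalar. The function name add_pairs, its parameters, and the HAVE_NEON macro are hypothetical, and the scalar branch is there only so the sketch builds on hosts without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical helper macro for this sketch */
#endif

/* Add two pairs of 32-bit values. The full result is written to
 * stored[0..1]; lane 0 of the result is also returned as a scalar. */
uint32_t add_pairs(const uint32_t a[2], const uint32_t b[2],
                   uint32_t stored[2])
{
#ifdef HAVE_NEON
    uint32x2_t vec64a = vld1_u32(a);        /* load two lanes from memory */
    uint32x2_t vec64b = vld1_u32(b);
    uint32x2_t vsum   = vadd_u32(vec64a, vec64b);

    vst1_u32(stored, vsum);                 /* route 1: store the vector */
    return vget_lane_u32(vsum, 0);          /* route 2: extract one lane */
#else
    /* Scalar equivalent for non-NEON builds. */
    stored[0] = a[0] + b[0];
    stored[1] = a[1] + b[1];
    return stored[0];
#endif
}
```

The lane index passed to vget_lane_u32 must be a compile-time constant, so extracting many lanes one by one is usually less attractive than a single store.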

Table 6: NEON Type Definitions

64-bit type (D-register)    128-bit type (Q-register)
int8x8_t                    int8x16_t
int16x4_t                   int16x8_t
int32x2_t                   int32x4_t
int64x1_t                   int64x2_t
uint8x8_t                   uint8x16_t
uint16x4_t                  uint16x8_t
uint32x2_t                  uint32x4_t
uint64x1_t                  uint64x2_t
float16x4_t                 float16x8_t
float32x2_t                 float32x4_t
poly8x8_t                   poly8x16_t
poly16x4_t                  poly16x8_t