Technical information
Software Performance Optimization Methods
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 17
Table 6 summarizes the basic NEON vector types.
There are also combination types, which pack two, three, or four vectors of the same base
type into a larger ‘struct’ type. These types map onto the registers accessed by NEON
load/store operations, which can load or store up to four registers in a single instruction. For
example:
struct int16x4x2_t
{
    int16x4_t val[2];
} <var_name>;
These types are only used by loads, stores, transpose, interleave, and de-interleave
instructions. To perform operations on the actual data, select the individual registers using the
syntax shown below:
<var_name>.val[0] and <var_name>.val[1]
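As a minimal sketch of how a combination type is typically used, the function below loads four interleaved stereo frames with a de-interleaving load and then works on the two halves through val[0] and val[1]. The function name stereo_to_mono_sum and the stereo scenario are assumptions made for illustration and do not come from this document. The #else branch holds hypothetical scalar stand-ins (not part of NEON) so that the sketch also builds on a host without NEON support.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Hypothetical scalar stand-ins, NOT part of NEON: they only mimic the
   few intrinsics used below so the sketch builds on non-ARM hosts. */
typedef struct { int16_t lane[4]; } int16x4_t;
typedef struct { int16x4_t val[2]; } int16x4x2_t;
static int16x4x2_t vld2_s16(const int16_t *p) {
    int16x4x2_t r;
    for (int i = 0; i < 4; i++) {
        r.val[0].lane[i] = p[2 * i];      /* even-indexed elements */
        r.val[1].lane[i] = p[2 * i + 1];  /* odd-indexed elements  */
    }
    return r;
}
static int16x4_t vadd_s16(int16x4_t a, int16x4_t b) {
    int16x4_t r;
    for (int i = 0; i < 4; i++) r.lane[i] = (int16_t)(a.lane[i] + b.lane[i]);
    return r;
}
static void vst1_s16(int16_t *p, int16x4_t v) {
    for (int i = 0; i < 4; i++) p[i] = v.lane[i];
}
#endif

/* Sum the left and right channels of four interleaved stereo frames
   (L0 R0 L1 R1 L2 R2 L3 R3) into four mono samples. */
void stereo_to_mono_sum(const int16_t *interleaved, int16_t *mono)
{
    assert(interleaved && mono);
    int16x4x2_t frames = vld2_s16(interleaved); /* de-interleaving load  */
    int16x4_t left  = frames.val[0];            /* L0 L1 L2 L3           */
    int16x4_t right = frames.val[1];            /* R0 R1 R2 R3           */
    vst1_s16(mono, vadd_s16(left, right));      /* lane-wise add, store  */
}
```

The combination type appears only at the load; the arithmetic is done on the individual int16x4_t members, which matches the restriction described above.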
Techniques Specific to NEON Intrinsics
Declaring a Variable
Example:
uint32x2_t vec64a, vec64b; // create two D-register variables
Using Constants
The following code replicates a constant into each element of a vector:
uint8x8_t start_value = vdup_n_u8(0);
To load a general 64-bit constant into a vector:
uint8x8_t start_value =
    vreinterpret_u8_u64(vcreate_u64(0x123456789ABCDEFULL));
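The two ways of building a constant can be compared by storing both vectors to memory and inspecting the bytes. The sketch below does that; the function name build_constants and its parameters are hypothetical, and the 64-bit literal is the same value as above written with its leading zero. The byte order noted in the comments assumes a little-endian target. The #else branch again holds hypothetical scalar stand-ins (not part of NEON) for hosts without NEON support.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Hypothetical scalar stand-ins, NOT part of NEON: they only mimic the
   few intrinsics used below so the sketch builds on non-ARM hosts. */
typedef struct { uint8_t lane[8]; } uint8x8_t;
typedef struct { uint64_t lane[1]; } uint64x1_t;
static uint8x8_t vdup_n_u8(uint8_t value) {
    uint8x8_t r;
    for (int i = 0; i < 8; i++) r.lane[i] = value;
    return r;
}
static uint64x1_t vcreate_u64(uint64_t bits) {
    uint64x1_t r = { { bits } };
    return r;
}
static uint8x8_t vreinterpret_u8_u64(uint64x1_t v) {
    uint8x8_t r;
    for (int i = 0; i < 8; i++) r.lane[i] = (uint8_t)(v.lane[0] >> (8 * i));
    return r;
}
static void vst1_u8(uint8_t *p, uint8x8_t v) {
    for (int i = 0; i < 8; i++) p[i] = v.lane[i];
}
#endif

/* Write a replicated constant to splat[] and a general 64-bit constant,
   viewed as eight bytes, to pattern[]. */
void build_constants(uint8_t fill, uint8_t splat[8], uint8_t pattern[8])
{
    assert(splat && pattern);
    /* Every one of the eight lanes receives the same value. */
    uint8x8_t start_value = vdup_n_u8(fill);
    /* Lane 0 receives the least significant byte (0xEF) of the literal. */
    uint8x8_t general =
        vreinterpret_u8_u64(vcreate_u64(0x0123456789ABCDEFULL));
    vst1_u8(splat, start_value);
    vst1_u8(pattern, general);
}
```

Note that vreinterpret_u8_u64 changes only the type of the vector, not its bits, so the lane contents follow directly from the byte layout of the 64-bit literal.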
Moving Results Back to Normal C Variables
To access a result held in a NEON register, either store it to memory with a VST instruction or
move it back to an ARM core register with a get-lane operation:
result = vget_lane_u32(vec64a, 0); // extract lane 0
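Both routes can be seen side by side in a short sketch. The function below adds two 2-element vectors, writes the whole result to memory with a store intrinsic, and also returns lane 0 as an ordinary C value. The function name add_and_extract and its parameters are hypothetical, chosen only for this illustration, and the #else branch holds hypothetical scalar stand-ins (not part of NEON) for hosts without NEON support.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Hypothetical scalar stand-ins, NOT part of NEON: they only mimic the
   few intrinsics used below so the sketch builds on non-ARM hosts. */
typedef struct { uint32_t lane[2]; } uint32x2_t;
static uint32x2_t vld1_u32(const uint32_t *p) {
    uint32x2_t r = { { p[0], p[1] } };
    return r;
}
static uint32x2_t vadd_u32(uint32x2_t a, uint32x2_t b) {
    uint32x2_t r = { { a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] } };
    return r;
}
static void vst1_u32(uint32_t *p, uint32x2_t v) {
    p[0] = v.lane[0];
    p[1] = v.lane[1];
}
static uint32_t vget_lane_u32(uint32x2_t v, int lane) {
    return v.lane[lane];
}
#endif

/* Add two 2-element vectors, store the full result to stored[], and
   return lane 0 of the result as a normal C variable. */
uint32_t add_and_extract(const uint32_t a[2], const uint32_t b[2],
                         uint32_t stored[2])
{
    assert(a && b && stored);
    uint32x2_t vec64a = vld1_u32(a);
    uint32x2_t vec64b = vld1_u32(b);
    uint32x2_t sum = vadd_u32(vec64a, vec64b);
    vst1_u32(stored, sum);          /* route 1: store the whole vector   */
    return vget_lane_u32(sum, 0);   /* route 2: extract a single lane;
                                       the index must be a constant      */
}
```

Storing the vector is usually the better fit when every lane is needed, while a get-lane operation suits a single scalar result such as a final sum or flag.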
Table 6: NEON Type Definitions

64-bit type (D-register)    128-bit type (Q-register)
------------------------    -------------------------
int8x8_t                    int8x16_t
int16x4_t                   int16x8_t
int32x2_t                   int32x4_t
int64x1_t                   int64x2_t
uint8x8_t                   uint8x16_t
uint16x4_t                  uint16x8_t
uint32x2_t                  uint32x4_t
uint64x1_t                  uint64x2_t
float16x4_t                 float16x8_t
float32x2_t                 float32x4_t
poly8x8_t                   poly8x16_t
poly16x4_t                  poly16x8_t