Technical information

Boost NEON Performance by Improving Memory Access Efficiency

XAPP1206 v1.1 June 12, 2014 www.xilinx.com 25

In Figure 6, the 16-byte arrays with starting addresses 0x0, 0x40, and 0x80 map to the same cache line. This means that at any given time, only one of those three memory lines can be held in the cache.
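The mapping can be sketched in a few lines of C. This is an illustrative sketch only: the cache geometry (4 lines of 16 bytes each) is an assumption chosen to be consistent with the addresses above, and the names cache_index and cache_tag are hypothetical helpers, not part of any library.

```c
#include <stdint.h>

/* Assumed geometry (not stated in the text): 4 lines of 16 bytes each. */
#define LINE_SIZE 16u
#define NUM_LINES 4u

/* Line index that a byte address maps to in a direct-mapped cache. */
static unsigned cache_index(uint32_t addr)
{
    return (addr / LINE_SIZE) % NUM_LINES;
}

/* Tag stored with the line to tell apart addresses that share an index. */
static uint32_t cache_tag(uint32_t addr)
{
    return addr / (LINE_SIZE * NUM_LINES);
}
```

Under these assumptions, cache_index returns 0 for 0x00, 0x40, and 0x80, while cache_tag returns 0, 1, and 2 respectively, so the three arrays compete for the same line and differ only in their tag.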

Consider a loop similar to the following, with pointers result, data1, and data2 pointing to

0x00, 0x40 and 0x80 respectively.

void add_array(int *data1, int *data2, int *result, int size)
{
    int i;

    /* Each iteration reads data1[i] and data2[i], then writes result[i]. */
    for (i = 0; i < size; i++) {
        result[i] = data1[i] + data2[i];
    }
}

When the code starts running, something similar to the following occurs:

• Address 0x40 is read first. As it is not in the cache, a line fill takes place, putting the data from 0x40 to 0x4F into the cache.

• Then, address 0x80 is read. It is not in the cache either, so a line fill takes place, putting the data from 0x80 to 0x8F into the cache and evicting the data from addresses 0x40 to 0x4F.

• The result is written to 0x00. Depending on the cache allocation policy, this might cause another line fill, in which case the data from 0x80 to 0x8F is evicted from the cache.

• The same sequence can repeat on every iteration of the loop. Almost no cache content is reused, so software performance can be very poor.

This issue is called cache thrashing. Because thrashing occurs so easily in a direct-mapped cache, this organization is seldom used in real designs.
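The thrashing sequence described above can be replayed with a small model of a direct-mapped cache. This is a sketch under stated assumptions, not a description of any real cache controller: it assumes 4 lines of 16 bytes, 4-byte int elements, and a write-allocate policy (so a write miss also causes a line fill). The names dm_cache, cache_access, and simulate_add_array are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry (not stated in the text): 4 lines of 16 bytes each. */
#define LINE_SIZE 16u
#define NUM_LINES 4u

typedef struct {
    bool     valid[NUM_LINES];  /* line currently holds data */
    uint32_t tag[NUM_LINES];    /* which memory line is held */
    unsigned hits;
    unsigned misses;
} dm_cache;

/* Look up one address; on a miss, fill the line (evicting what was there).
   Reads and writes are treated alike, i.e. write-allocate is assumed. */
static void cache_access(dm_cache *c, uint32_t addr)
{
    unsigned idx = (addr / LINE_SIZE) % NUM_LINES;
    uint32_t tag = addr / (LINE_SIZE * NUM_LINES);

    if (c->valid[idx] && c->tag[idx] == tag) {
        c->hits++;
    } else {
        c->misses++;
        c->valid[idx] = true;
        c->tag[idx] = tag;
    }
}

/* Replay the memory accesses of add_array (4-byte elements assumed)
   and return the number of cache misses. */
static unsigned simulate_add_array(uint32_t data1, uint32_t data2,
                                   uint32_t result, unsigned size)
{
    dm_cache c = {0};

    for (unsigned i = 0; i < size; i++) {
        cache_access(&c, data1 + 4u * i);   /* read data1[i]   */
        cache_access(&c, data2 + 4u * i);   /* read data2[i]   */
        cache_access(&c, result + 4u * i);  /* write result[i] */
    }
    return c.misses;
}
```

With the addresses from the text, simulate_add_array(0x40, 0x80, 0x00, 4) reports 12 misses out of 12 accesses, because every access evicts the line the previous one loaded. If the arrays were instead placed so that they map to different lines, for example simulate_add_array(0x50, 0xA0, 0x00, 4), the model reports only 3 misses, one per array.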


Figure 6: Direct Mapped Cache

[Figure 6 diagram. Labels: Main Memory, Cache, Lines (Index), Tag, Data, Hit. The address is divided into Tag, Index, Offset, and Byte fields.]