Technical information
Boost NEON Performance by Improving Memory Access Efficiency
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 25
In Figure 6, the 16-byte arrays with starting addresses 0x0, 0x40 and 0x80 all map to
the same cache line. This means that at any given time, only one of those arrays can be
held in the cache.
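The mapping can be checked with a few lines of C. This is only a sketch: the text gives the 16-byte line size, but the count of four lines (a 64-byte cache) is an assumption inferred from the three addresses colliding at a spacing of 0x40, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 16-byte lines (from the text) and 4 lines
   (an assumption inferred from the 0x40 spacing of the colliding addresses). */
#define LINE_BYTES 16u
#define NUM_LINES   4u

/* Byte offset within the line: the low 4 address bits. */
static unsigned cache_offset(uint32_t addr)
{
    return addr % LINE_BYTES;
}

/* Line index: the next 2 address bits select one of the 4 lines. */
static unsigned cache_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_LINES;
}

/* Tag: the remaining upper bits identify which memory block occupies the line. */
static uint32_t cache_tag(uint32_t addr)
{
    return addr / (LINE_BYTES * NUM_LINES);
}

/* The three array base addresses have the same index but different tags,
   so they compete for the same line. */
static void check_colliding_addresses(void)
{
    assert(cache_index(0x00) == cache_index(0x40));
    assert(cache_index(0x40) == cache_index(0x80));
    assert(cache_tag(0x00) != cache_tag(0x40));
    assert(cache_tag(0x40) != cache_tag(0x80));
}
```

With this geometry, all three base addresses yield index 0 with tags 0, 1 and 2, which is why only one of them can be resident at a time.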
Consider a loop similar to the following, with pointers result, data1, and data2 pointing to
0x00, 0x40 and 0x80 respectively.
void add_array(int *data1, int *data2, int *result, int size)
{
    int i;

    /* Each iteration reads data1[i] and data2[i], then writes result[i]. */
    for (i = 0; i < size; i++) {
        result[i] = data1[i] + data2[i];
    }
}
When the code starts running, something similar to the following occurs:
• Address 0x40 is read first. As it is not in cache, a line-fill takes place by putting the
data from 0x40 to 0x4F into the cache.
• Then, address 0x80 is read. It is not in cache and so a line-fill takes place by putting the
data from 0x80 to 0x8F into the cache and evicting the data from addresses 0x40 to
0x4F out of the cache.
• The result is written to 0x00. Depending on the cache allocation policy, this might cause
another line fill. The data from 0x80 to 0x8F might be evicted out of the cache.
• The same thing can happen on every iteration of the loop. Almost none of the cached
data is re-used before it is evicted, so the software performance could be very poor.
This issue is called cache thrashing. Cache thrashing occurs very easily on a
direct-mapped cache, so direct-mapped caches are seldom used in real designs.
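The sequence of line-fills and evictions described above can be replayed with a small simulation. This is a minimal sketch, not a model of any real cache: it assumes four 16-byte lines, 4-byte int elements, and a write-allocate policy (so the store to result also causes a line-fill). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 16u  /* line size from the text */
#define NUM_LINES   4u  /* assumed number of lines */

/* Minimal direct-mapped cache model: one tag and one valid bit per line. */
typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned accesses;
    unsigned misses;
} dm_cache;

/* Look up one address; on a miss, replace whatever the line currently holds. */
static void dm_access(dm_cache *c, uint32_t addr)
{
    unsigned idx = (addr / LINE_BYTES) % NUM_LINES;
    uint32_t tag = addr / (LINE_BYTES * NUM_LINES);

    c->accesses++;
    if (!c->valid[idx] || c->tag[idx] != tag) {
        c->misses++;           /* line-fill, evicting the previous occupant */
        c->valid[idx] = true;
        c->tag[idx]   = tag;
    }
}

/* Replay the access pattern of add_array for the given base addresses
   and return the number of misses. Write-allocate is assumed. */
static unsigned simulate_add_array(uint32_t data1, uint32_t data2,
                                   uint32_t result, int size)
{
    dm_cache c = {0};

    assert(size >= 0);
    for (int i = 0; i < size; i++) {
        uint32_t off = (uint32_t)i * 4u;  /* 4-byte int elements assumed */
        dm_access(&c, data1 + off);       /* read data1[i]    */
        dm_access(&c, data2 + off);       /* read data2[i]    */
        dm_access(&c, result + off);      /* write result[i]  */
    }
    return c.misses;
}
```

Under these assumptions, simulate_add_array(0x40, 0x80, 0x00, 4) counts 12 misses for 12 accesses, that is, every access misses. Moving the inputs to 0x50 and 0xA0, so that the three arrays fall on different lines, reduces this to 3 misses, one line-fill per array.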
Figure 6: Direct Mapped Cache
[Figure not reproduced. Recoverable labels: Main Memory, Cache, Lines (Index), Tag, Index, Offset, Byte, Data, Hit, Address.]