Datasheet
Rev. I | Page 4 of 64 | August 2013
ADSP-BF531/ADSP-BF532/ADSP-BF533
BLACKFIN PROCESSOR CORE
As shown in Figure 2 on Page 5, the Blackfin processor core 
contains two 16-bit multipliers, two 40-bit accumulators, two 
40-bit ALUs, four video ALUs, and a 40-bit shifter. The compu-
tation units process 8-bit, 16-bit, or 32-bit data from the 
register file.
The compute register file contains eight 32-bit registers. When 
performing compute operations on 16-bit operand data, the 
register file operates as 16 independent 16-bit registers. All 
operands for compute operations come from the multiported 
register file and instruction constant fields.
Each MAC can perform a 16-bit by 16-bit multiply in each 
cycle, accumulating the results into the 40-bit accumulators. 
Signed and unsigned formats, rounding, and saturation are 
supported.
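
The accumulate step described above can be modeled on a host machine. The following is a minimal sketch, not the hardware's exact behavior: it assumes signed integer operands only, holds the 40-bit accumulator in a 64-bit host integer, and saturates on overflow. The names sat40, mac16, and dot16 are hypothetical and chosen for illustration; fractional formats and rounding are not modeled.

```c
#include <stdint.h>

/* Bounds of a signed 40-bit value, held here in a 64-bit host integer. */
#define ACC40_MAX ((int64_t)0x7FFFFFFFFFLL)   /*  2^39 - 1 */
#define ACC40_MIN (-ACC40_MAX - 1)            /* -2^39     */

/* Clamp a 64-bit intermediate to the signed 40-bit range. */
static int64_t sat40(int64_t v)
{
    if (v > ACC40_MAX) return ACC40_MAX;
    if (v < ACC40_MIN) return ACC40_MIN;
    return v;
}

/* One signed 16-bit by 16-bit multiply-accumulate step (hypothetical helper). */
static int64_t mac16(int64_t acc, int16_t x, int16_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;  /* always fits in 32 bits */
    return sat40(acc + product);
}

/* Dot product of two 16-bit vectors using the modeled accumulator. */
static int64_t dot16(const int16_t *a, const int16_t *b, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac16(acc, a[i], b[i]);
    return acc;
}
```

The eight bits above a 32-bit result let a long sum of 32-bit products accumulate without overflowing, which is why the sketch saturates only at the 40-bit boundary.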
The ALUs perform a traditional set of arithmetic and logical 
operations on 16-bit or 32-bit data. In addition, many special 
instructions are included to accelerate various signal processing 
tasks. These include bit operations such as field extract and 
population count, modulo 2^32 multiply, divide primitives, 
saturation and rounding, and sign/exponent detection. The set of 
video instructions includes byte alignment and packing opera-
tions, 16-bit and 8-bit adds with clipping, 8-bit average 
operations, and 8-bit subtract/absolute value/accumulate (SAA) 
operations. Also provided are the compare/select and vector 
search instructions.
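
As an illustration of two of the video operations listed above, the sketch below shows in portable C what an 8-bit subtract/absolute value/accumulate and an 8-bit add with clipping compute. It describes the arithmetic only, not the instruction encoding or how the hardware processes several bytes per cycle. The function names saa8 and add8_clip are hypothetical.

```c
#include <stdint.h>

/* Accumulate |a[i] - b[i]| over n unsigned 8-bit samples (hypothetical helper). */
static uint32_t saa8(const uint8_t *a, const uint8_t *b, int n, uint32_t acc)
{
    for (int i = 0; i < n; i++) {
        int diff = (int)a[i] - (int)b[i];
        acc += (uint32_t)(diff < 0 ? -diff : diff);
    }
    return acc;
}

/* Add a signed delta to an 8-bit pixel and clip the result to 0..255. */
static uint8_t add8_clip(uint8_t pixel, int16_t delta)
{
    int sum = (int)pixel + (int)delta;
    if (sum < 0)   return 0;
    if (sum > 255) return 255;
    return (uint8_t)sum;
}
```

Sums of absolute differences of this kind are a common building block in block-matching motion estimation, which is one reason such operations are grouped with the video instructions.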
For certain instructions, two 16-bit ALU operations can be per-
formed simultaneously on register pairs (a 16-bit high half and 
16-bit low half of a compute register). Quad 16-bit operations 
are possible using the second ALU.
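
The idea of two independent 16-bit operations on the halves of a 32-bit register can be sketched as follows. This is an assumed, simplified model: it uses wraparound arithmetic only, and the name add16x2 is hypothetical. The point it shows is that a carry out of the low half does not propagate into the high half.

```c
#include <stdint.h>

/* Add the low halves and the high halves of two 32-bit words independently.
 * Each 16-bit sum wraps on overflow; no carry crosses between the halves. */
static uint32_t add16x2(uint32_t a, uint32_t b)
{
    uint16_t lo = (uint16_t)((a & 0xFFFFu) + (b & 0xFFFFu));
    uint16_t hi = (uint16_t)((a >> 16) + (b >> 16));
    return ((uint32_t)hi << 16) | lo;
}
```

For example, add16x2(0x0001FFFF, 0x00010001) yields 0x00020000, whereas an ordinary 32-bit add of the same words would yield 0x00030000.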
The 40-bit shifter can perform shifts and rotates and is used to 
support normalization, field extract, and field deposit 
instructions.
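
Field extract and field deposit can be expressed with shifts and masks. The sketch below is a generic C rendering under stated assumptions: fields are zero-extended, the bit position is in the range 0 to 31, and the width is in the range 1 to 32. The function names are hypothetical and do not correspond to instruction mnemonics.

```c
#include <stdint.h>

/* Mask with the low `width` bits set; width is expected to be 1..32. */
static uint32_t low_mask(unsigned width)
{
    return (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Return the `width`-bit field of `word` that starts at bit `pos`. */
static uint32_t field_extract(uint32_t word, unsigned pos, unsigned width)
{
    return (word >> pos) & low_mask(width);
}

/* Return `word` with the field at bit `pos` replaced by the low
 * `width` bits of `field`; all other bits are left unchanged. */
static uint32_t field_deposit(uint32_t word, uint32_t field,
                              unsigned pos, unsigned width)
{
    uint32_t mask = low_mask(width) << pos;
    return (word & ~mask) | ((field << pos) & mask);
}
```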
The program sequencer controls the flow of instruction execu-
tion, including instruction alignment and decoding. For 
program flow control, the sequencer supports PC-relative and 
indirect conditional jumps (with static branch prediction) and 
subroutine calls. Hardware is provided to support zero-overhead 
looping. The architecture is fully interlocked, meaning that 
the programmer need not manage the pipeline when executing 
instructions with data dependencies.
The address arithmetic unit provides two addresses for simulta-
neous dual fetches from memory. It contains a multiported 
register file consisting of four sets of 32-bit index, modify, 
length, and base registers (for circular buffering), and eight 
additional 32-bit pointer registers (for C-style indexed stack 
manipulation).
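
The role of one index, modify, length, and base register set in circular buffering can be modeled in software. The following sketch is an assumption-laden illustration rather than a description of the hardware: it assumes the magnitude of the modify value does not exceed the buffer length, and it treats a length of zero as plain linear addressing as a modeling choice. The type circ_regs and the function circ_post_modify are hypothetical names.

```c
#include <stdint.h>

/* One modeled set of circular-buffer registers (hypothetical type). */
typedef struct {
    uint32_t index;   /* current address                               */
    int32_t  modify;  /* signed step applied after each access         */
    uint32_t length;  /* buffer length in bytes; 0 means no wrapping   */
    uint32_t base;    /* address of the first byte of the buffer       */
} circ_regs;

/* Return the current address, then advance the index by `modify`,
 * wrapping it back into [base, base + length) when it leaves the buffer. */
static uint32_t circ_post_modify(circ_regs *r)
{
    uint32_t addr = r->index;
    int64_t next = (int64_t)r->index + r->modify;

    if (r->length != 0) {
        int64_t lo = (int64_t)r->base;
        int64_t hi = lo + (int64_t)r->length;
        if (next >= hi)
            next -= r->length;      /* stepped past the end   */
        else if (next < lo)
            next += r->length;      /* stepped before the start */
    }
    r->index = (uint32_t)next;
    return addr;
}
```

With a base of 0x1000, a length of 16, and a modify value of 4, successive calls return 0x1000, 0x1004, 0x1008, 0x100C, and then 0x1000 again, so a filter loop can walk a delay line without explicit wrap checks.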
Blackfin processors support a modified Harvard architecture in 
combination with a hierarchical memory structure. Level 1 (L1) 
memories are those that typically operate at the full processor 
speed with little or no latency. At the L1 level, the instruction 
memory holds instructions only. The two data memories hold 
data, and a dedicated scratchpad data memory stores stack and 
local variable information.
In addition, multiple L1 memory blocks are provided, offering a 
configurable mix of SRAM and cache. The memory manage-
ment unit (MMU) provides memory protection for individual 
tasks that may be operating on the core and can protect system 
registers from unintended access.
The architecture provides three modes of operation: user mode, 
supervisor mode, and emulation mode. User mode has 
restricted access to certain system resources, thus providing a 
protected software environment, while supervisor mode has 
unrestricted access to the system and core resources.
The Blackfin processor instruction set has been optimized so 
that 16-bit opcodes represent the most frequently used instruc-
tions, resulting in excellent compiled code density. Complex 
DSP instructions are encoded into 32-bit opcodes, representing 
fully featured multifunction instructions. Blackfin processors 
support a limited multi-issue capability, where a 32-bit instruc-
tion can be issued in parallel with two 16-bit instructions, 
allowing the programmer to use many of the core resources in a 
single instruction cycle.
The Blackfin processor assembly language uses an algebraic syn-
tax for ease of coding and readability. The architecture has been 
optimized for use in conjunction with the C/C++ compiler, 
resulting in fast and efficient software implementations.
MEMORY ARCHITECTURE
The ADSP-BF531/ADSP-BF532/ADSP-BF533 processors view 
memory as a single unified 4G byte address space, using 32-bit 
addresses. All resources, including internal memory, external 
memory, and I/O control registers, occupy separate sections of 
this common address space. The memory portions of this 
address space are arranged in a hierarchical structure to provide 
a good cost/performance balance between very fast, low-latency 
on-chip memory (usable as cache or SRAM) and larger, lower-cost, 
lower-performance off-chip memory systems. See Figure 3, Figure 4, 
and Figure 5 on Page 6.
The L1 memory system is the primary, highest-performance 
memory available to the Blackfin processor. The off-chip mem-
ory system, accessed through the external bus interface unit 
(EBIU), provides expansion with SDRAM, flash memory, and 
SRAM, optionally accessing up to 132M bytes of 
physical memory.
The memory DMA controller provides high bandwidth data-
movement capability. It can perform block transfers of code or 
data between the internal memory and the external 
memory spaces.
Internal (On-Chip) Memory
The processors have three blocks of on-chip memory that pro-
vide high bandwidth access to the core. 
The first block is the L1 instruction memory, consisting of up to 
80K bytes of SRAM, of which 16K bytes can be configured as a 
four-way set-associative cache. This memory is accessed at full 
processor speed.
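
To make the four-way set-associative organization concrete, the sketch below applies the generic textbook mapping from an address to a set index and tag. The 16K byte size and the four ways come from the text above; the 32-byte line size is an assumption that this section does not state, and the actual hardware may partition and index the cache differently. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define CACHE_BYTES 16384u  /* from the text: 16K bytes configurable as cache */
#define CACHE_WAYS  4u      /* from the text: four-way set-associative        */
#define LINE_BYTES  32u     /* ASSUMPTION: line size is not given here        */
#define NUM_SETS    (CACHE_BYTES / (CACHE_WAYS * LINE_BYTES))

/* Set that an address maps to under the generic model. */
static uint32_t cache_set_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* Tag stored alongside the line to tell apart addresses sharing a set. */
static uint32_t cache_tag(uint32_t addr)
{
    return addr / (LINE_BYTES * NUM_SETS);
}
```

Under these assumptions there are 128 sets, and addresses 0, 4096, 8192, and 12288 all map to set 0 with different tags. Four ways allow all four lines to be resident at once, whereas a direct-mapped cache of the same size would keep evicting them.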










