Rev. I | Page 4 of 64 | August 2013

ADSP-BF531/ADSP-BF532/ADSP-BF533

BLACKFIN PROCESSOR CORE

As shown in Figure 2 on Page 5, the Blackfin processor core contains two 16-bit multipliers, two 40-bit accumulators, two 40-bit ALUs, four video ALUs, and a 40-bit shifter. The computation units process 8-bit, 16-bit, or 32-bit data from the register file.

The compute register file contains eight 32-bit registers. When performing compute operations on 16-bit operand data, the register file operates as 16 independent 16-bit registers. All operands for compute operations come from the multiported register file and instruction constant fields.

Each MAC can perform a 16-bit by 16-bit multiply in each cycle, accumulating the results into the 40-bit accumulators. Signed and unsigned formats, rounding, and saturation are supported.
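
The multiply-accumulate behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch under stated assumptions: it handles signed integer operands only, always saturates, and ignores rounding and fractional formats. The names `mac`, `dot`, and `to_int16` are hypothetical helpers, not part of any Analog Devices tool or library.

```python
ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1   # largest signed 40-bit value
ACC_MIN = -(1 << (ACC_BITS - 1))      # smallest signed 40-bit value


def to_int16(x):
    """Interpret the low 16 bits of x as a signed 16-bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mac(acc, a, b):
    """Add the signed 16x16 product a*b to acc, saturating at 40 bits."""
    total = acc + to_int16(a) * to_int16(b)
    return max(ACC_MIN, min(ACC_MAX, total))


def dot(xs, ys):
    """Dot product of two 16-bit sequences, one MAC step per element pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

Because a 16-bit by 16-bit product needs at most 32 bits, the 40-bit accumulator leaves 8 guard bits, so many products can be summed before saturation comes into play.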

The ALUs perform a traditional set of arithmetic and logical operations on 16-bit or 32-bit data. In addition, many special instructions are included to accelerate various signal processing tasks. These include bit operations such as field extract and population count, modulo 2^32 multiply, divide primitives, saturation and rounding, and sign/exponent detection. The set of video instructions includes byte alignment and packing operations, 16-bit and 8-bit adds with clipping, 8-bit average operations, and 8-bit subtract/absolute value/accumulate (SAA) operations. Also provided are the compare/select and vector search instructions.
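
As a rough illustration of two of the operations named above, the sketch below models subtract/absolute value/accumulate over 8-bit values (the core step of a sum of absolute differences) and a population count. It describes only the arithmetic, not the instruction encodings or operand layout; the function names `saa` and `popcount` are hypothetical.

```python
def saa(acc, a_bytes, b_bytes):
    """Accumulate |a - b| for each pair of unsigned 8-bit values."""
    for a, b in zip(a_bytes, b_bytes):
        acc += abs((a & 0xFF) - (b & 0xFF))
    return acc


def popcount(x):
    """Count the set bits in a 32-bit word."""
    return bin(x & 0xFFFFFFFF).count("1")
```

Block-matching loops in video motion estimation are a typical use of the SAA pattern, which is presumably why it is grouped with the video instructions.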

For certain instructions, two 16-bit ALU operations can be performed simultaneously on register pairs (a 16-bit high half and 16-bit low half of a compute register). Quad 16-bit operations are possible using the second ALU.
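
A dual 16-bit operation on the high and low halves of a register can be pictured as two independent lanes. The following sketch assumes a plain wrapping add in each lane (saturating variants are not modeled), and `split_halves` and `dual_add16` are hypothetical names chosen for illustration.

```python
def split_halves(reg):
    """Return the (high, low) 16-bit halves of a 32-bit register value."""
    return (reg >> 16) & 0xFFFF, reg & 0xFFFF


def dual_add16(r0, r1):
    """Add high to high and low to low; each lane wraps modulo 2**16."""
    h0, l0 = split_halves(r0)
    h1, l1 = split_halves(r1)
    high = (h0 + h1) & 0xFFFF
    low = (l0 + l1) & 0xFFFF
    return (high << 16) | low
```

The key property is that a carry out of the low lane never reaches the high lane, which is what distinguishes this from an ordinary 32-bit add.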

The 40-bit shifter can perform shifts and rotates and is used to support normalization, field extract, and field deposit instructions.
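
Field extract and field deposit reduce to shift-and-mask operations. The sketch below shows the idea on 32-bit words; the argument order and the `signed` flag are assumptions made for this example and do not mirror the actual instruction syntax.

```python
def field_extract(word, pos, length, signed=False):
    """Return the length-bit field of word whose least significant bit is at pos."""
    field = (word >> pos) & ((1 << length) - 1)
    if signed and field & (1 << (length - 1)):
        field -= 1 << length          # sign-extend the extracted field
    return field


def field_deposit(word, value, pos, length):
    """Return word with the length-bit field at pos replaced by value."""
    mask = ((1 << length) - 1) << pos
    return (word & ~mask & 0xFFFFFFFF) | ((value << pos) & mask)
```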

The program sequencer controls the flow of instruction execution, including instruction alignment and decoding. For program flow control, the sequencer supports PC relative and indirect conditional jumps (with static branch prediction), and subroutine calls. Hardware is provided to support zero-overhead looping. The architecture is fully interlocked, meaning that the programmer need not manage the pipeline when executing instructions with data dependencies.

The address arithmetic unit provides two addresses for simultaneous dual fetches from memory. It contains a multiported register file consisting of four sets of 32-bit index, modify, length, and base registers (for circular buffering), and eight additional 32-bit pointer registers (for C-style indexed stack manipulation).
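
The role of the base and length registers in circular buffering can be sketched as a post-modify address update that wraps inside the buffer. This is a simplified model that assumes the magnitude of the modify value does not exceed the buffer length, so a single correction is enough; `circular_update` is a hypothetical name, and the exact hardware rules are not reproduced here.

```python
def circular_update(index, modify, length, base):
    """Advance index by modify, wrapping within [base, base + length)."""
    index += modify
    if index >= base + length:
        index -= length               # stepped past the end: wrap forward
    elif index < base:
        index += length               # stepped before the start: wrap back
    return index
```

For example, stepping through an 8-byte buffer at a hypothetical base of 0x1000 in strides of 4 visits 0x1000, 0x1004, 0x1000, and so on, without any explicit bounds check in the inner loop.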

Blackfin processors support a modified Harvard architecture in combination with a hierarchical memory structure. Level 1 (L1) memories are those that typically operate at the full processor speed with little or no latency. At the L1 level, the instruction memory holds instructions only. The two data memories hold data, and a dedicated scratchpad data memory stores stack and local variable information.

In addition, multiple L1 memory blocks are provided, offering a configurable mix of SRAM and cache. The memory management unit (MMU) provides memory protection for individual tasks that may be operating on the core and can protect system registers from unintended access.

The architecture provides three modes of operation: user mode, supervisor mode, and emulation mode. User mode has restricted access to certain system resources, thus providing a protected software environment, while supervisor mode has unrestricted access to the system and core resources.

The Blackfin processor instruction set has been optimized so that 16-bit opcodes represent the most frequently used instructions, resulting in excellent compiled code density. Complex DSP instructions are encoded into 32-bit opcodes, representing fully featured multifunction instructions. Blackfin processors support a limited multi-issue capability, where a 32-bit instruction can be issued in parallel with two 16-bit instructions, allowing the programmer to use many of the core resources in a single instruction cycle.

The Blackfin processor assembly language uses an algebraic syntax for ease of coding and readability. The architecture has been optimized for use in conjunction with the C/C++ compiler, resulting in fast and efficient software implementations.

MEMORY ARCHITECTURE

The ADSP-BF531/ADSP-BF532/ADSP-BF533 processors view memory as a single unified 4G byte address space, using 32-bit addresses. All resources, including internal memory, external memory, and I/O control registers, occupy separate sections of this common address space. The memory portions of this address space are arranged in a hierarchical structure to provide a good cost/performance balance of some very fast, low latency on-chip memory as cache or SRAM, and larger, lower cost and performance off-chip memory systems. See Figure 3, Figure 4, and Figure 5 on Page 6.

The L1 memory system is the primary highest performance memory available to the Blackfin processor. The off-chip memory system, accessed through the external bus interface unit (EBIU), provides expansion with SDRAM, flash memory, and SRAM, optionally accessing up to 132M bytes of physical memory.

The memory DMA controller provides high bandwidth data-movement capability. It can perform block transfers of code or data between the internal memory and the external memory spaces.

Internal (On-Chip) Memory

The processors have three blocks of on-chip memory that provide high bandwidth access to the core.

The first block is the L1 instruction memory, consisting of up to 80K bytes of SRAM, of which 16K bytes can be configured as a four-way set-associative cache. This memory is accessed at full processor speed.
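
To give a feel for what a 16K byte, four-way set-associative organization implies, the sketch below computes the number of sets and splits an address into tag, set index, and byte offset. The 32-byte line size is an assumption (it is not stated in this section), and the split is the generic textbook mapping, which may differ from how the device actually indexes its cache. Both function names are hypothetical.

```python
def cache_sets(cache_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return cache_bytes // (ways * line_bytes)


def split_address(addr, sets, line_bytes):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


# 16K byte cache, four ways, assumed 32-byte lines
SETS = cache_sets(16 * 1024, 4, 32)
```

Under these assumptions the cache has 128 sets, and any four lines whose addresses share a set index can be resident at the same time.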