IEEE SIGNAL PROCESSING MAGAZINE [177] MARCH 2015
backward, biprediction, and symmetric
prediction, using two reference frames.
In a B frame, in addition to the
conventional forward, backward, bi-
directional, and skip/direct prediction
modes, symmetric prediction is defined as a
special biprediction mode, wherein only
one forward motion vector (MV) is coded
and the backward MV is derived from the
forward MV. For an F frame, besides the
conventional single hypothesis prediction
mode in a P frame, multihypothesis tech-
niques are added for more efficient predic-
tion, including the advanced skip/direct
mode [8], temporal multihypothesis predic-
tion mode [9], and spatial directional multi-
hypothesis (DMH) prediction mode [10].
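As a rough illustration of symmetric prediction, the sketch below derives a backward MV by mirroring the coded forward MV in proportion to the temporal distances of the two references. The function name, the distance parameters, and the rounding rule are assumptions made for illustration only; the normative derivation is the one defined in the AVS2 specification.

```python
def derive_backward_mv(mv_fwd, dist_fwd, dist_bwd):
    """Mirror a forward MV onto the backward reference (illustrative only).

    mv_fwd   -- (x, y) forward MV, e.g., in quarter-pixel units
    dist_fwd -- temporal distance from the current frame to the forward reference
    dist_bwd -- temporal distance from the current frame to the backward reference
    """
    if dist_fwd <= 0 or dist_bwd <= 0:
        raise ValueError("temporal distances must be positive")

    def mirror(component):
        # Point in the opposite direction and scale by the distance ratio.
        scaled = -component * dist_bwd
        # Integer rounding to nearest, ties away from zero (an assumption).
        sign = -1 if scaled < 0 else 1
        return sign * ((abs(scaled) + dist_fwd // 2) // dist_fwd)

    return (mirror(mv_fwd[0]), mirror(mv_fwd[1]))
```

With equal distances on both sides, the backward MV in this sketch is simply the negated forward MV, which is why only the forward MV needs to be coded.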
In an F frame, an advanced skip/direct
mode is defined using a competitive
motion derivation mechanism. Two deri-
vation methods are used: one is temporal
and the other is spatial. Temporal multihy-
pothesis mode combines two predictors
along the predefined temporal direction,
while spatial multihypothesis mode com-
bines two predictors along the predefined
spatial direction. For temporal derivation,
the prediction block is obtained by an aver-
age of the prediction blocks indicated by
the MV prediction (MVP) and the scaled
MV in a second reference. The second ref-
erence is specified by the reference index
transmitted in the bit stream. For temporal
multihypothesis prediction, as shown in
Figure 4, one predictor ref_blk1 is generated
with the best MV and the reference frame
ref1 found by motion estimation, and this
MV is then linearly scaled to a second
reference to generate another predictor
ref_blk2. The second reference ref2 is
specified by the reference index
transmitted in the bit stream. In
DMH mode, as specified in Figure 4, the
seed predictors are located on the line
crossing the initial predictor obtained
from motion estimation. The number of
seed predictors is restricted to eight. If one
seed predictor is selected for combined
prediction, for example “Mode 1,” then the
index of the seed predictor “1” will be sig-
naled in the bit stream.
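The temporal multihypothesis idea described above (scale one MV to a second reference, then combine the two predictors) can be sketched as follows. This is a minimal illustration, not the AVS2 reference procedure: the helper names, the use of temporal distances for the linear scaling, and the rounding in both functions are assumptions.

```python
def scale_mv(mv, dist1, dist2):
    """Linearly scale an MV found at temporal distance dist1 to distance dist2."""
    if dist1 <= 0 or dist2 <= 0:
        raise ValueError("temporal distances must be positive")

    def scale(component):
        num = component * dist2
        sign = -1 if num < 0 else 1
        # Round to nearest, ties away from zero (an assumption).
        return sign * ((abs(num) + dist1 // 2) // dist1)

    return (scale(mv[0]), scale(mv[1]))


def average_predictors(ref_blk1, ref_blk2):
    """Average two equally sized sample blocks, rounding halves up."""
    return [
        [(a + b + 1) >> 1 for a, b in zip(row1, row2)]
        for row1, row2 in zip(ref_blk1, ref_blk2)
    ]
```

In this sketch, ref_blk1 would be fetched from ref1 with the searched MV, ref_blk2 from ref2 with the output of scale_mv, and the final prediction is the output of average_predictors.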
For spatial derivation, the prediction
block may be obtained from one or two
prediction blocks specified by the motion
copied from its spatial neighboring
blocks. The neighboring blocks are illus-
trated in Figure 5. They are searched in a
predefined order F, G, C, A, B, D, and the
selected neighboring block is signaled in
the bit stream.
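One way to picture the spatial derivation is as a candidate list built by visiting the neighboring blocks in the stated order. The sketch below assumes that unavailable neighbors (for example, intra-coded or outside the picture) are represented by None and are skipped, and that the encoder signals the position of its chosen candidate; both points are illustrative assumptions rather than details taken from the standard.

```python
# Search order of the neighboring blocks, as stated in the text.
SEARCH_ORDER = ("F", "G", "C", "A", "B", "D")


def build_candidate_list(neighbor_mvs):
    """Collect (label, mv) pairs for available neighbors in search order.

    neighbor_mvs -- dict mapping a block label to its MV, or to None when
                    the neighbor has no usable motion (hypothetical layout)
    """
    candidates = []
    for label in SEARCH_ORDER:
        mv = neighbor_mvs.get(label)
        if mv is not None:
            candidates.append((label, mv))
    return candidates
```

An encoder following this sketch would evaluate each candidate, keep the one with the lowest cost, and write its identity to the bit stream so that the decoder can copy the same motion.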
MOTION VECTOR PREDICTION AND CODING
MVP plays an important role in interprediction:
it reduces the redundancy among the MVs of
neighboring blocks and thus saves a large
number of coding bits for MVs. In AVS2, four
different prediction methods are adopted, as
tabulated in Table 2, each with its own
use. Spatial MVP is used for the spatial
derivation of Skip/Direct mode in F frames
and B frames. Temporal MVP is used for
temporal derivation of Skip/Direct mode
in P frames and F frames. Spatial-tempo-
ral-combined MVP is used for the joint
temporal and spatial derivation of Skip/
Direct mode in B frames. For other cases,
median prediction is used.
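Two of these predictors are simple enough to sketch directly. The first function below takes a component-wise median over three neighboring MVs; the choice of exactly three neighbors and of a per-component median is an assumption, since the text only says that the median of the neighboring MVs is used. The second function follows the description of the spatial-temporal combined method in Table 2: the temporal MVP is taken when it is available and the spatial MVP otherwise.

```python
def median_mvp(mv_a, mv_b, mv_c):
    """Component-wise median of three neighboring MVs (illustrative)."""
    xs = sorted((mv_a[0], mv_b[0], mv_c[0]))
    ys = sorted((mv_a[1], mv_b[1], mv_c[1]))
    # The middle element of three sorted values is their median.
    return (xs[1], ys[1])


def combined_mvp(temporal_mvp, spatial_mvp):
    """Prefer the temporal MVP; fall back to the spatial MVP if it is None."""
    return temporal_mvp if temporal_mvp is not None else spatial_mvp
```

Note that the component-wise median may produce a vector that is not equal to any single neighbor's MV, because the x and y medians can come from different blocks.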
In AVS2, the MV is in quarter-pixel
precision for the luminance component,
and the subpixel is interpolated with an
eight-tap DCT interpolation filter (DCT-IF)
[11]. For the chrominance component, the
MV is derived from the luminance MV with
1/8-pixel precision, and a four-tap DCT-IF
is used for subpixel interpolation [12].
After MV prediction, the MV difference
(MVD) is coded in the bit stream. How-
ever, redundancy may still exist in MVD,
and to further save coding bits of MVs, a
progressive MV resolution adaptation
method is adopted in AVS2 [13]. In this
scheme, the MVP is first rounded to the
nearest integer sample position, and then
the MV is rounded to half-pixel precision
if its distance from the MVP is larger than
a threshold. Finally, the resolution of the
MVD is decreased to half-pixel precision if
it is larger than a threshold.
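The steps above can be sketched as follows, with all vectors stored in quarter-pixel units. The threshold value, the per-component distance test, and the tie-breaking rule in the rounding helper are assumptions for illustration; the actual rule and threshold are those specified in [13] and in the standard.

```python
QPEL_PER_SAMPLE = 4  # quarter-pixel units per integer sample
QPEL_PER_HALF = 2    # quarter-pixel units per half sample


def round_to_multiple(value, step):
    """Round to the nearest multiple of step, ties away from zero."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step) * step


def adapt_mv(mv, mvp, threshold):
    """Return (rounded MVP, adapted MV, MVD), all in quarter-pixel units."""
    # Step 1: round the MVP to the nearest integer sample position.
    mvp_int = tuple(round_to_multiple(c, QPEL_PER_SAMPLE) for c in mvp)
    # Step 2: far from the MVP, keep only half-pixel precision for the MV.
    adapted = []
    for c, p in zip(mv, mvp_int):
        if abs(c - p) > threshold:
            c = round_to_multiple(c, QPEL_PER_HALF)
        adapted.append(c)
    adapted = tuple(adapted)
    # Step 3: such MVD components are multiples of a half pixel.
    mvd = tuple(c - p for c, p in zip(adapted, mvp_int))
    return mvp_int, adapted, mvd
```

Because the rounded MVP sits on an integer position, any MVD component beyond the threshold in this sketch is a multiple of two quarter-pixel units, so it could be coded at half-pixel resolution, while small differences keep full quarter-pixel precision.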
TRANSFORM
Two-level transform coding is utilized to
further compress the predicted residual.
For a CU with a symmetric prediction unit
partition, the TU size can be 2N × 2N or
N × N, signaled by a transform split flag.
Thus, the maximum transform size is
64 × 64, and the minimum transform
size is 4 × 4. For TU sizes from 4 × 4 to
32 × 32, an integer transform (IT) that
closely approximates the performance of
the discrete cosine transform (DCT) is
used, while for the 64 × 64 transform, a
logical transform (LOT) [14] is applied to
the residual. A five-three-tap integer wavelet
transform is first performed on the 64 × 64
block, the low-high (LH), high-low (HL), and
high-high (HH) bands are discarded, and then
a normal 32 × 32 IT is applied to the low-low
(LL) band. For a CU that has an asymmetric
PU partition, a 2N × 2N IT is used in the
first level and a nonsquare transform [15] is
used in the second level, as shown in
Figure 6. Moreover,
in the latest AVS2 standard, a secondary
transform was adopted for intraprediction
residual (for more details see the latest AVS
specification document N2120 on the AVS
FTP Web site [21]).
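To make the first stage of the 64 × 64 transform concrete, the sketch below computes only the LL band of a block using the well-known integer 5/3 lifting steps with symmetric boundary extension. Whether AVS2 uses exactly this lifting form and boundary handling is an assumption here, and the subsequent 32 × 32 IT is not shown.

```python
def lift_53_lowpass(x):
    """Low-pass output of a one-level integer 5/3 lifting transform."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("length must be even and at least 2")
    half = n // 2
    # Predict step: detail (high-pass) coefficients at odd positions.
    d = []
    for i in range(half):
        left = x[2 * i]
        # Symmetric extension at the right edge.
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        d.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: smooth (low-pass) coefficients at even positions.
    s = []
    for i in range(half):
        d_prev = d[i - 1] if i > 0 else d[0]  # symmetric extension on the left
        s.append(x[2 * i] + ((d_prev + d[i] + 2) >> 2))
    return s


def ll_band(block):
    """Keep only the LL band of a square block (LH, HL, and HH are discarded)."""
    rows = [lift_53_lowpass(list(r)) for r in block]       # horizontal pass
    cols = [lift_53_lowpass(list(c)) for c in zip(*rows)]  # vertical pass
    return [list(r) for r in zip(*cols)]                   # transpose back
```

Applied to a 64 × 64 residual block, ll_band returns a 32 × 32 block, which is the input size expected by the 32 × 32 IT in the description above.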
ENTROPY CODING
After transform and quantization, a two-
level coding scheme is applied to the
[FIG5] An illustration of neighboring blocks A, B, C, D, F, and G for MVP.
[TABLE 2] MV PREDICTION METHODS IN AVS2.
METHOD                      DETAILS
MEDIAN                      USING THE MEDIAN MV VALUES OF THE NEIGHBORING BLOCKS.
SPATIAL                     USING THE MVs OF SPATIAL NEIGHBORING BLOCKS.
TEMPORAL                    USING THE MVs OF TEMPORAL COLLOCATED BLOCKS.
SPATIAL-TEMPORAL COMBINED   USING THE TEMPORAL MVP FIRST IF IT IS AVAILABLE; THE SPATIAL
                            MVP IS USED INSTEAD IF THE TEMPORAL MVP IS NOT AVAILABLE.