sounds can be extracted from a specific direction or from an arbitrary
two-dimensional or even three-dimensional region of interest.
Furthermore, the sound scene can be manipulated to create an
acoustic zoom effect in which direct sounds within the listening
angular range are amplified depending on the zoom factor, while
other sounds are suppressed. In addition, the signals and parameters
can be used to create surround sound signals. As the manipulation
and synthesis are highly application dependent, we focus in this arti-
cle on three illustrative assisted listening applications: spatial audio
communication, virtual classroom, and binaural hearing aids.
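
To make the acoustic zoom concrete, the following Python sketch computes a direction-dependent gain for the direct sound in one time-frequency bin. The function name, the square-root gain law, and the out-of-range roll-off are illustrative assumptions, not the specific design used here.

    import numpy as np

    def acoustic_zoom_gain(doa_deg, zoom, beam_width_deg=60.0, floor=0.1):
        # Direction-dependent gain for the direct sound in one bin.
        # doa_deg: estimated DOA relative to the look direction (0 = ahead).
        # zoom: zoom factor >= 1 (1 means no zoom).
        # beam_width_deg: angular range kept at zoom = 1 (narrows with zoom).
        # floor: minimum gain for out-of-range sounds (assumed safeguard).
        half_width = (beam_width_deg / 2.0) / zoom
        if abs(doa_deg) <= half_width:
            # Direct sounds inside the listening range are amplified.
            return float(np.sqrt(zoom))
        # Sounds outside the range are suppressed, down to a floor
        # that limits audible artifacts.
        rolloff = half_width / abs(doa_deg)
        return max(floor, float(np.sqrt(zoom)) * rolloff)

    # Example: a talker 40 degrees off-axis under a 2x zoom is attenuated,
    # while a talker at 10 degrees would be amplified by sqrt(2).
    print(acoustic_zoom_gain(doa_deg=40.0, zoom=2.0))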
INTRODUCTION
Communication and assisted listening devices commonly use
multiple microphones to create one or more signals, the content
of which highly depends on the application. For example, when
smart glasses are used to record a video, the microphones can be
used to create a surround sound recording that consists of mul-
tiple audio signals. A compact yet accurate representation of the
sound field at the recording position makes it possible to render
the sound field on an arbitrary reproduction setup in a different
location. On the other hand, when the device is used in hands-free
or speech recognition mode, the microphones can be used to
extract the user’s speech while reducing background noise and
interfering sounds. Over the last few decades, sophisticated solutions
for these applications have been developed.
Spatial recordings are commonly made using specific micro-
phone setups. For instance, several stereo recording techniques
exploit different placements of microphones of the same or different
types (e.g., cardioid or omnidirectional) to make a stereo recording that can be
reproduced using loudspeakers. When more loudspeakers are
available for spatial sound rendering, the microphone recordings
are often specifically mixed for a given reproduction setup. These
classical techniques do not provide the flexibility required in many
modern applications where the reproduction setup is not known
in advance. Signal enhancement, on the other hand, is commonly
achieved by filtering and subsequently summing the available
microphone signals. Classical spatial filters often require information
on the second-order statistics (SOS) of the desired and undesired
signals (cf. [4] and [5]); a minimal example of such a filter is sketched
after the following list. For real-time applications, the SOS
need to be estimated online, and the quality of the output signal
depends heavily on the accuracy of these estimates. To date, major
challenges remain, such as:
1) achieving a sufficiently fast response to changes in the
sound scene (such as moving and emerging sources) and to
changes in the acoustic conditions
2) providing sufficient flexibility in terms of spatial selectivity
3) ensuring a high-quality output signal at all times
4) providing solutions with a manageable computational
complexity.
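
As a minimal example of such an SOS-based spatial filter, the sketch below computes minimum variance distortionless response (MVDR) weights for a single frequency bin. MVDR is one classical choice among many; the uniform linear array, the half-wavelength spacing, and the identity noise covariance are assumptions made purely for this illustration.

    import numpy as np

    def mvdr_weights(noise_cov, steering):
        # MVDR filter weights for one frequency bin:
        # w = R^{-1} a / (a^H R^{-1} a), where R is the noise-plus-
        # interference covariance (the SOS estimate) and a is the
        # steering vector toward the desired source.
        M = noise_cov.shape[0]
        R_inv = np.linalg.inv(noise_cov + 1e-6 * np.eye(M))  # regularized
        num = R_inv @ steering
        return num / (steering.conj() @ num)

    # Uniform linear array: M = 4 microphones at half-wavelength spacing,
    # desired source at 20 degrees, spatially white noise as SOS estimate.
    M = 4
    theta = np.deg2rad(20.0)
    steering = np.exp(-1j * np.pi * np.arange(M) * np.sin(theta))
    w = mvdr_weights(np.eye(M, dtype=complex), steering)
    # The enhanced bin is y = w.conj() @ x, with x the M microphone signals.

The quality of the output hinges on how well noise_cov tracks the true SOS, which is exactly the online estimation challenge listed above.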
Although the use of multiple microphones provides, at least in
theory, a major advantage over a single microphone, the adoption
of multimicrophone techniques in practical systems has not been
particularly popular until very recently. Possible reasons for this
could be that in real-life scenarios, these techniques provided
insufficient improvement over single-microphone techniques,
while significantly increasing the computational complexity, the
system calibration effort, and the manufacturing costs. In the last
few years, however, the smartphone and hearing aid industries have
taken a significant step forward in using multiple microphones,
which have recently become standard in these devices.
Parametric spatial sound processing provides a unified solution
to both the spatial recording and signal enhancement problems,
as well as to other challenging sound processing tasks such as add-
ing virtual sound sources to the sound scene. As illustrated in
Figure 1, the parametric processing is performed in two successive
steps that can be completed on the same device or on different
devices. In the first step, the sound field is analyzed in narrow fre-
quency bands using multiple microphones to obtain a compact
and perceptually meaningful description of the sound field in
terms of direct and diffuse sound components and some paramet-
ric information (e.g., DOAs and positions). In the second step, the
input signals and possibly the parameters are modified, and one or
more output signals are synthesized. The modification and synthe-
sis can be user, application, or scenario dependent. Parametric
spatial sound processing is also common in audio coding (cf. [6])
where parametric information is extracted directly from the loud-
speaker channels instead of the microphone signals.
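
As a toy illustration of the two-step scheme in Figure 1, the Python sketch below decomposes one time-frequency bin into direct and diffuse components (step 1) and resynthesizes it with a user-dependent gain (step 2). The decomposition rule, the function names, and the externally supplied DOA and diffuseness are simplifying assumptions; a real system estimates these quantities from the microphone signals.

    import numpy as np

    def analyze_bin(x, steering, diffuseness):
        # Step 1 (analysis): split one time-frequency bin into direct
        # and diffuse parts, given a DOA-matched steering vector and a
        # diffuseness estimate in [0, 1] (both assumed known here).
        ref = (steering.conj() @ x) / len(x)   # direct-sound reference
        direct = np.sqrt(1.0 - diffuseness) * ref
        diffuse = np.sqrt(diffuseness) * ref
        return direct, diffuse

    def synthesize_bin(direct, diffuse, doa_deg, direct_gain, diffuse_gain=1.0):
        # Step 2 (synthesis): recombine the components after applying
        # user- or application-dependent gains (e.g., an acoustic zoom).
        return direct_gain(doa_deg) * direct + diffuse_gain * diffuse

    # Toy usage for one bin: M = 3 microphones, source DOA of 10 degrees.
    M = 3
    doa = 10.0
    a = np.exp(-1j * np.pi * np.arange(M) * np.sin(np.deg2rad(doa)))
    x = a * (0.5 + 0.2j)                       # noiseless direct sound
    d, s = analyze_bin(x, a, diffuseness=0.3)
    y = synthesize_bin(d, s, doa, lambda ang: 2.0 if abs(ang) < 30 else 0.5)

Because the two steps only exchange a few signal components and parameters per bin, they can run on the same device or on different devices, as noted above.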
The described scheme also allows for an efficient transmission
of sound scenes to the far-end side [1], [7] for loudspeaker reproduction.
[FIG1] A high-level overview of the parametric spatial sound processing scheme. (Block diagram: microphone signals enter a spatial analysis stage, which outputs direct and diffuse signal components along with parameters; after optional storage or transmission, a processing and synthesis stage, controlled by user settings, produces the output signal(s).)