This article is part of the series Anthropomorphic Processing of Audio and Speech.

Open Access Research Article

A Perceptual Model for Sinusoidal Audio Coding Based on Spectral Integration

Steven van de Par¹*, Armin Kohlrausch¹˒², Richard Heusdens³, Jesper Jensen³ and Søren Holdt Jensen⁴

Author Affiliations

1 Digital Signal Processing Group, Philips Research Laboratories, Eindhoven 5656 AA, The Netherlands

2 Department of Technology Management, Eindhoven University of Technology, Eindhoven 5600 MB, The Netherlands

3 Department of Mediamatics, Delft University of Technology, Delft 2600 GA, The Netherlands

4 Department of Communication Technology, Institute of Electronic Systems, Aalborg University, Aalborg DK-9220, Denmark


EURASIP Journal on Advances in Signal Processing 2005, 2005:317529, doi:10.1155/ASP.2005.1292


The electronic version of this article is the complete one and can be found online at: http://asp.eurasipjournals.com/content/2005/9/317529


Received: 31 October 2003
Revisions received: 22 July 2004
Published: 21 June 2005

© 2005 van de Par et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Psychoacoustical models have been used extensively within audio coding applications over the past decades. Recently, parametric coding techniques have been applied to general audio, creating the need for a psychoacoustical model that is specifically suited to sinusoidal modelling of audio signals. In this paper, we present a new perceptual model that predicts masked thresholds for sinusoidal distortions. The model relies on signal detection theory and incorporates more recent insights about spectral and temporal integration in auditory masking. As a consequence, the model is able to predict the detectability of distortions. In fact, the distortion detectability defines a (perceptually relevant) norm on the underlying signal space, which is beneficial for optimisation algorithms such as rate-distortion optimisation or linear predictive coding. We evaluate the merits of the model by combining it with a sinusoidal extraction method and comparing the results with those obtained using the psychoacoustic model recommended in the ISO MPEG-1 Layer I-II standard. Listening tests show a clear preference for the new model. More specifically, the model presented here reduces the number of sinusoids needed to represent signals at a given quality level by more than 20%.

Keywords:
audio coding; psychoacoustical modelling; auditory masking; spectral masking; sinusoidal modelling; psychoacoustical matching pursuit
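The claim that distortion detectability defines a norm can be illustrated with a small sketch. It assumes a simplified spectral-integration rule: in each auditory channel, the distortion power is compared with the masker power passed by that channel plus a constant, and these ratios are summed across channels. The Gaussian filter shapes, the constants `c_s` and `c_a`, the frequency grid, and all function names below are hypothetical choices for illustration, not the model specification given in the paper; only the ERB formula of Glasberg and Moore is a standard result, and temporal integration is ignored entirely.

```python
import math


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def filter_bank(freqs, centers):
    """Hypothetical auditory filters: Gaussian power responses |H_i(f)|^2
    whose equivalent rectangular bandwidth equals erb(fc)."""
    bank = []
    for fc in centers:
        bw = erb(fc)
        bank.append([math.exp(-math.pi * ((f - fc) / bw) ** 2) for f in freqs])
    return bank


def detectability_weights(masker_psd, freqs, centers, c_s=1.0, c_a=1e-3):
    """Per-frequency weights a(f) such that D = sum_f a(f) * |E(f)|^2.

    Channel i contributes c_s * |H_i(f)|^2 / (P_i + c_a), where P_i is the
    masker power passed by filter i and c_a is a hypothetical constant
    standing in for the threshold in quiet.
    """
    weights = [0.0] * len(freqs)
    for h2 in filter_bank(freqs, centers):
        masker_power = sum(h * m for h, m in zip(h2, masker_psd))
        scale = c_s / (masker_power + c_a)
        for k, h in enumerate(h2):
            weights[k] += scale * h
    return weights


def detectability(error_spectrum, weights):
    """Spectrally integrated detectability D: weighted distortion energy,
    with the sum over auditory channels already folded into the weights."""
    return sum(a * abs(e) ** 2 for a, e in zip(weights, error_spectrum))


def perceptual_norm(error_spectrum, weights):
    """sqrt(D) is a weighted L2 norm as long as every weight is positive."""
    return math.sqrt(detectability(error_spectrum, weights))


# Usage: a single strong masker component at 1 kHz on a 50 Hz grid.
freqs = [50.0 * k for k in range(1, 161)]           # 50 Hz .. 8 kHz
centers = [100.0 + 200.0 * j for j in range(40)]    # 100 Hz .. 7.9 kHz
masker_psd = [1.0 if f == 1000.0 else 0.0 for f in freqs]
weights = detectability_weights(masker_psd, freqs, centers)

i_near = freqs.index(1050.0)   # just above the masker
i_far = freqs.index(4000.0)    # far from the masker
# Distortion close to the masker gets a much smaller weight (it is better masked).
print(weights[i_near] < weights[i_far])  # prints True
```

Because D in this sketch is a positively weighted sum of squared error magnitudes, its square root is homogeneous and satisfies the triangle inequality, which is the property that lets such a measure serve directly as a distortion cost in rate-distortion optimisation.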
