This article is part of the series Joint Audio-Visual Speech Processing.

Open Access Research Article

Dynamic Bayesian Networks for Audio-Visual Speech Recognition

Ara V Nefian1*, Luhong Liang2, Xiaobo Pi2, Xiaoxing Liu2 and Kevin Murphy3

Author Affiliations

1 Intel Corporation, Microprocessor Research Labs, 2200 Mission College Blvd., Santa Clara, CA 95052-8119, USA

2 Intel Corporation, Microcomputer Research Labs, Guanghua Road, 100020 Chaoyang District, Beijing, China

3 Computer Science Division, University of California, Berkeley, Berkeley, CA 94720-1776, USA

EURASIP Journal on Advances in Signal Processing 2002, 2002:783042  doi:10.1155/S1110865702206083


The electronic version of this article is the complete one and can be found online at: http://asp.eurasipjournals.com/content/2002/11/783042


Received: 30 November 2001
Revisions received: 6 August 2002
Published: 28 November 2002

© 2002 Nefian et al.

The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models and the FHMM.

Keywords:
audio-visual speech recognition; hidden Markov models; coupled hidden Markov models; factorial hidden Markov models; dynamic Bayesian networks
