Open Access Research Article

Data-Model Relationship in Text-Independent Speaker Recognition

John SD Mason1*, Nicholas WD Evans1, Robert Stapert2 and Roland Auckenthaler1

Author Affiliations

1 School of Engineering, University of Wales Swansea, Swansea SA2 8PP, UK

2 Aculab, Milton Keynes MK1 1PT, UK

EURASIP Journal on Advances in Signal Processing 2005, 2005:582548  doi:10.1155/ASP.2005.471


The electronic version of this article is the complete one and can be found online at: http://asp.eurasipjournals.com/content/2005/4/582548


Received: 12 December 2002
Revisions received: 23 September 2004
Published: 30 March 2005

© 2005 Mason et al.

Text-independent speaker recognition systems such as those based on Gaussian mixture models (GMMs) do not include time sequence information (TSI) within the model itself. How important TSI is to speaker recognition is the question addressed in this paper. Recent work has shown that higher-level information such as idiolect, pronunciation, and prosodics can be useful in reducing speaker recognition error rates. In accordance with these developments, the aim of this paper is to show that as more data becomes available, the basic GMM can be enhanced by utilising TSI, even in a text-independent mode. This paper presents experimental work incorporating TSI into the conventional GMM. The resulting system, known as the segmental mixture model (SMM), embeds dynamic time warping (DTW) into a GMM framework. Results are presented on the 2000-speaker SpeechDat Welsh database, which show improved speaker recognition performance with the SMM.
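
The statement that a GMM holds no time sequence information can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gmm_avg_log_likelihood`, the diagonal-covariance form, and the random frames and parameters are all assumptions made for illustration. It scores the same set of feature frames in their original order and in a shuffled order, and the two scores agree because the GMM score is an average of independent per-frame likelihoods.

```python
import numpy as np


def gmm_avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (M,); means, variances: (M, D).
    All names and shapes here are illustrative assumptions.
    """
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2
    )                                                                # (T, M)
    log_weighted = np.log(weights)[None, :] + log_comp
    # Log-sum-exp over mixture components, computed stably.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.sum(np.exp(log_weighted - peak), axis=1))
    return float(frame_ll.mean())


# Synthetic stand-ins for speech features and a trained speaker model.
rng = np.random.default_rng(0)
T, D, M = 200, 12, 4
frames = rng.normal(size=(T, D))
weights = np.full(M, 1.0 / M)
means = rng.normal(size=(M, D))
variances = np.ones((M, D))

score_ordered = gmm_avg_log_likelihood(frames, weights, means, variances)
score_shuffled = gmm_avg_log_likelihood(
    frames[rng.permutation(T)], weights, means, variances
)
print(np.isclose(score_ordered, score_shuffled))
```

Because shuffling leaves the score unchanged, any benefit from frame order has to come from an added mechanism, which is the role the abstract assigns to DTW within the SMM.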

Keywords:
speaker recognition; segmental mixture modelling
