Course pages 2012–13
Spoken Language Processing
This module introduces the underlying statistical approaches and some of the major techniques used in spoken language processing. It covers the core statistical models used in a wide range of speech and language applications, together with their underlying theory. Examples of how these models are applied to speech processing applications, such as speech recognition and speaker verification, will be described.
- Introduction: Scope of the course and introduction to speech sounds and frequency domain representations.
- Basic pattern processing: Bayes' decision rule, forms of statistical classifier and generative models.
- GMM-based speaker verification: basic feature vectors (MFCC/deltas), MAP parameter estimation, decision rules/ROC curve.
- Hidden Markov Models and the Viterbi algorithm: HMM structure and underlying assumptions, training using Baum-Welch and the EM algorithm. Networks and the Viterbi algorithm.
- Decision trees and context modelling: phone-level variation, dictionaries, decision trees and context clustering.
- Weighted finite state transducers: basic WFST operations and the WFST representation of information in speech systems.
- N-gram language models: N-gram language models, discounting, smoothing, backing-off, mixture language models and interpolation. WFST representation of language models.
- Applications of spoken language processing: examples of applications, including speech recognition and speech synthesis.
On completion of this module, students should:
- understand the basic principles of pattern classification;
- understand Gaussian mixture models, hidden Markov models and N-gram language models;
- understand and be able to implement the Viterbi algorithm;
- understand weighted finite state transducers;
- be able to apply the above approaches to speech processing applications.
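As an illustration of the Viterbi objective above, the following is a minimal sketch of the algorithm for a discrete-observation HMM, working in the log domain to avoid numerical underflow. It is not the implementation expected in the practical. The function name `viterbi`, the state labels `A` and `B`, the observation symbols `x`, `y`, `z` and all probability values are hypothetical, chosen only to give a small example that runs.

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # delta[s]: best log-probability of any state path ending in s at the current time
    delta = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # Best predecessor of state s at this time step
            prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            new_delta[s] = delta[prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        delta = new_delta
        backpointers.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical two-state model (all values are illustrative assumptions)
states = ["A", "B"]
log_init = to_log({"A": 0.6, "B": 0.4})
log_trans = to_log({"A": {"A": 0.7, "B": 0.3},
                    "B": {"A": 0.4, "B": 0.6}})
log_emit = to_log({"A": {"x": 0.5, "y": 0.4, "z": 0.1},
                   "B": {"x": 0.1, "y": 0.3, "z": 0.6}})

best_logp, best_path = viterbi(["x", "y", "z"], states,
                               log_init, log_trans, log_emit)
print(best_path, math.exp(best_logp))
```

For this toy model the most likely state sequence is A, A, B. A continuous speech recogniser would replace the discrete emission table with acoustic model likelihoods (for example from GMMs) and run the same recursion over a network of word models.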
- Practical 1: Speech Recognition System: implementation of Viterbi algorithm and design of a basic continuous speech recognition system.
- Practical 2: Language Modelling: implementation of language model interpolation and evaluation of language model performance.
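The kind of computation involved in Practical 2 can be sketched as follows, assuming linear interpolation of component models and perplexity as the performance measure (the practical itself may specify other choices). The function names `interpolate` and `perplexity`, the per-word probabilities and the interpolation weight are all hypothetical.

```python
import math

def interpolate(probs, weights):
    """Linearly interpolate component-model probabilities for one word."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(w * p for w, p in zip(weights, probs))

def perplexity(word_probs):
    """Perplexity = exp(-(1/N) * sum_i log P(w_i | history))."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# Hypothetical probabilities assigned by two component language models
# to each word of a four-word test text (illustrative values only)
p_model_a = [0.10, 0.20, 0.05, 0.40]
p_model_b = [0.30, 0.10, 0.20, 0.10]

lam = 0.5  # assumed interpolation weight for model A
p_mixed = [interpolate([a, b], [lam, 1.0 - lam])
           for a, b in zip(p_model_a, p_model_b)]

ppl_a = perplexity(p_model_a)
ppl_b = perplexity(p_model_b)
ppl_mixed = perplexity(p_mixed)
print(ppl_a, ppl_b, ppl_mixed)
```

On this toy data the interpolated model has lower perplexity than either component. In practice the interpolation weights would be estimated on held-out data, for example with the EM algorithm, rather than fixed by hand.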
The module will be assessed by two practical reports. Each report will contribute 50% of the marks.
Jurafsky, D. & Martin, J. (2008). Speech and language
Huang, X., Acero, A. & Hon, H-W. (2001). Spoken language processing. Prentice-Hall.
Duda, R., Hart, P. & Stork, D. (2000). Pattern classification. John Wiley (2nd ed.). ISBN 0471056693