Computer Laboratory

Course pages 2012–13

Spoken Language Processing

Principal lecturers: Prof Phil Woodland, Dr Bill Byrne
Taken by: MPhil ACS, Part III
Code: L106
Hours: 16 (14 lectures + 2 practical sessions)
Prerequisites:

Aims

The aim of this module is to introduce the underlying statistical approaches and some of the major techniques used for spoken language processing. Core statistical models that are used in a wide range of speech and language applications will be discussed along with their underlying theory. Examples of how these models may be applied to speech processing applications, such as speech recognition and speaker verification, will be described.

Syllabus

  • Introduction: scope of the course and introduction to speech sounds and frequency domain representations.
  • Basic pattern processing: Bayes' decision rule, forms of statistical classifier and generative models.
  • GMM-based speaker verification: basic feature vectors (MFCC/deltas), MAP parameter estimation, decision rules/ROC curve.
  • Hidden Markov Models and the Viterbi algorithm: HMM structure and underlying assumptions, training using Baum-Welch and the EM algorithm. Networks and the Viterbi algorithm.
  • Decision trees and context modelling: phone-level variation, dictionaries, decision trees and context clustering.
  • Weighted finite state transducers: basic operations of WFST, and WFST representation of information in speech systems.
  • N-gram language models: N-gram language models, discounting, smoothing, backing-off, mixture language models and interpolation. WFST representation of language models.
  • Applications of spoken language processing: examples of applications, including speech recognition and speech synthesis.
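
As an illustration of the Viterbi algorithm named in the syllabus, the following is a minimal sketch for a discrete-observation HMM, working in the log domain to avoid underflow. It is not taken from the course materials: the function name `viterbi`, the two-state toy model, and all probabilities are assumptions chosen only to make the recursion concrete.

```python
import math


def viterbi(obs, states, log_init, log_trans, log_emit):
    """Return (best_log_score, best_state_path) for an observation sequence."""
    # delta[s]: best log score of any state path that ends in s at the current time
    delta = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # Best predecessor of s, considering transition scores only
            prev = max(states, key=lambda r: delta[r] + log_trans[r][s])
            new_delta[s] = delta[prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        delta = new_delta
        backpointers.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


def logs(table):
    """Convert a (possibly nested) dict of probabilities to log probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model: two states, three discrete observation symbols.
states = ["s1", "s2"]
log_init = logs({"s1": 0.6, "s2": 0.4})
log_trans = logs({"s1": {"s1": 0.7, "s2": 0.3},
                  "s2": {"s1": 0.4, "s2": 0.6}})
log_emit = logs({"s1": {"a": 0.5, "b": 0.4, "c": 0.1},
                 "s2": {"a": 0.1, "b": 0.3, "c": 0.6}})

score, path = viterbi(["a", "b", "c"], states, log_init, log_trans, log_emit)
print(path, math.exp(score))
```

For this toy model the best path is s1, s1, s2, with a joint probability of about 0.01512. A recogniser for continuous speech would apply the same recursion over a network of HMMs with continuous-density output distributions rather than a discrete emission table.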

Objectives

On completion of this module, students should:

  • understand the basic principles of pattern classification;
  • understand Gaussian mixture models, hidden Markov models and N-gram language models;
  • understand and be able to implement the Viterbi algorithm;
  • understand weighted finite state transducers;
  • be able to apply the above approaches to speech processing applications.

Practical work

  • Practical 1: Speech Recognition System: implementation of the Viterbi algorithm and design of a basic continuous speech recognition system.
  • Practical 2: Language Modelling: implementation of language model interpolation and evaluation of language model performance.
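
To make the language modelling terms concrete, the following is a minimal sketch of linear interpolation of component language models, with perplexity used as the performance measure. It is an illustration only and not the practical's specification: the helper names `interpolate` and `perplexity`, the two toy component models, the equal weights, and the test word sequence are all assumptions.

```python
import math


def interpolate(models, weights):
    """Return a model whose probability is a weighted sum of component models."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")

    def prob(word, history):
        return sum(w * m(word, history) for w, m in zip(weights, models))

    return prob


def perplexity(model, words, order=2):
    """Perplexity of a model on a word sequence, using order-1 words of history."""
    log_sum = 0.0
    for i, word in enumerate(words):
        history = tuple(words[max(0, i - order + 1):i])
        log_sum += math.log(model(word, history))
    return math.exp(-log_sum / len(words))


# Hypothetical component models over a four-word vocabulary.
# Both ignore the history here; a real N-gram model would condition on it.
UNIGRAM = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}


def uniform_model(word, history):
    return 0.25


def unigram_model(word, history):
    return UNIGRAM[word]


mixed = interpolate([uniform_model, unigram_model], [0.5, 0.5])
text = ["a", "b", "a", "c"]
for name, model in [("uniform", uniform_model),
                    ("unigram", unigram_model),
                    ("interpolated", mixed)]:
    print(name, perplexity(model, text))
```

On this toy sequence the uniform model has a perplexity of 4 (the vocabulary size) and the unigram model a lower value. In practice the interpolation weights would be tuned on held-out data rather than fixed by hand.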

Assessment

The module will be assessed by two practical reports. Each report will contribute 50% of the marks.

Recommended reading

Jurafsky, D. & Martin, J. (2008). Speech and language processing. Prentice-Hall.
Huang, X., Acero, A. & Hon, H-W. (2001). Spoken language processing. Prentice-Hall.
Duda, R., Hart, P. & Stork, D. (2000). Pattern classification. John Wiley (2nd ed.). ISBN 0471056693