Computer Laboratory

Discontinued courses

MPhil in Computer Speech, Text and Internet Technology

This course was run for the last time in 2009–2010. Much of the material taught was incorporated into modules in the MPhil in Advanced Computer Science.

The MPhil in Computer Speech, Text and Internet Technology (CSTIT) was a one-year Master's course on the state of the art in speech and language processing and its application to Internet technology. Its main aim was to teach the fundamental theory of speech and natural language processing and its use in a variety of advanced applications, especially those related to the Internet. The course covered three areas:

  • Speech Processing: analysis, speech recognition, speech synthesis
  • Language Processing (computational linguistics): syntax, parsing, semantics, discourse
  • Applications: information retrieval, information extraction, dialogue systems, machine translation, question answering

The CSTIT combined lectures, practicals, seminars and a substantial research project. The first term consisted of taught material (lectures and structured practicals) covering the foundations of speech and language processing. In the second term, students attended lectures on more advanced topics, participated in a small-group seminar in which they studied and presented material on a research topic, undertook two longer practicals and started on their research project.

The CSTIT provided a foundation for PhD-level research or commercial speech and language technology development. Topics addressed in student projects included multi-document summarization, sentiment and topic classification, named entity recognition, ontology extraction, support vector machines, statistical language modelling, meeting transcription, multimodal fusion, VoiceXML, voice conversion, prosodic boundary prediction and gesture-based interfaces.

The course was taught jointly by the Computer Laboratory (Natural Language and Information Processing group) and the Department of Engineering (Speech Research Group) at the University of Cambridge.

CSTIT Modules

Module 1A

  • Pattern processing, speech signal processing, introduction to automatic speech recognition (ASR), continuous speech recognition, acoustic modelling, language models for ASR, large vocabulary decoding, text-to-speech.
  • Practical sessions on speech signal processing and phone HMM training/search.
  • Assessment: Written examination and satisfactory practical completion.

Module 1B

  • Introduction to linguistics, syntactic and semantic analysis, finite-state techniques, parsing, constraint-based grammar, compositional semantics, semantic underspecification, information retrieval, information extraction.
  • Practical sessions on vector space models, parsing, syntax and semantics.
  • Assessment: Written examination and satisfactory practical completion.

Module 2A

  • Advanced topics in speech recognition (adaptation/robustness, discriminative training, confusion networks), metadata extraction, statistical machine translation, spoken dialogue systems.
  • Extended practical on speech recognition and statistical machine translation.
  • Assessment: Report on practical.

Module 2B

  • Lexical semantics, word sense disambiguation, language generation, summarisation, discourse, anaphora, statistical parsing, grammar induction.
  • Extended practical: building a question answering system.
  • Assessment: Report on practical.

Module 2S or 2L: Reading Club

  • Seminars and student presentations on research topics in speech (2S) or language processing (2L).
  • Assessment: Presentation and participation, assessed on a satisfactory/unsatisfactory basis.

Project

All students undertook a project and wrote a dissertation in the general area covered by the course. Projects could be research-oriented or application-oriented. Industrial collaboration on projects was encouraged.

Timescale: Students started their project during the second term and worked on it full-time from the end of that term until the submission of the dissertation in June.

Assessment: Project dissertation of not more than 15,000 words.