Information Theory and Coding
Lecturers: Dr N.A. Dodgson and Dr M.G. Kuhn
(nad@cl.cam.ac.uk, mgk25@cl.cam.ac.uk)
No. of lectures: 16 (13 NAD + 3 MGK)
Prerequisite courses: Continuous Mathematics, Probability, Discrete Mathematics
This course is a prerequisite for Security (Part II).
This course is substantially different from that given in previous years.
Aims
Information theory addresses two key questions: how far can data be
compressed, and how much data can be transmitted down a noisy
communication channel? This course provides the mathematical theory
needed to answer these questions, together with practical algorithms
that allow us to approach the maximum data compression rate and the
maximum channel capacity.
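In symbols, the two limits in question are the source entropy and the
channel capacity (a brief preview in the notation of [CT]; the precise
statements come later in the course):

    H(X) = -\sum_x p(x) \log_2 p(x), \qquad C = \max_{p(x)} I(X;Y)

H(X) bounds from below the average number of bits per symbol needed to
represent the source, and C is the highest rate at which data can be
sent over the channel with negligible error.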
Students will gain an understanding of the fundamental concepts of
information theory; be able to produce mathematical proofs of the
principal conclusions of information theory; be able to describe
practical methods for attaining good data compression rates and good
transmission rates; understand the approximations inherent in
representing continuous data by discrete samples; and understand
mechanisms for removing irrelevant information from data destined for
human perception.
Lectures
Most of the course is based on Chapters 2, 3, 4, 5 and 8 of Cover &
Thomas [CT], which includes a range of exercises for
students to attempt. Students are expected to have access to a copy of
this book. Most College libraries hold it.
- Fundamental concepts.
Entropy, relative entropy, mutual information -- the basic
definitions required for information theory, with simple examples
[CT §2.1-2.8, 2.11]. The asymptotic equipartition property
[CT §3.1-3.3]. Entropy rates of stochastic processes
including Markov chains; the entropy of English [CT §4.1, 4.2,
4.4, 6.4]. [NAD] (A worked entropy example appears after this list.)
- Data compression.
Proof that the entropy of an information source is a lower bound on
the average amount of information required to represent signals from
that source. Design of real codes: Huffman coding; Shannon-Fano-Elias
coding; arithmetic coding [CT §5.1-5.10]. [NAD] (A Huffman coding
sketch appears after this list.)
- Channel capacity.
Calculation of the maximum data rate that can be achieved, with
negligible error, over a noisy channel. Hamming codes: an example of
error-correcting codes [CT §8.1-8.9, 8.11 (plus an outline of the
main results from §8.12-8.13)]. [NAD] (A Hamming code example appears
after this list.)
- Sampling and quantisation.
Representing continuous data as discrete data. Sampling error and
quantisation error. [CT do not cover this in depth: §10.3 and
§13.1 give only a little detail.] [NAD] (A quantisation illustration
appears after this list.)
- Image, video and audio compression. A practical guide to
compressing data intended for human perception (including JPEG and
MPEG). [MGK, 3 lectures]
- Kolmogorov complexity.
If there is time, the course will consider this alternative information theory [CT Chapter 7]. [NAD]
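As a brief preview of the fundamental concepts, the following Python
sketch (our own illustration, not code from [CT]) computes the entropy
of a simple alphabet from its probability distribution:

    import math

    def entropy(dist):
        """Entropy in bits of a distribution given as symbol -> probability."""
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    # A four-symbol alphabet with probabilities 1/2, 1/4, 1/8, 1/8.
    print(entropy({'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}))  # 1.75 bits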
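The data compression result says that this 1.75 bits per symbol is
attainable. A sketch of Huffman's algorithm (again our own illustrative
code): repeatedly merge the two least probable subtrees, prefixing 0
and 1 to the codewords on either side:

    import heapq

    def huffman(dist):
        """Map each symbol to a Huffman codeword, given symbol -> probability."""
        heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(dist.items())]
        heapq.heapify(heap)
        count = len(heap)                       # tie-break keeps tuples comparable
        while len(heap) > 1:
            p0, _, zero = heapq.heappop(heap)   # least probable subtree
            p1, _, one = heapq.heappop(heap)    # next least probable subtree
            merged = {s: '0' + c for s, c in zero.items()}
            merged.update({s: '1' + c for s, c in one.items()})
            heapq.heappush(heap, (p0 + p1, count, merged))
            count += 1
        return heap[0][2]

    print(huffman({'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}))
    # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

For this distribution the average codeword length is 1.75 bits, equal
to the entropy, because every probability is a power of 1/2.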
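For channel coding, a self-contained sketch of the (7,4) Hamming code
(hypothetical function names, our own illustration): four data bits are
protected by three parity bits placed at positions 1, 2 and 4, and any
single flipped bit can be located and corrected:

    def hamming74_encode(d):
        """Encode 4 data bits as 7 code bits (positions 1..7)."""
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4                  # checks positions 3, 5, 7
        p2 = d1 ^ d3 ^ d4                  # checks positions 3, 6, 7
        p3 = d2 ^ d3 ^ d4                  # checks positions 5, 6, 7
        return [p1, p2, d1, p3, d2, d3, d4]

    def hamming74_decode(c):
        """Correct at most one flipped bit and return the 4 data bits."""
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]     # parity over positions 1, 3, 5, 7
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]     # parity over positions 2, 3, 6, 7
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]     # parity over positions 4, 5, 6, 7
        syndrome = s1 + 2 * s2 + 4 * s3    # position of the error, 0 if none
        if syndrome:
            c[syndrome - 1] ^= 1
        return [c[2], c[4], c[5], c[6]]

    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1                           # the channel flips one bit
    print(hamming74_decode(word))          # [1, 0, 1, 1] -- error corrected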
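Finally, for sampling and quantisation, the snippet below (an assumed
illustration; [CT] give no such code) rounds samples of a sine wave to
a b-bit grid and measures the resulting signal-to-noise ratio, which
approaches the familiar 6.02b + 1.76 dB rule for a full-scale sinusoid:

    import math

    def quantise(x, bits):
        """Uniformly quantise x in [-1, 1] to steps of 2**(1 - bits)."""
        levels = 2 ** (bits - 1)
        return round(x * levels) / levels

    samples = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
    for bits in (4, 8, 12):
        signal = sum(s * s for s in samples)
        noise = sum((s - quantise(s, bits)) ** 2 for s in samples)
        print(bits, round(10 * math.log10(signal / noise), 1))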
Objectives
At the end of the course students should be able to
- understand and use the mathematical concepts of entropy, joint
entropy, conditional entropy, relative entropy, and mutual
information, for both independent random variables and stochastic processes
- calculate these quantities for a simple alphabet, given the
probability distribution
- understand the asymptotic
equipartition property and its implications
- understand and
prove the key results relating to data compression and channel
capacity
- understand, be able to explain, and be able to
construct Huffman, Shannon-Fano-Elias, arithmetic, and Hamming codes
- understand and explain the source of the errors which arise in
the sampling and quantisation of continuous data
- understand the issues surrounding the compression of data
intended for human perception
Course textbook
Cover, T.M. & Thomas, J.A. (1991). Elements of Information
Theory. New York: Wiley. Price: £66 (August 2002).