Computer Laboratory

Course pages 2015–16


Information Theory

Lecturer: Professor J.G. Daugman

No. of lectures: 12

Suggested hours of supervisions: 3

Prerequisite courses: Mathematical Methods for Computer Science


The aims of this course are to introduce the principles and applications of information theory. The course covers how information is measured in terms of probability and various entropies, and how these are used to calculate the capacity of communication channels, continuous or discrete, with or without noise. Coding schemes including error correcting codes are studied along with data compression, spectral analysis, and efficient coding using wavelets. Applications of information theory are also reviewed, from bioinformatics to pattern recognition.


  • Foundations: probability, uncertainty, information. How concepts of randomness, redundancy, compressibility, noise, bandwidth, and uncertainty are related to information. Ensembles, random variables, marginal and conditional probabilities. How the metrics of information are grounded in the rules of probability.

  • Entropies defined, and why they are measures of information. Marginal entropy, joint entropy, conditional entropy, and the Chain Rule for entropy. Mutual information between ensembles of random variables. Why entropy is the fundamental measure of information content.

  • Source coding theorem; prefix, variable-, and fixed-length codes. Markov sources. Entropy rate of a Markov process. Symbol codes. Huffman codes and the prefix property. Binary symmetric channels. Capacity of a noiseless discrete channel.

  • Discrete channel properties, noise, and channel capacity. Perfect communication through a noisy channel: error-correcting codes. Capacity of a discrete channel as the maximum of its mutual information over all possible input distributions.

  • Spectral properties of continuous-time signals and channels. Signals represented as combinations of complex exponential eigenfunctions; channels represented as spectral filters that add noise. Applying Fourier analysis to signal communication. Continuous versus discrete, and periodic versus aperiodic signals and their transforms. Duality properties.

  • Continuous information; density; noisy channel coding theorem. Extensions of discrete entropies and measures to the continuous case. Signal-to-noise ratio; power spectral density. Gaussian channels. Relative significance of bandwidth and noise limitations. The Shannon rate limit for noisy continuous channels.

  • Signal coding and transmission schemes using Fourier theorems. Nyquist Sampling Theorem. Aliasing and its prevention. Modulation and shift theorems; multiple carriers; frequency and phase modulation codes; ensembles. Filters, coherence, demodulation; noise removal by correlation.

  • The quantized degrees-of-freedom in a continuous signal. Why a continuous signal of finite bandwidth and duration has a fixed number of degrees-of-freedom. Diverse illustrations of the principle that information, even in such a signal, comes in quantized, countable, packets.

  • Gabor-Heisenberg-Weyl uncertainty relation. Optimal “Logons”. Unification of the time-domain and the frequency-domain as endpoints of a continuous deformation. The Uncertainty Principle and its optimal solution by Gabor’s expansion basis of “logons”. Multi-resolution wavelet codes. Extension to images, for analysis and compression.

  • Data compression codes and protocols. Run-length coding; dictionary methods on strings; vector quantisation; JPEG and JP2K image compression; orthogonal subspace projections; predictive coding; the Laplacian pyramid; and wavelet scalar quantisation.

  • Kolmogorov complexity. Minimal description length. Definition of the algorithmic complexity of a data sequence, and its relation to the entropy of the distribution from which the data was drawn. Fractals. Minimal description length, and why this measure of complexity is not computable.

  • Applications of information theory in other sciences. Use of information metrics and analysis in: genomics; neuroscience; astrophysics; noisy signal classification; and pattern recognition including biometrics.


At the end of the course students should be able to

  • calculate the information content of a random variable from its probability distribution;

  • relate the joint, conditional, and marginal entropies of variables in terms of their coupled probabilities;

  • define channel capacities and properties using Shannon’s Theorems;

  • construct efficient codes for data on imperfect communication channels;

  • generalize the discrete concepts to continuous signals on continuous channels;

  • understand encoding and communication schemes in terms of the spectral properties of signals and channels;

  • describe compression schemes, and efficient coding using wavelets and other representations for data.

Recommended reading

* Cover, T.M. & Thomas, J.A. (2006). Elements of information theory. New York: Wiley.