Department of Computer Science and Technology


Course pages 2020–21

Information Theory

Principal lecturer: Prof John Daugman
Taken by: Part II CST 50%, Part II CST 75%
Hours: 16 (three examples classes will be given)
Past exam questions


This course introduces the principles and applications of information theory: how information is measured in terms of probability and various entropies, how these are used to calculate the capacity of communication channels, with or without noise, and to measure how much random variables reveal about each other. Coding schemes, including error-correcting codes, are studied along with data compression, spectral analysis, transforms, and wavelet coding. Applications of information theory are reviewed, from astrophysics to pattern recognition.


  • Foundations: probability, uncertainty, information. How concepts of randomness, redundancy, compressibility, noise, bandwidth, and uncertainty are related to information. Ensembles, random variables, marginal and conditional probabilities. How the metrics of information are grounded in the rules of probability.
  • Entropies defined, and why they are measures of information. Marginal, joint, and conditional entropy; chain rule for entropy. Cross-entropy and distances between distributions. Mutual information between random variables. Why entropy gives fundamental measures of information content.
  • Source coding theorem; prefix, variable-, and fixed-length codes. Markov sources. Entropy of a multi-state Markov process. Symbol codes; Huffman codes and the prefix property. Binary symmetric channels. Capacity of a noiseless discrete channel.
  • Noisy discrete channel properties, and channel capacity. Perfect communication through a noisy channel: error-correcting codes. Capacity of a discrete channel as the maximum of its mutual information.
  • Information represented by projections and in transforms. Expressing data in vector spaces or as a linear combination of basis functions. Inner product spaces and orthonormal systems. Norms, span, and linear subspaces; dimensionality reduction.
  • Fourier analysis: series and transforms, discrete or continuous. How periodic and aperiodic data are analysed and represented by Fourier methods. Rates of convergence. Information revealed in the Fourier domain. Discrete, inverse, and Fast Fourier Transforms; butterfly algorithm. Duality properties. Wavelet transforms.
  • Spectral properties of continuous-time signals and channels. Signals represented as combinations of complex exponential eigenfunctions; channels represented as spectral filters that add noise. Convolution. Applying Fourier analysis to communication schemes.
  • Continuous information; density; noisy channel coding theorem. Extensions of discrete entropies and measures to the continuous case. Signal-to-noise ratio; power spectral density. Gaussian channels. Relative significance of bandwidth and noise limitations. The Shannon rate limit for noisy continuous channels.
  • Signal coding and transmission schemes using Fourier theorems. Nyquist Sampling Theorem. Aliasing and its prevention. Modulation and shift theorems; multiple carriers; frequency and phase modulation codes; ensembles. Filters, coherence, demodulation; noise removal by correlation.
  • The quantized degrees-of-freedom in a continuous signal. Why a continuous signal of finite bandwidth and duration has a fixed number of degrees-of-freedom. Diverse illustrations of the principle that information, even in such a signal, comes in quantized, countable packets.
  • Gabor-Heisenberg-Weyl uncertainty relation. Optimal “Logons”. Unification of the time-domain and the frequency-domain as endpoints of a continuous deformation. The Uncertainty Principle and its optimal solution by Gabor’s expansion basis of “logons”. Multi-resolution wavelet codes. Extension to images, for analysis and compression.
  • Data compression codes and protocols. Run-length coding; dictionary methods on strings; vector quantisation; JPEG and JP2K image compression; orthogonal subspace projections; predictive coding; and wavelet scalar quantisation.
  • Kolmogorov complexity. Minimal description length. Definition of the algorithmic complexity of a data sequence, and its relation to the entropy of the distribution from which the data was drawn. Fractals. Minimal description length, and why this measure of complexity is not computable.
  • Applications of information theory in other sciences. Use of information metrics and analysis in: genomics; neuroscience; astrophysics; noisy signal classification; and pattern recognition including biometrics.
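
As an illustration of the early topics in this list (entropy, mutual information, and the capacity of a binary symmetric channel), the following is a minimal Python sketch and is not part of the course materials. The function names `entropy`, `mutual_information`, and `bsc_capacity` are hypothetical, chosen only for this example. It assumes the standard results that I(X;Y) = H(X) + H(Y) - H(X,Y) and that a binary symmetric channel with crossover probability f has capacity 1 - H(f) bits per use, attained with a uniform input distribution.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Terms with p == 0 contribute nothing (the limit of p*log p is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;Y) in bits, from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    pxy = [p for row in joint for p in row]     # flattened joint distribution
    return entropy(px) + entropy(py) - entropy(pxy)


def bsc_capacity(flip_prob):
    """Capacity, in bits per use, of a binary symmetric channel."""
    return 1.0 - entropy([flip_prob, 1.0 - flip_prob])


if __name__ == "__main__":
    f = 0.1  # illustrative crossover probability, an assumption
    # Joint distribution of (input, output) for a uniform binary input.
    joint = [[0.5 * (1 - f), 0.5 * f],
             [0.5 * f, 0.5 * (1 - f)]]
    print("H of a fair coin:", entropy([0.5, 0.5]))
    print("I(X;Y) with uniform input:", mutual_information(joint))
    print("BSC capacity:", bsc_capacity(f))
```

With a uniform input, the mutual information computed from the joint table should agree with the closed-form capacity, which is the sense in which capacity is the maximum of the mutual information over input distributions.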


At the end of the course students should be able to:

  • calculate the information content of a random variable from its probability distribution;
  • relate the joint, conditional, and marginal entropies of variables in terms of their coupled probabilities;
  • define channel capacities and properties using Shannon’s Theorems;
  • construct efficient codes for data on imperfect communication channels;
  • generalize the discrete concepts to continuous signals on continuous channels;
  • understand encoding and communication schemes in terms of the spectral properties of signals and channels;
  • describe compression schemes, and efficient coding using wavelets and other representations for data.
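
To make the first and fourth objectives concrete, here is a hedged Python sketch (not taken from the course materials) that builds a Huffman code for a small source and compares its average codeword length with the source entropy. The function name `huffman_code` and the four-symbol distribution are assumptions made purely for illustration.

```python
import heapq
import math


def huffman_code(probs):
    """Return {symbol: bitstring} for a dict mapping symbols to probabilities."""
    if len(probs) == 1:
        # A single-symbol source still needs one bit per symbol to be decodable.
        return {next(iter(probs)): "0"}
    # Heap entries are (probability, tie_breaker, {symbol: partial codeword}).
    # The integer tie_breaker ensures the dicts are never compared.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable subtree
        p1, _, c1 = heapq.heappop(heap)   # next least probable subtree
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]


if __name__ == "__main__":
    # Illustrative dyadic source (an assumption, chosen so the code is optimal).
    source = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
    code = huffman_code(source)
    avg_len = sum(source[s] * len(code[s]) for s in source)
    h = -sum(p * math.log2(p) for p in source.values() if p > 0)
    print(code)
    print("average length:", avg_len, "bits; entropy:", h, "bits")
```

For a source whose probabilities are all negative powers of two, as here, the average codeword length equals the entropy; for other sources the source coding theorem bounds a Huffman code's average length between the entropy and the entropy plus one bit.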

Recommended reading

* Cover, T.M. and Thomas, J.A. (2006). Elements of information theory. New York: Wiley.