
Department of Computer Science and Technology



Course pages 2023–24

Information Theory

Principal lecturer: Dr Robert Harle
Taken by: Part II CST
Term: Michaelmas
Hours: 12
Format: In-person lectures
Suggested hours of supervisions: 3
Exam: Paper 8 Question 7; Paper 9 Question 7


This course introduces the principles and applications of information theory: how information is measured in terms of probability and various entropies, how these measures are used to calculate the capacity of communication channels, with or without noise, and how they quantify what random variables reveal about each other. Coding schemes, including error-correcting codes, are studied along with data compression, spectral analysis, transforms, and wavelet coding. Applications of information theory are reviewed, from astrophysics to pattern recognition.
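As a minimal illustrative sketch (not part of the course materials), the two central quantities in the overview above can be computed directly from a probability distribution: the Shannon information (surprise) of a single outcome, and the entropy of a discrete random variable.

```python
import math

def shannon_information(p: float) -> float:
    """Surprise (in bits) of an outcome with probability p: -log2(p)."""
    return -math.log2(p)

def entropy(dist: list[float]) -> float:
    """Shannon entropy H(X) = -sum_i p_i log2(p_i) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# A fair coin carries 1 bit per toss; a biased coin carries less.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.9, 0.1]))  # ~0.469
```

The guard `p > 0` uses the standard convention that 0 log 0 = 0, so distributions with impossible outcomes are handled cleanly.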


  • Information, probability, uncertainty, and surprise. How concepts of randomness and uncertainty are related to information. How the metrics of information are grounded in the rules of probability. Shannon information. Weighing problems and other examples.
  • Entropy of discrete variables. Definition and link to Shannon information. Joint entropy; mutual information. Visual depictions of the relationships between entropy types. Why entropy gives fundamental measures of information content.
  • Source coding theorem and data compression; prefix, variable-length, and fixed-length codes. Information rates; asymptotic equipartition principle; symbol codes; Huffman codes and the prefix property. Binary symmetric channels. Capacity of a noiseless discrete channel. Stream codes.
  • Noisy discrete channel coding. Joint distributions; mutual information; conditional entropy; error-correcting codes; capacity of a discrete channel. Noisy-channel coding theorem.
  • Entropy of continuous variables. Differential entropy; mutual information; channel capacity; Gaussian channels.
  • Entropy for comparing probability distributions and for machine learning. Relative entropy (KL divergence); cross-entropy; use as a loss function.
  • Applications of information theory in other sciences.
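The channel-coding topics above come together in one classic result: the capacity of a binary symmetric channel with flip probability f is C = 1 - H2(f) bits per use, where H2 is the binary entropy function. A minimal sketch (illustrative only):

```python
import math

def h2(p: float) -> float:
    """Binary entropy function H2(p) = -p log2(p) - (1-p) log2(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(f: float) -> float:
    """Capacity of a binary symmetric channel with flip probability f:
    C = 1 - H2(f) bits per channel use."""
    return 1 - h2(f)

print(bsc_capacity(0.0))  # 1.0  (noiseless channel)
print(bsc_capacity(0.1))  # ~0.531
print(bsc_capacity(0.5))  # 0.0  (output independent of input)
```

Note the symmetry: a channel that flips every bit (f = 1) has full capacity, since the flips are deterministic and invertible; capacity vanishes only at f = 0.5, where the output carries no information about the input.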


At the end of the course students should be able to

  • calculate the information content of a random variable from its probability distribution;
  • express the joint, conditional, and marginal entropies of random variables in terms of their joint probabilities;
  • define channel capacities and properties using Shannon's theorems;
  • construct efficient codes for data on imperfect communication channels;
  • generalize the discrete concepts to continuous signals on continuous channels;
  • understand encoding and communication schemes in terms of the spectral properties of signals and channels;
  • describe compression schemes, and efficient coding using wavelets and other representations for data.
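One of the outcomes above, constructing efficient codes, is illustrated by Huffman coding from the syllabus. The sketch below (an assumption of one reasonable implementation, not course-supplied code) builds an optimal prefix code greedily by repeatedly merging the two least-frequent subtrees:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman prefix code from the symbol frequencies in `text`."""
    freq = Counter(text)
    # Heap entries: (weight, tiebreak index, {symbol: codeword-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codewords
        # with 0 and 1 respectively.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
# More frequent symbols receive shorter codewords, and no codeword
# is a prefix of another, so the encoding is uniquely decodable.
```

The integer tiebreak field keeps the heap comparisons well defined when two subtrees have equal weight; the degenerate single-symbol input, which would yield an empty codeword, is ignored here for brevity.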

Recommended reading

  MacKay, Information Theory, Inference, and Learning Algorithms.

  Stone, Information Theory: A Tutorial Introduction.