Computer Laboratory

Course pages 2013–14

Numerical Methods

Principal lecturer: Dr David Greaves
Taken by: Part IA CST, Part IA NST, Part I PBS
Past exam questions
Information for supervisors (contact lecturer for access permission)

No. of lectures: 11
Suggested hours of supervisions: 4
This course is useful for the Part II courses Advanced Graphics and Digital Signal Processing.


This course provides:

  1. an introduction to (IEEE) floating-point data representation and arithmetic;
  2. illustrations of how naïve implementations of obvious mathematics can go badly wrong;
  3. a study of several standard numerical algorithms.

An overall implicit aim is to encourage caution when using any floating-point value produced by a computer program.


  • Integer and floating-point representation and arithmetic. Signed and unsigned integers and fixed-point; arithmetic, saturating arithmetic. IEEE 754/854 floating point (32 and 64 bit); zeros, infinities, NaN. What numbers are exactly representable in bases 2 and 10. Accuracy in terms of significant figures. Floating-point arithmetic is non-associative, and mathematical equivalences fail. Nonsensical results, e.g. sin(1e40).

  • IEEE floating-point arithmetic. Floating-point arithmetic, and the IEEE requirements. Why the IEEE standard has endured. Overflow, underflow, progressive loss of significance. Rounding modes. Difficulty in obtaining IEEE-quality in libraries. The java.lang.Math trigonometric library promises.

  • How floating-point computations diverge from real-number calculations. Absolute Error, Relative Error, Machine epsilon, Unit in Last Place (ulp). Finite computation: solving a quadratic. Summing a finite series. Rounding (round-off) and truncation (discretisation) error. Numerical differentiation; determining a good step size.

  • Iteration and when to stop. Unbounded computation may produce unbounded errors. Solving equations by iteration and comparison to terminate it. Newton’s method. Order of convergence. Why summing a Taylor series is problematic.

  • Custom Encodings Arbitrary precision floating point, adaptive floating point, interval arithmetic. Logarithmic and other non-linear representations. Their use in a-posteriori decision algorithms. Eg for rapid multiplication in Viterbi/Bayes and specialist ALUs (e.g. for low-density parity).

  • Matrix Methods Gaussian Elimination. L/U and Cholesky decompositions. Doolittle, Crout methods. Their stability and usefulness.

  • Efficient Implementation Chebychev orthogonal basis (for power series economisation) Practical implementation of scientific (trig/log) functions. Comparison of Taylor, Chebychev and Cordic.

  • Ill Conditioning and Chaotic Simulations. Role of partial derivatives and backwards stability. Condition number when amenable to mathematical analysis; Monte-Carlo exploration when not. Finite-difference simulation and chaos in weather forecasting and Mandelbrot.

  • Adaptive methods Linear approximations. Wave propagation simulation. Non-linear spatial quantisation. Dynamic temporal quantisation and its use in SPICE.

  • Miscellaneous Topics Discussion on the problems of exact real arithmetic. Remark on the x86 implementations of IEEE arithmetic, interaction of coding style and compiler “optimisations”. Final Remarks.


At the end of the course students should

  • be able to convert simple decimal numbers to and from IEEE floating-point format, and to perform IEEE arithmetic on them;

  • be able to identify problems with floating-point implementations of simple mathematical problems and know when incorrect solution is likely;

  • be familiar with several key algorithms from the history of numerical analysis;

  • decide how and when computation energy should be traded for accuracy;

  • know to use a professionally-written package whenever possible (and still to treat claims of accuracy with suspicion).

Recommended reading

Overton, M.L. (2001). Numerical computing with IEEE floating point arithmetic. SIAM.

Further reading - goes far beyond the course

Goldberg, D. (1991). What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys, vol. 23, pp. 5-48.