# Computer Laboratory

Course pages 2013–14

# Numerical Methods

Principal lecturer: Dr David Greaves
Taken by: Part IA CST, Part IA NST, Part I PBS
Past exam questions
Information for supervisors (contact lecturer for access permission)

No. of lectures: 11
Suggested hours of supervisions: 4
This course is useful for the Part II courses Advanced Graphics and Digital Signal Processing.

## Aims

This course provides:

1. an introduction to (IEEE) floating-point data representation and arithmetic;
2. illustrations of how naïve implementations of obvious mathematics can go badly wrong;
3. a study of several standard numerical algorithms.

An overall implicit aim is to encourage caution when using any floating-point value produced by a computer program.

## Lectures

• Integer and floating-point representation and arithmetic. Signed and unsigned integers and fixed-point; arithmetic, saturating arithmetic. IEEE 754/854 floating point (32 and 64 bit); zeros, infinities, NaN. What numbers are exactly representable in bases 2 and 10. Accuracy in terms of significant figures. Floating-point arithmetic is non-associative, and mathematical equivalences fail. Nonsensical results, e.g. sin(1e40).

• IEEE floating-point arithmetic. Floating-point arithmetic, and the IEEE requirements. Why the IEEE standard has endured. Overflow, underflow, progressive loss of significance. Rounding modes. Difficulty in obtaining IEEE-quality in libraries. The java.lang.Math trigonometric library promises.

• How floating-point computations diverge from real-number calculations. Absolute Error, Relative Error, Machine epsilon, Unit in Last Place (ulp). Finite computation: solving a quadratic. Summing a finite series. Rounding (round-off) and truncation (discretisation) error. Numerical differentiation; determining a good step size.

• Iteration and when to stop. Unbounded computation may produce unbounded errors. Solving equations by iteration and comparison to terminate it. Newton’s method. Order of convergence. Why summing a Taylor series is problematic.

• Custom Encodings Arbitrary precision floating point, adaptive floating point, interval arithmetic. Logarithmic and other non-linear representations. Their use in a-posteriori decision algorithms. Eg for rapid multiplication in Viterbi/Bayes and specialist ALUs (e.g. for low-density parity).

• Matrix Methods Gaussian Elimination. L/U and Cholesky decompositions. Doolittle, Crout methods. Their stability and usefulness.

• Efficient Implementation Chebychev orthogonal basis (for power series economisation) Practical implementation of scientific (trig/log) functions. Comparison of Taylor, Chebychev and Cordic.

• Ill Conditioning and Chaotic Simulations. Role of partial derivatives and backwards stability. Condition number when amenable to mathematical analysis; Monte-Carlo exploration when not. Finite-difference simulation and chaos in weather forecasting and Mandelbrot.

• Adaptive methods Linear approximations. Wave propagation simulation. Non-linear spatial quantisation. Dynamic temporal quantisation and its use in SPICE.

• Miscellaneous Topics Discussion on the problems of exact real arithmetic. Remark on the x86 implementations of IEEE arithmetic, interaction of coding style and compiler “optimisations”. Final Remarks.

## Objectives

At the end of the course students should

• be able to convert simple decimal numbers to and from IEEE floating-point format, and to perform IEEE arithmetic on them;

• be able to identify problems with floating-point implementations of simple mathematical problems and know when incorrect solution is likely;

• be familiar with several key algorithms from the history of numerical analysis;

• decide how and when computation energy should be traded for accuracy;

• know to use a professionally-written package whenever possible (and still to treat claims of accuracy with suspicion).