Course pages 2015–16
No. of lectures: 11
Suggested hours of supervisions: 3 to 4
This course is useful for the Part II courses Advanced Graphics and Digital Signal Processing.
This course provides:
- an introduction to (IEEE) floating-point data representation and arithmetic;
- illustrations of how naïve implementations of obvious mathematics can go badly wrong;
- a study of several standard numerical algorithms.
An overall implicit aim is to encourage caution when using any floating-point value produced by a computer program. A variety of code fragments are provided and most are available in multiple languages. Students are strongly encouraged to experiment with these fragments.
- Integer and floating-point representation and arithmetic. Signed and unsigned integers; fixed point; arithmetic, including saturating arithmetic. Long division and multiplication. Floating-point I/O in ASCII. Which numbers are exactly representable in bases 2 and 10. Accuracy in terms of significant figures.
- IEEE floating-point arithmetic. Floating-point arithmetic and the IEEE requirements. IEEE 754/854 floating point (32 and 64 bit); zeros, infinities, NaNs. Overflow, underflow, progressive loss of significance. Rounding modes. Floating-point arithmetic is non-associative, and familiar mathematical equivalences fail. Nonsensical results, e.g. sin(1e40). The difficulty of obtaining IEEE-quality results from libraries.
- How floating-point computations diverge from real-number calculations. Absolute error, relative error, machine epsilon, unit in the last place (ulp). Finite computation: solving a quadratic. Summing a finite series. Rounding (round-off) and truncation (discretisation) error. Numerical differentiation; determining a good step size.
- Iteration and when to stop. Unbounded computation may produce unbounded errors. Solving equations by iteration, and convergence tests to terminate it. Newton’s method. Order of convergence. Limit cycles. Why summing a Taylor series is problematic. Condition number, partial derivatives, backward stability and chaos.
- Simultaneous equations in matrix form. Gaussian elimination. Stability and pivoting improvements. Positive-definite matrices. LU and Cholesky decompositions. The Doolittle and Crout methods.
- Efficient implementation. The Chebychev orthogonal basis (for power-series economisation). Practical implementation of scientific (trig/log) functions. Comparison of the Taylor, Chebychev and CORDIC approaches.
- Finite-difference time-domain simulation. Numerical simulation of SHM, charge/discharge, waves and various other examples (such as a MONIAC simulator).
- Circuit/flow analysis. Using a matrix representation of a linear flow circuit (water, electricity, etc.) to find the steady state. Extensions for non-linear and time-varying branches, as used by SPICE.
- Adaptive methods and custom encodings. A subset of the following topics will be lectured; details in the handout. Arbitrary-precision floating point, adaptive floating point, interval arithmetic. Rounding errors in PCM. Logarithmic and other non-linear representations, and their use in a posteriori decision algorithms, e.g. for rapid multiplication in Viterbi/Bayes and in specialist ALUs (e.g. for low-density parity-check decoding). Simulated annealing. Non-linear spatial quantisation.
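The "solving a quadratic" topic above is the classic illustration of naïve mathematics going wrong in floating point. The sketch below (illustrative Python, not course-issued code; function names are my own) contrasts the textbook formula, which suffers catastrophic cancellation when b² ≫ 4ac, with a standard rearrangement that recovers the small root from the product of the roots:

```python
import math

def quadratic_naive(a, b, c):
    """Textbook formula: cancellation destroys the small root when b*b >> 4*a*c."""
    d = math.sqrt(b * b - 4 * a * c)
    return ((-b + d) / (2 * a), (-b - d) / (2 * a))

def quadratic_stable(a, b, c):
    """Compute the larger-magnitude root first (no cancellation),
    then recover the other from the product of roots, c/a."""
    d = math.sqrt(b * b - 4 * a * c)
    x1 = (-b - math.copysign(d, b)) / (2 * a)  # larger-magnitude root
    x2 = c / (a * x1)                          # smaller root, cancellation-free
    return (x1, x2)

# x^2 - 1e8*x + 1 = 0 has roots very close to 1e8 and 1e-8.
print(quadratic_naive(1.0, -1e8, 1.0))   # small root loses most significant digits
print(quadratic_stable(1.0, -1e8, 1.0))  # both roots accurate to a few ulps
```

The rearrangement works because -b and the discriminant's square root have nearly equal magnitudes; adding the like-signed pair is safe, while subtracting them cancels almost every significant bit.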
At the end of the course students should:
- be able to convert simple decimal numbers to and from IEEE floating-point format, and to perform IEEE arithmetic on them;
- be able to identify problems with floating-point implementations of simple mathematical problems, and know when an incorrect solution is likely;
- be familiar with several key algorithms from the history of numerical analysis;
- decide how and when computation energy should be traded for accuracy;
- know to use a professionally-written package whenever possible (and still to treat claims of accuracy with suspicion).
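For the first outcome above, decoding a decimal into its IEEE fields can be checked mechanically. A minimal sketch (illustrative Python; the helper name is my own) that splits a single-precision bit pattern into sign, exponent and fraction:

```python
import struct

def float_to_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as a 32-char string."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    return format(n, "032b")

bits = float_to_bits(-6.25)
sign, exponent, fraction = bits[0], bits[1:9], bits[9:]
# -6.25 = -1.1001_2 * 2^2, so: sign = 1, biased exponent = 127 + 2 = 129,
# fraction = 1001 followed by nineteen zeros.
print(sign, exponent, fraction)
```

Repacking the three fields in the other direction (sign, biased exponent, 23-bit fraction) reverses the conversion, which makes the round trip a useful self-check when practising by hand.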
Overton, M.L. (2001). Numerical computing with IEEE floating point arithmetic. SIAM.
Further reading (goes far beyond the course):
Goldberg, D. (1991). What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys, vol. 23, pp. 5–48.