



Next: Paper 2: Professional Practice Up: Lent Term 2009: Part Previous: Paper 2: Discrete Mathematics Contents
Paper 1: Floating-Point Computation
Lecturer: Professor A. Mycroft
No. of lectures: 6
This course is useful for the Part II courses Advanced Graphics and Digital Signal Processing.
Aims
This course has two aims: firstly to provide an introduction to (IEEE) floating-point data representation and arithmetic; and secondly to show, how naïve implementations of obvious mathematics can go badly wrong. An overall implicit aim is to encourage caution when using any floating-point value produced by a computer program.
Lectures
- Integer and floating-point representation and arithmetic.
Signed and unsigned integers and fixed-point; arithmetic, saturating arithmetic.
IEEE 754/854 floating point (32 and 64 bit); zeros, infinities, NaN.
Brief mention of IEEE 754r.
What numbers are exactly representable in bases 2 and 10.
Accuracy in terms of significant figures.
Floating point arithmetic is non-associative, and mathematical equivalences
fail.
Nonsensical results, e.g. sin(1e40), counting in floating point.
- IEEE Floating-point arithmetic.
Floating point arithmetic, and the IEEE requirements.
Why the IEEE standard has endured.
Overflow, underflow, progressive loss of significance.
Rounding modes.
Difficulty in obtaining IEEE-quality in libraries.
The java.lang.Math trigonometric library promises.
- How floating-point computations diverge from real-number calculations.
Absolute Error, Relative Error, Machine epsilon, Unit in Last Place (ulp).
Finite computation: solving a quadratic.
Summing a finite series.
Rounding (round-off) and truncation (discretisation) error.
Numerical differentiation; determining a good step size.
- Iteration and when to stop.
Unbounded computation may produce unbounded errors.
Solving equations by iteration and comparison to terminate it.
Newton's method.
Idea of order of convergence.
Why summing a Taylor series is problematic (loss of all precision,
range reduction, non-examinable hint at economisation).
- Ill-conditioned or chaotic problems.
Effect of changes of a few ulp in the inputs.
Conditioning number when amenable to mathematical analysis;
Monte-Carlo exploration when not.
- Other approaches and their problems
Adaptive methods.
Arbitrary precision floating point, adaptive floating point, interval
arithmetic.
Discussion on the problems of exact real arithmetic.
Remark on the x86 implementations of IEEE arithmetic, and compiler
``optimisations''.
Objectives
At the end of the course students should
- be able to convert simple decimal numbers to and from IEEE
floating-point format, and to perform IEEE arithmetic on them
- be able to identify problems with floating-point implementations of
simple mathematical problems
- know when a problem is likely to yield incorrect solutions no
matter how it is processed numerically
- know to use a professionally-written package whenever possible
(and still to treat claims of accuracy with suspicion)
Recommended reading
Overton, M.L. (2001). Numerical computing with IEEE floating point arithmetic. SIAM.
Further reading - goes far beyond the course
Goldberg, D. (1991). What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys, vol. 23, pp. 5-48.




Next: Paper 2: Professional Practice Up: Lent Term 2009: Part Previous: Paper 2: Discrete Mathematics Contents