Computer Laboratory – Course material 2007–08: Floating-Point Computation

Floating-Point Computation

Lecturer: Professor A. Mycroft

No. of lectures: 4

This course is useful for the Part II courses Advanced Graphics and Digital Signal Processing.

Aims

This course has two aims: firstly to provide an introduction to (IEEE) floating-point data representation and arithmetic; and secondly to show, mainly by fun examples backed up by simple analysis, how naïve implementations of obvious mathematics can go badly wrong.
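
As a small illustration of the second aim, the sketch below adds 0.1 to itself ten times. The variable names are illustrative only, and the choice of Python and of `math.fsum` as the careful alternative is an assumption made for this example, not something the course prescribes.

```python
import math

# Mathematically, ten copies of 0.1 sum to exactly 1.
values = [0.1] * 10

# Naive left-to-right accumulation: each addition rounds to the nearest double,
# and 0.1 itself is not exactly representable in binary.
naive_total = 0.0
for v in values:
    naive_total += v

# math.fsum keeps track of the rounding errors and returns the correctly rounded sum.
careful_total = math.fsum(values)

print(naive_total == 1.0)    # False
print(careful_total == 1.0)  # True
```

An explicit loop is used rather than the built-in `sum`, because recent Python versions apply compensated summation to floats inside `sum` and would hide the effect.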

Lectures

IEEE floating-point representation and arithmetic (32 and 64 bits). Overflow, underflow, progressive loss of significance. Rounding modes.
How floating-point computations diverge from real-number calculations. Absolute error, relative error, machine epsilon. Solving a quadratic.
Iteration and when to stop. Why summing a Taylor series is problematic (loss of all precision, range reduction, non-examinable hint at economisation).
Ill-conditioned or chaotic problems. Testing. Packages. Non-examinable: exact real arithmetic.
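
The "solving a quadratic" topic can be sketched as follows. This is a minimal example under stated assumptions: the function names `roots_naive` and `roots_stable` are hypothetical, the coefficients are chosen only to provoke cancellation, and both functions assume a nonzero leading coefficient and real roots. The stable variant uses the common rearrangement that avoids subtracting nearly equal quantities; it is one standard remedy, not necessarily the one presented in the lectures.

```python
import math

def roots_naive(a, b, c):
    """Textbook formula; loses significance when b*b is much larger than 4*a*c."""
    s = math.sqrt(b * b - 4.0 * a * c)
    return (-b + s) / (2.0 * a), (-b - s) / (2.0 * a)

def roots_stable(a, b, c):
    """Avoid cancellation: add quantities of the same sign, then use x1 * x2 = c / a."""
    s = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(s, b))
    return q / a, c / q

# x^2 - 1e8 x + 1 = 0 has roots close to 1e8 and 1e-8.
a, b, c = 1.0, -1e8, 1.0
small_naive = min(roots_naive(a, b, c), key=abs)
small_stable = min(roots_stable(a, b, c), key=abs)

# Relative error of each estimate of the small root against the approximation 1e-8.
rel_err_naive = abs(small_naive - 1e-8) / 1e-8
rel_err_stable = abs(small_stable - 1e-8) / 1e-8
print(small_naive, rel_err_naive)
print(small_stable, rel_err_stable)
```

In 64-bit arithmetic the naive small root is wrong in its leading digits, while the rearranged formula agrees with 1e-8 to roughly machine precision.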

Objectives

At the end of the course students should:

be able to convert simple decimal numbers to and from IEEE floating-point format, and to perform simple arithmetic
be able to identify problems with floating-point implementations of simple mathematical problems
know when a problem is likely to yield incorrect solutions no matter how it is processed numerically
know to use a professional package whenever possible
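
For checking hand conversions of the kind described in the first objective, a short sketch like the one below may help. The helper name `ieee754_fields` is hypothetical; it relies only on Python's standard `struct` module to obtain the 32-bit or 64-bit encoding and then splits it into sign, biased exponent, and fraction fields.

```python
import struct

def ieee754_fields(x, bits=64):
    """Return (sign, biased exponent, fraction) of x in IEEE binary32 or binary64."""
    if bits == 64:
        fmt, exp_bits, frac_bits = ">d", 11, 52
    elif bits == 32:
        fmt, exp_bits, frac_bits = ">f", 8, 23
    else:
        raise ValueError("bits must be 32 or 64")
    # Pack the value big-endian and reinterpret the bytes as an unsigned integer.
    raw = int.from_bytes(struct.pack(fmt, x), "big")
    sign = raw >> (exp_bits + frac_bits)
    exponent = (raw >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = raw & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

print(ieee754_fields(1.0))        # (0, 1023, 0)
print(ieee754_fields(-2.5, 32))   # (1, 128, 2097152)
```

Here -2.5 is -1.25 x 2^1, so in 32-bit format the biased exponent is 1 + 127 = 128 and the fraction field holds 0.25 x 2^23 = 2097152.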

Recommended reading

None.
