**Next:** Group Project

**Up:** Michaelmas Term 2007: Part

**Previous:** ECAD

**Contents**

## Floating-Point Computation

*Lecturer: Professor A. Mycroft*

*No. of lectures:* 4

*This course is useful for the Part II courses Advanced Graphics and Digital Signal Processing.*

**Aims**

This course has two aims: firstly to provide an introduction to (IEEE) floating-point data representation and arithmetic; and secondly to show, mainly by fun examples backed up by simple analysis, how naïve implementations of obvious mathematics can go badly wrong.

**Lectures**

- **IEEE Floating-point representation and arithmetic (32 and 64 bits).** Overflow, underflow, progressive loss of significance. Rounding modes.
- **How floating-point computations diverge from real-number calculations.** Absolute error, relative error, machine epsilon. Solving a quadratic.
- **Iteration and when to stop.** Why summing a Taylor series is problematic (loss of all precision, range reduction, non-examinable hint at economisation).
- **Ill-conditioned or chaotic problems. Testing. Packages.** Non-examinable: exact real arithmetic.
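As a rough illustration of the "solving a quadratic" topic, the Python sketch below compares the textbook formula with a rearrangement that avoids subtracting nearly equal numbers. It is not taken from the course material: the function names, the helper `relative_error` and the coefficients are all assumptions chosen for the example, and Python floats are assumed to be IEEE 64-bit doubles.

```python
import math

def roots_naive(a, b, c):
    # Textbook formula. Assumes a != 0 and real roots (b*b >= 4*a*c).
    d = math.sqrt(b * b - 4.0 * a * c)
    return (-b + d) / (2.0 * a), (-b - d) / (2.0 * a)

def roots_stable(a, b, c):
    # Same assumptions, plus b and c not both zero (so that q != 0).
    # b and copysign(d, b) share a sign, so their sum cannot cancel.
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))
    return q / a, c / q

def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)

# Illustrative coefficients: x^2 + 1e8 x + 1 has roots close to -1e8 and -1e-8.
a, b, c = 1.0, 1e8, 1.0
reference = -1e-8  # the small root, correct to about 16 significant digits

small_naive = min(roots_naive(a, b, c), key=abs)
small_stable = min(roots_stable(a, b, c), key=abs)

print("naive :", small_naive, "relative error", relative_error(small_naive, reference))
print("stable:", small_stable, "relative error", relative_error(small_stable, reference))
```

With these coefficients the naive formula loses most of the significant digits of the small root, because `-b + d` subtracts two numbers that agree in almost every bit. The rearranged version recovers the small root from the product of the roots, `c / a`, instead.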

**Objectives**

At the end of the course students should

- be able to convert simple decimal numbers to and from IEEE floating-point format, and to perform simple arithmetic
- be able to identify problems with floating-point implementations of simple mathematical problems
- know when a problem is likely to yield incorrect solutions no matter how it is processed numerically
- know to use a professional package whenever possible
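The first objective, converting between decimal numbers and IEEE format, can be checked by machine. The sketch below is an illustration only, not part of the course: it uses Python's standard `struct` module to split a value into its 32-bit IEEE 754 fields, and the names `float32_fields` and `fields_to_float` are hypothetical helpers introduced for the example.

```python
import struct

def float32_fields(x):
    # Round x to IEEE 754 single precision and return
    # (sign bit, 8-bit biased exponent, 23-bit fraction).
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction

def fields_to_float(sign, exponent, fraction):
    # Rebuild the value of a normalised number only (0 < exponent < 255);
    # zeros, denormals, infinities and NaNs are deliberately not handled.
    return (-1.0) ** sign * (1.0 + fraction / 2.0 ** 23) * 2.0 ** (exponent - 127)

# 0.15625 = 1.25 * 2**-3, so it is exactly representable.
print(float32_fields(0.15625))

# 0.1 has no finite binary expansion, so the stored value is only close to 0.1.
print(float32_fields(0.1), fields_to_float(*float32_fields(0.1)))
```

Working a value such as 0.15625 by hand and comparing with the printed fields is one way to practise the conversion. The 0.1 case shows why a simple decimal number need not survive the round trip unchanged.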

**Recommended reading**

None.
