Computer Laboratory

Course pages 2011–12

Databases

Principal lecturer: Dr Ken Moody
Taken by: Part IB
Past exam questions
Information for supervisors (contact lecturer for access permission)

No. of lectures: 12
Prerequisite courses: None

Aims

The overall aim of the course is to cover the fundamentals of database management systems (DBMSs), paying particular attention to relational database systems. The course covers modelling techniques, transferring designs to actual database implementations, SQL, models of query languages, transactions as well as more recent developments, including data warehouses and On-line Analytical Processing (OLAP), and use of XML as a data exchange language. The lectures will make use of the open source DBMS, MySQL.

Lectures

  • Introduction. What is a database system? Database systems are more than just a collection of data. Three level architecture. OnLine Transaction Processing (OLTP) versus OnLine Analytic Processing (OLAP).

  • The relational data model. Relations are sets of records. Representing entities and relationships as relations. Queries as derived relations. Relations are the basis of SQL (but note the use of multi-sets).

  • Entity-Relationship (E/R) modelling. A bit of set theory. Entities have attributes. Relations have arity. Database design and data modelling.

  • Relational algebra and relational calculus. Relational algebra as an abstract query language. Core operations - selection, projection, product, renaming, and joins. Relational calculus as an abstract query language that uses notation from set theory. Equivalence with relational algebra.

  • SQL and integrity constraints. An overview of the core of SQL. SQL has constructs taken from both the relational algebra and the relational calculus. Integrity constraints as special queries, often required to yield a null result.

  • Case Study - Cancer registry for the NHS - challenges. ECRIC is a cancer registry, recording details about all tumours in people in the East of England. This data is particularly sensitive, and its use is strictly controlled. The lecture focusses on the challenges of scaling up the registration system to cover all cancer patients in England, while still maintaining the long term accuracy and continuity of the data set.

  • Schema refinement I. The evils of redundancy. The benefits of redundancy. Functional dependencies (FDs) as a formal means of investigating redundancy. Relational decomposition. Armstrong’s axioms and Heath’s Rule.

  • Schema refinement II and Normal Forms. Schema normalisation. First and Second normal form. Third normal form and Boyce-Codd normal form. Multi-valued dependencies (MVDs) and lossless-join decomposition. Fourth normal form.

  • Schema refinement III and advanced design. General Decomposition Method (GDM). Decomposition examples. GDM always preserves lossless-join decomposition. GDM may not preserve functional dependencies. Weak entity sets. Ternary or multiple binary relationships?

  • On-line Analytical Processing (OLAP). When to forget about data normalisation. Beware of buzz-words and the Data Warehouse Death March. More on OLTP versus OLAP. What is a data cube? Data modelling for data warehouses: star schema.

  • Case Study - Cancer registry for the NHS - experiences. The extension of ECRIC to cover all of England requires the integration of data from seven other regions, each of which has developed its own database schema. Jem Rashbass has a long track record in NHS IT, and is now CEO of ECRIC, making him the DB customer. He will explain what’s needed and why - some of the existing challenges and future opportunities. The session will close with an open forum in which the DBA of the now national level Cancer Registry DBMS will join Jem.

  • XML as a data exchange format. What is XML? XML can be used to share data between proprietary relational databases. XML-based databases?

Objectives

At the end of the course students should

  • be able to design entity-relationship diagrams to represent simple database application scenarios;

  • know how to convert entity-relationship diagrams to relational database schemas in the standard Normal Forms;

  • be able to program simple database applications in SQL;

  • understand the basic theory of the relational model and both its strengths and weaknesses;

  • be familiar with various recent trends in the database area.

Recommended reading

* Silberschatz, A., Korth, H.F. & Sudarshan, S. (2002). Database system concepts. McGraw-Hill (4th ed.).
Ullman, J. & Widom, J. (1997). A first course in database systems. Prentice Hall.
Date, C.J. (2004). An introduction to database systems. Addison-Wesley (8th ed.).
Miszczyk, J. and others (1998). Mastering data warehousing functions. (IBM Redbook DB2/400) Chapters 1 & 2 only. http://www.wminformatica.com/PDFs/DB2400-Mastering Data Warehousing.pdf
Garcia-Molina, H. Data warehousing and OLAP. Stanford University. http://www.cs.uh.edu/~ceick/6340/dw-olap.ppt