skip to primary navigationskip to content

Department of Computer Science and Technology

Databases

 

Course pages 2022–23

Databases

NB: Despite having a new lecturer this year, the changes from last year are all superficial: the changes in syllabus and lecture titles just reflect what was taught last year and are not really a change.

Practical Work & Assessed Exercises (ticks)

There are two Exercises for this course with associated Help-and-Tick sessions. Please go to the following web page and then add your own crsid to the end of the URL: Exercise Worksheets. The first Exercise uses SQL and HSQLDB. The second Exercise uses the DoctorWho document database.

For the first Exercise, you are expected to learn some basic SQL from the tutorial below or from elsewhere. For the second Exercise you need to know enough Python to understand the constructs used in the second tutorial.

The first Help-and-Tick session will be on the afternoon of 16th November and the second is one week later. Details are on the Exercise Worksheets link. [The practical classes mentioned in one of the on-line course diaries (and elsewhere) for week 3 and 4 do not exist and are replaced with the two Help-and-Tick sessions.]
Assessed Exercise submission deadlines are given in the Head Of Department's Notice www.cst.cam.ac.uk/teaching/exams/hod-notice/part-ia. It is important to have given in a substantially correct effort by the formal deadline, even if there are some small bits missing. You need not to have had the work marked and ticked off by the submission deadline.

The second Exercise worksheet is available through the same mechanism as the first.

Please be aware that work submitted for Assessed Exercises must be entirely your own work and that publishing or otherwise sharing answers to exercises risks you being penalised for "academic misconduct" if someone plagiarises your work.

Primary Lecture Materials

Note: as well as reading all the material on the Syllabus, as lectured, you are expected to learn a basic subset of SQL in private study. Please see the relational tutorial below. SQL today is a very big language with the original, tiny, core language still alive inside it. You are expected to become fluent with that core subset, not extending beyond anything lectured.

  • Lectures 1 to 4: slides PDF (84 pages).

  • Lectures 5 to 8: slides PDF (88 pages).

    Erratum: A slide in Lecture 8 (fixed now in rev C) previously said that deconstructing the parts of a field into sub-fields violated entity integrity. But that was the wrong term, sorry. I meant to say that splitting a field into sub-fields violated value integrity and that generally a better schema is preferable where the sub-fields are properly documented by being normal fields. It's also a somewhat academic point since regular expression searches of field contents and using other predicates on fields (eg SQL's in test) are very commonly used: a predicate applied to a field is not very different from extracting a sub-field as data.

  • Learners' Guide and FAQ: Learners Guide

  • Supervision Suggestions:

HSQLDB Practical Materials

We shall use an rDBMS called HSQLDB for some of the supervision exercises and Assessed Exercises. Note there are plenty of other SQL implementations online, including interactive forms-driven study aids. Feel free to use these for private experiments while exploring the language.

For data, we shall use a subset of the IMBD website data contained in the following snapshot: https://www.cl.cam.ac.uk/teaching/2122/Databases/movies-relational.zip.

Helpful JDK8/Linux Setup Note.

Helpful Windows setup note: please see the start of the Relational Tutorial.

Relational Tutorial LINK (html).

Document/Aggregate Database Materials

Document/aggregate database tutorial LINK (html).

Document/aggregate database data file imdb_doctorwho_database_v3.zip.

Example query template code (for the second Assessed Exercise) template.zip.

Secondary Materials

Peter Chen's E/R Modelling Paper (PDF).

Are Nulls Evil - A Discussion PDF.

A few extra slides that illustrate outer joins and also generate the "Different key nestings of (semi-)structured data" slide in Lecture 6 LINK (pdf).

Web article illustrating equi, inner and outer joins Natural vs Equijoin. See Lemahieu 7.3.1.5 too.

Example Xpath code, embedded in Python, for extracting trivia fields from IMDB scapings xpath-example.zip.

Have a quick browse through the Neo4j Cypher language reference manual

Please also browse last year's secondary material.

Finally, as a hardware person, I do like to refer to hardware and efficiency quite a lot. For those of you with no computer hardware knowledge, I recommend all of Video 1 and the middle third of Video 4 on this LINK. This mirrors content on the pre-arrival course.

  • Graph-oriented DBMS (No tick this year!)
  • A few "NoSQL" pointers

    Further reading


    END