University of Cambridge, Computer Laboratory

Databases Ia Learners' Guide

The course official syllabus is on the web page Databases Ia. The syllabus page for all Computer Laboratory courses always contains a definitive summary of what is examinable.

Updates for Michaelmas 2024/25

Q. Is recursive SQL examinable? Some versions of the notes say it's examinable and some versions say they're not examinable.

A. The concrete syntax and detailed mechanisms for recursive SQL have varied a lot over the years or in different rDBMS and so are not examinable. However, the concepts of recursive specification and least-fixed points in declarative (rule based) systems run throughout Part I CST and the underlying concept of "a subset defined by the closure of a unary operator on a seed set" are examinable. That is the best way of thinking of recursive SQL. An example from Discrete Maths is the closure of the subset of the natural numbers {1, 3} under the operator "add two and then take the result mod 10", f(x) = (x + 2)%10.

Updates for Michaelmas 2023

The syllabus for 2023 had its order changed from previous years to reflect what was lectured the previous year under the previous lecturer.

Apart from a slight reorganisation of order, the only change in syllabus in 22/23 from the previous year is the (re-)introduction of the concept of normalised data schemas, where the content of a record should be dependent on the key it is stored under. There are various esoteric definitions of this concept, such as BCNF and 3NF and so on, whose names (not precise definitions) I may have given in passing, but their subtle differences are irrelevant and certainly not examinable.

FAQ

Q. I am writing to enquire if it is ok for us to use the Crows Foot notation for the tripos papers for cardinality instead of the one that was used for the examples in the lecture notes because the Crows Foot notation makes intuitive sense to me. Will we be marked down on it?
A. At degree level, credit should always be awarded for any correct answer, and even for a good argument that leads to an opinion not shared by the lecturer or marker. There should never be any possibility of being marked down for submitted work that is correct and good.
So do not worry about that sort of thing in general. Go ahead with crows foot notation if you like. If you used your own, esoteric notation, that would also be fine provided you also explained it clearly.
For this databases course, the only general condition is not to use recent SQL features, such as lists inside rDBMS fields, unless specifically invited by the question.

Q. There are two symbols on the E/R diagram at the end of Lecture 2 that you did not describe?
A. Yes, the value attribute for a sale is in a dotted outline denotes a derived attribute: one that can be computed, on-demand, from other attributes, so does not need to be stored. If there's a globally fixed commission rate, the sale price (value( can be determined arithmetically from the commission.
A. The attribute with an inner oval for the phone number denotes a 'multi-valued attribute'. It is helpful to store a list of alternative phone numbers (eg. home, work and mobile). This should be modelled in the E/R diagram. Note however, that in this course, an RDBMs schema that allows a list of values to be stored in a single field is banned: instead you are expected to make a cumbersome arrangement.

Q. With the new lecturer, will there be a change in 2023 and future years to the type of exam questions asked?
A. It is clear that Databases Ia questions in the last four years' (2019-22) have concentrated almost entirely on E/R modelling, the relational model and SQL. Although these topics can be expected to feature strongly in future questions, this limited coverage of the syllabus should never be interpreted as providing an exhaustive indication of scope. As always, the whole syllabus is examinable. 2021-03-01 asked about redundancy: the one change of syllabus for 2023 (mentioned in this document above) means that the effect of normal-form data on redundancy becomes examinable. In general, students should expect a change in 2023 onwards, with other aspects of the syllabus being examined more frequently.

Q. During your databases Ia lectures you briefly mentioned 'inverted indices'. How much depth do we need to know about this topic as I am struggling to find much about what precisely an inverted index is - I am aware of what an index is and how it can speed up record lookup but I am not sure what inverted means in this context.
A. The index to a data store is what is consulted when looking something up. Generally a logarithmic lookup time is achieved which is better than a linear scan through all keys. Given that a schema will define the primary key for a data store, the primary index is the index of those keys. An inverted index provides lookup via a different attribute or set of attributes. For instance, if a store of people has a primary index which is the artificial key 'pid' (a person identifier number), an inverted index might provide rapid lookup of people by name, returning for instance, a list of people called David Greaves. Again, the inverted index will aim to give a performance improvement with respect to a linear scan, but it will likely return more than one answer since only the primary key has the uniqueness property.

Q. How much of the Neo4j/graphical databases is precisely examinable?
The level of detail presented in lectures was fairly low, and this is the level expected for answering questions. Unlike with SQL, hands-on practical experience is not expected. You should know enough to describe the principle differences w.r.t. an rDBMS and answer questions on what might change in terms of redundancy or expressibility if data is stored in a graphical database instead of an rDBMS. The same remarks apply to the document model(s) lectured.

Q. How much of the hardware/new additions to the course is examinable (and where would you say these new additions are)?
A. The descriptions of hardware that I presented contain nothing that is not covered in more detail in the Operating Systems course. Because this is the first course in the academic year, and some students have very little idea of how operating systems and computer hardware work at the time they arrive, I felt the descriptions helped set the context. However, there is nothing examinable about them that you should not also know as part of your general knowledge or from the Operating Systems course. Computer Science is essentially a marriage of Engineering and Maths, being aware of how design decisions are likely to affect performance is examinable --- that is the Engineering aspect.
The ethos of the Databases course is logical organisation of data (not physical data organisation) and the logical concepts should be applicable over all sorts of computing device (optical, quantum, organic wet jelly etc). So although no questions directly relating to hardware or operating systems are likely to appear under the Databases monica at Tripos, complete ignorance of the efficiency aspect is never acceptable.

Q. If you were to provide a set of topics to focus on which would you say are most important?
Study the syllabus page. www.cl.cam.ac.uk/teaching/2223/Databases.

Q. I couldn't find anything on "shredding" online. As I understood it (from my project's use of ElasticSearch) "sharding" is about splitting the whole dataset between nodes, not splitting documents themselves.
A. Yes, sharding is the process of spreading out the document on to multiple servers to increase aggregate read bandwidth. If the same shard is stored on more than one server, which it often is, then this also provides redundancy (in the availability sense, not logical/schema redundancy). The terms 'shredding' and 'sharding' are in the recommended chapters of the recommended text book. Shredding is where the document is split up instead of being stored monolithically. The shredding,at least as I lectured it, must be reversible, which means that the index needs to preserve/record shred order as well as provide lookup.

Q. JOINs confuse some of the students (and some insist on using the ancient implicit join syntax "... FROM a, b, c WHERE ..."). They usually struggle with the differences between an INNER JOIN and an OUTER JOIN, and I am not sure this is in the tutorial, but I am sure it ought to be.
A. The difference between these joins mostly (entirely?) depends on the existence of NULL. In the lecture slides relating to null there is a footnote mentioning the names of the joins. They are told they should seek to understand the motivation for these different forms via private study. One the other hand, knowing the precise names and semantic of each form is beyond the examinable content of the course.

Q. What precisely constitutes the 'core subset' of SQL that we should be fluent with and what does fluency mean in this context? Previous years' had a fairly heavy focus on SQL and writing SQL in exams but there is a move away from that it seems - is there still a chance of writing SQL code in exams?
A. The SQL subset you should know and be able to use in answers to exam questions is all of the constructs appearing in the tutorials, lecture slides and mentioned in footnotes (eg. outer joins) to the lecture slides (unless specifically marked as unexaminable) and anything else that was necessary to complete the practical exercises. This subset consists of all the core constructs of the query part of the language that can easily be expressed using set theory or are related to the relational algebra. There is no change of examinable subset for 2023 compared with previous years. Hence schema definitions and update and maintenance SQL statements are not examinable. When SQL answers are needed, no marks will be lost for small syntax errors (as is the case in all Tripos exams, unless accurate syntax is specifically requested). Please also see the question from a supervisor above regarding inner and outer joins.

END. Updated May 2025.