Relational databases: Getting started

This tutorial will quickly get you started with our relational database. You will learn more if you run each of these queries yourself and experiment with your own variations. Once you have done this you will be ready to solve the practical exercises.

The database used in this course contains information about movies and actors, taken from the Internet Movie Database (IMDb). We have structured the data into a relational form, which is described in detail in the schema documentation.

We use SQL as query language. SQL is a vast language, and this tutorial teaches you a small subset through examples. However, we do try to cover the most commonly used constructs in the query language.

The database management system is HyperSQL Database (HSQLDB), which has the advantage of being quite easy to set up. The things you will learn in these exercises apply equally to more common databases you might have heard of, such as MySQL or PostgreSQL.

If you run into any problems, please post questions on the course's Moodle page.

Setting it up

You can use your own computer, or a college or university computer with Java installed. The examples should work equally well on Linux, Windows or Mac OS.

You need to download two files: one containing the database program and another containing the data. Download the database program code (relational-db.jar) and the database (relational-db.zip), and put them in the same directory. Unpack the zip file, which should give you a subdirectory called relational-db. (There should be only one directory called relational-db — not a directory inside a directory with the same name.) Now you are ready to explore the data.

In your terminal, change to the directory containing the downloaded JAR file, and run it as follows:

java -jar relational-db.jar relational-db

The last part of that command, relational-db, is the name of the subdirectory containing the data files. When you run it, a HSQL Database Manager window should pop up. On the left is a list of tables (called public.certificates and so on). At the top is a box where you can type a query, and at the bottom is an area where the results will appear.

Looking at a single table

Each movie in the database is identified by a unique number, its id. For example, the id of the latest Star Wars movie is 4148060. If you know that ID, you can look up information about the movie.

Type the following query in the top-right box, and click the Execute SQL button to run it:

SELECT * FROM movies
WHERE id = 4148060;

It should produce a result looking like this:

     ID  TITLE                                              YEAR
-------  -------------------------------------------------  ----
4148060  Star Wars: Episode VII - The Force Awakens (2015)  2015

The * here specifies that every column from the table will be printed. If we wanted only the title and year columns we could write:

SELECT title, year
FROM movies
WHERE id = 4148060;

This query produces the output:

TITLE                                              YEAR
-------------------------------------------------  ----
Star Wars: Episode VII - The Force Awakens (2015)  2015

SQL has many keywords. We have used automation to highlight keywords in green. Our column name year is also a keyword and the database system rather permissively allows this without complaint. Other systems may behave differently. Sigh. Most systems today are not case sensitive but for some (strange) reason it has become a tradition to write SQL's keywords using CAPITALS. Longer sigh.

If you don't already know the ID and are not even sure of the title, then you might search using a string match:

SELECT * FROM movies WHERE title LIKE '%Star%';

This query produces the output:

     ID  TITLE                                              YEAR
-------  -------------------------------------------------  ----
3149936  Afghan Star (2009)                                 2009
4147697  Star Trek (2009)                                   2009
4148060  Star Wars: Episode VII - The Force Awakens (2015)  2015
4377019  Twenty Feet from Stardom (2013)                    2013

The LIKE operator searches for strings that match some pattern, and the percent sign % means "any sub-string can appear here". Thus, title LIKE '%Star%' means searching for movies whose title contains the word Star as a sub-string.

If you don't want to search, you can list movies in alphabetical order, for example show the first five:

SELECT * FROM movies ORDER BY title LIMIT 5;

which looks like this:

     ID  TITLE                                 YEAR
-------  ------------------------------------  ----
3099893  12 Years a Slave (2013)               2013
3108852  4 luni, 3 saptamâni si 2 zile (2007)  2007
3149936  Afghan Star (2009)                    2009
3171436  American Hustle (2013)                2013
3175199  An Education (2009)                   2009

If you leave off the LIMIT 5 and don't have any WHERE clause, you get back all the movies in the database. If you leave off the ORDER BY title, you get back the results in some arbitrary, undefined order.

We can count movies per year using the GROUP BY construct:

SELECT year, count(*) AS total
FROM movies
GROUP BY year
ORDER BY year;

This query produces the output:

YEAR  TOTAL
----  -----
2004      1
2006      9
2007     10
2008     12
2009     10
2010      9
2011     10
2012     10
2013     11
2014     10
2015      8
2016     12
2017      8

The GROUP BY year construct will group together all records in the movies table that have the same value for the year attribute. The count(*) AS total specifies that the records in each group are to be counted, and that this count is associated with the new column named total. Here ORDER BY year will see that year is an integer value and then use the standard numeric order. If we wanted to order the list in the reverse order we could use ORDER BY year DESC as follows:

SELECT year, count(*) AS total
FROM movies
GROUP BY year
ORDER BY year DESC;

This query produces the output:

YEAR  TOTAL
----  -----
2017      8
2016     12
2015      8
2014     10
2013     11
2012     10
2011     10
2010      9
2009     10
2008     12
2007     10
2006      9
2004      1

Our collection of 120 movies in the database corresponds to the top ten movies per year, from 2006 through 2017, according to the web site Rotten Tomatoes. We can see from this query that IMDb and Rotten Tomatoes do not always agree on the year associated with a movie!

Now experiment with your own queries on the people table.

JOIN: Combining records from multiple tables

There are many tables in our database that provide additional information about each movie. Those tables are called certificates, keywords, languages, genres, locations, and release_dates. (For more information see schema documentation.) Each of these tables contains records with a movie_id together with some additional values.

Let's take a peek at the languages table:

SELECT * FROM languages LIMIT 10;

This query produces the output:

MOVIE_ID  LANGUAGE  NOTE
--------  --------  ------------------
 3099893  English   [null]
 3108852  Romanian  [null]
 3149936  Dari      [null]
 3149936  English   [null]
 3149936  Pashtu    [null]
 3171436  Arabic    [null]
 3171436  English   [null]
 3175199  English   [null]
 3175199  French    (only a few lines)
 3187610  English   [null]

(Note: This SELECT is not constrained with an ORDER BY construct, so no particular order is guaranteed. You may obtain a different collection of 10 rows because of this non-determinism.)

Now suppose we want to write a query that returns new records of the form title, year, language — that is, we want to join together information from the movies table and the languages table (the title and year come from the movies table, but the language comes from the languages table). We can link the records from the two tables based on the movie ID: if the number in the movies.id column matches the number in languages.movie_id, then the two records are about the same movie. We can link these records using the JOIN construct:

SELECT title, year, language
FROM movies
JOIN languages ON movies.id = languages.movie_id
ORDER BY year, title
LIMIT 20 ;

This query produces the output (if your system produced a slightly different output can you explain why?):

TITLE                          YEAR  LANGUAGE
-----------------------------  ----  ---------
Kekexili (2004)                2004  Tibetan
Kekexili (2004)                2004  Mandarin
Blindsight (2006/I)            2006  English
Blindsight (2006/I)            2006  German
Blindsight (2006/I)            2006  Tibetan
Casino Royale (2006)           2006  English
Casino Royale (2006)           2006  French
Children of Men (2006)         2006  English
Children of Men (2006)         2006  Arabic
Children of Men (2006)         2006  German
Children of Men (2006)         2006  Italian
Children of Men (2006)         2006  Romanian
Children of Men (2006)         2006  Russian
Children of Men (2006)         2006  Serbian
Children of Men (2006)         2006  Spanish
Children of Men (2006)         2006  Georgian
Deliver Us from Evil (2006)    2006  English
El laberinto del fauno (2006)  2006  Spanish
The Departed (2006)            2006  English
The Departed (2006)            2006  Cantonese

There is an alternative way of writing such a query, which you will see in many older textbooks:

SELECT title, year, language
FROM movies, languages
WHERE movies.id = languages.movie_id
ORDER BY year, title
LIMIT 20 ;

This query produces the same output. However, here the JOIN is implicit in the way we have written the SELECT ... FROM ... WHERE ... expression. It is generally better practice to make joins explicit since it often makes queries easier to understand — the join logic is disentangled from the filter logic of the WHERE clause. Another reason is that (as we see below) there are several variations of the join construct, each having subtly different semantics, and when joins are explicit we can experiment with these variants.

Now that we see that some movies are multi-lingual, let's rank movies by the number of languages associated with each movie. We can combine the JOIN and GROUP BY constructions:

SELECT title, year, count(*) AS language_count
FROM movies
JOIN languages ON movies.id = languages.movie_id
GROUP BY title, year
ORDER BY language_count desc
LIMIT 20 ;

This query produces the output:

TITLE                                        YEAR  LANGUAGE_COUNT
-------------------------------------------  ----  --------------
Children of Men (2006)                       2006               9
District 9 (2009)                            2009               6
Mission: Impossible - Ghost Protocol (2011)  2011               5
The Bourne Ultimatum (2007)                  2007               5
Iron Man (2008)                              2008               5
Snowpiercer (2013)                           2013               5
Persepolis (2007)                            2007               4
Drag Me to Hell (2009)                       2009               4
The Avengers (2012)                          2012               3
The Artist (2011/I)                          2011               3
Slumdog Millionaire (2008)                   2008               3
Un prophète (2009)                           2009               3
Dunkirk (2017)                               2017               3
Brooklyn (2015)                              2015               3
The Queen (2006)                             2006               3
Argo (2012)                                  2012               3
Blindsight (2006/I)                          2006               3
Bikur Ha-Tizmoret (2007)                     2007               3
Before Midnight (2013)                       2013               3
Afghan Star (2009)                           2009               3

More on JOIN

The credits table is special in our database. It represents a many-to-many relationship between movies and people. Each record contains a movie_id and a person_id attribute. Thus each record represents a relationship between the person with that person_id and the movie with that movie_id. The additional attributes of the credits table tell us something about that relationship. Perhaps the most interesting attribute is type which tells us what kind of relationship is being recorded. Let's look at the different types:

SELECT type, count(*) AS total
FROM credits
GROUP BY type
ORDER BY total DESC;

This query produces the output:

TYPE              TOTAL
----------------  -----
actor              8467
producer           1108
writer              289
editor              161
director            139
cinematographer     131
composer            125
costume_designer     90

In other words, the credits table represents eight distinct types of relationships between movies and people. As will be discussed in the lectures, this table is not in normal form. An alternative schema might split it into eight distinct tables, one for each type.

Let's construct our first three-way join! We will construct a table with attributes movie_id, title, person_id, name, type, character for the movie with title Gui tu lie che (2009):

SELECT movie_id, title, person_id, name, type, character
FROM movies
JOIN credits ON movies.id = credits.movie_id
JOIN people  ON people.id = credits.person_id
WHERE title = 'Gui tu lie che (2009)'
ORDER BY type ;

This query produces the output:

MOVIE_ID  TITLE                  PERSON_ID  NAME                  TYPE             CHARACTER
--------  ---------------------  ---------  --------------------  ---------------  ---------
 3558459  Gui tu lie che (2009)    2635369  Zhang, Changhua       actor            Himself
 3558459  Gui tu lie che (2009)    2331770  Tang, Tingsui         actor            Himself
 3558459  Gui tu lie che (2009)    2636131  Zhang, Yang (VII)     actor            Himself
 3558459  Gui tu lie che (2009)    4096718  Zhang, Qin (I)        actor            Herself
 3558459  Gui tu lie che (2009)    2877417  Chen, Suqin           actor            Herself
 3558459  Gui tu lie che (2009)     714378  Fan, Lixin (II)       cinematographer  [null]
 3558459  Gui tu lie che (2009)    4107663  Alary, Olivier        composer         [null]
 3558459  Gui tu lie che (2009)     714378  Fan, Lixin (II)       director         [null]
 3558459  Gui tu lie che (2009)     714378  Fan, Lixin (II)       editor           [null]
 3558459  Gui tu lie che (2009)    3898468  Stephen, Mary         editor           [null]
 3558459  Gui tu lie che (2009)     403772  Chang, Yung (I)       editor           [null]
 3558459  Gui tu lie che (2009)    4144700  Cross, Daniel (II)    producer         [null]
 3558459  Gui tu lie che (2009)    4113686  Aung-Thwin, Mila (I)  producer         [null]
 3558459  Gui tu lie che (2009)    4705097  Zhao, Qi (VI)         producer         [null]
 3558459  Gui tu lie che (2009)    1632179  Moore, Bob (IV)       producer         [null]

Note that the query qualifies id as movies.id and people.id. This is required since both tables share a column with the name id, and SQL needs to know which one we are referring to! On the other hand, the movie_id and person_id columns are unambiguous, so you can omit the credits. prefix if you want.

Now let's construct a similar query to collect all information on the person Ben Affleck:

SELECT person_id, name, movie_id, title, type, character
FROM movies
JOIN credits ON movies.id = credits.movie_id
JOIN people  ON people.id = credits.person_id
WHERE name = 'Affleck, Ben'
ORDER BY type ;

which produces the output:

PERSON_ID  NAME          MOVIE_ID  TITLE            TYPE      CHARACTER
---------  ------------  --------  ---------------  --------  -----------
    18577  Affleck, Ben   3192514  Argo (2012)      actor     Tony Mendez
    18577  Affleck, Ben   4321985  The Town (2010)  actor     Doug MacRay
    18577  Affleck, Ben   3192514  Argo (2012)      director  [null]
    18577  Affleck, Ben   4321985  The Town (2010)  director  [null]
    18577  Affleck, Ben   3192514  Argo (2012)      producer  [null]
    18577  Affleck, Ben   4321985  The Town (2010)  writer    [null]

A joke: A SQL query walks into a bar and sees two tables. She walks up to them and asks "Can I join you?"

Self JOINs

Suppose we want to write a query that produces triples of the form genre1, genre2, total that count the number of movies associated with a pair of distinct genres. Since we need to join the genres table with itself, some means of differentiating the two instances of this table is required. The keyword AS comes to our rescue:

SELECT G1.genre AS genre1, G2.genre AS genre2, count(*) AS total
FROM genres AS G1
JOIN genres AS G2 ON G1.movie_id = G2.movie_id
WHERE G1.genre <> G2.genre
GROUP BY genre1, genre2
ORDER BY total desc
LIMIT 10;

This produces the output:

GENRE1     GENRE2     TOTAL
---------  ---------  -----
Thriller   Drama         16
Drama      Thriller      16
Drama      Romance       14
Romance    Drama         14
Biography  Drama         13
Action     Thriller      13
Thriller   Action        13
Drama      Biography     13
Sci-Fi     Action        12
Adventure  Action        12

Notice that there are two instances of every pair of genres. We can easily fix this by replacing the <> operator (which means “not equal”) with the < operator (“less than”), and thus ensure that the first genre is lexicographically less than the second:

SELECT G1.genre AS genre1, G2.genre AS genre2, count(*) AS total
FROM genres AS G1
JOIN genres AS G2 ON G1.movie_id = G2.movie_id
WHERE G1.genre < G2.genre
GROUP BY genre1, genre2
ORDER BY total desc
LIMIT 10;

This produces the output:

GENRE1     GENRE2     TOTAL
---------  ---------  -----
Drama      Thriller      16
Drama      Romance       14
Biography  Drama         13
Action     Thriller      13
Action     Adventure     12
Action     Sci-Fi        12
Adventure  Family        12
Comedy     Drama         11
Adventure  Comedy        11
Animation  Comedy        11

Nested Queries

As we saw previously, movies can be associated with multiple languages. Suppose that we want to list the titles of movies that are not associated with the English language. One way to do this is to use a nested query (also called a sub-query) and the IN construct:

SELECT title
FROM movies
WHERE id NOT IN (
   SELECT movie_id
   FROM languages
   WHERE language = 'English' );

This produces the output:

4 luni, 3 saptamâni si 2 zile (2007)
Aruitemo aruitemo (2008)
El laberinto del fauno (2006)
Gui tu lie che (2009)
Jiro Dreams of Sushi (2011)
Jodaeiye Nader az Simin (2011)
Kekexili (2004)
Le Havre (2011)
Leviafan (2014)
Låt den rätte komma in (2008)
Ma vie de Courgette (2016)
Shaun the Sheep Movie (2015)
Shi (2010)
Un prophète (2009)

The expression id NOT IN (...) is the same as NOT (id IN (...)) and will be true only when the value associated with id is not in the result of evaluating the (...) sub-query.

About NULL values

From above we can see that Ben Affleck's person_id is 18577. Let's look at all rows of credits with this person_id:

SELECT * FROM credits WHERE person_id = 18577;

which produces the output:

PERSON_ID  MOVIE_ID  TYPE      NOTE          CHARACTER    POSITION  LINE_ORDER  GROUP_ORDER  SUBGROUP_ORDER
---------  --------  --------  ------------  -----------  --------  ----------  -----------  --------------
    18577   3192514  actor     [null]        Tony Mendez         1
    18577   3192514  director  [null]        [null]
    18577   3192514  producer  (producer)    [null]
    18577   4321985  actor     [null]        Doug MacRay         1
    18577   4321985  director  [null]        [null]
    18577   4321985  writer    (screenplay)  [null]                          1            2               1

We can see that for some rows there are no values displayed for some columns. For example, for those rows where type is director there is no integer value displayed in the position column. However, there actually is a value associated with these blank spots, and that is the NULL value (a user interface may or may not display this value). In a similar way, when type is not actor the character column contains a null value. Note that in this case our user-interface displays [null] instead of whitespace. Why? That's because the type of this column is string, and this display wants us to be able to distinguish between a NULL value and the empty string!

Let's try to select just those rows position is NULL as follows:

SELECT * FROM credits WHERE person_id = 18577 AND position = NULL;

This query will return no rows at all! Why? Because with the NULL we enter the world of 3-valued logic. For two expressions e1 and e2 the expression e1 = e2 can evaluate to TRUE or FALSE or NULL (which can be read as we don't know the answer). Equality returns NULL when either e1 or e2 evaluates to NULL. (NULL is the Beyoncé of databases – nothing can be compared to it.) For example, position = NULL always returns NULL. In addition, TRUE AND NULL returns NULL as well. Furthermore, the SELECT ... WHERE ... construct only returns results for cases when the where-clause evaluates to TRUE.

SQL has a special construct to check for NULL values:

SELECT * FROM credits WHERE person_id = 18577 AND position IS NULL;

which does produce the expected output:

PERSON_ID  MOVIE_ID  TYPE      NOTE          CHARACTER  POSITION  LINE_ORDER  GROUP_ORDER  SUBGROUP_ORDER
---------  --------  --------  ------------  ---------  --------  ----------  -----------  --------------
    18577   3192514  director  [null]        [null]
    18577   3192514  producer  (producer)    [null]
    18577   4321985  director  [null]        [null]
    18577   4321985  writer    (screenplay)  [null]                        1            2               1

Outer JOINs

Suppose we have a tables T1 and T2 and we want to compute a new kind of JOIN that contains all of the rows of T1, even if there is no matching row in the T2 table. But what value should the query return for all T2 columns when a T1 row has no joining rows in the T2 table? The LEFT JOIN operation fills these columns with NULL values.

LEFT JOIN is also known as LEFT OUTER JOIN. Generally, outer joins may return results for rows that were not matched, whereas normal joins (also known as inner joins) only return matching records.

Let's look at an example.

One of the queries above tells us that there are 90 entries in the credits table for costume designers. Therefore we can confidently predict that the following query will return 90 rows:

SELECT title, person_id
FROM movies
JOIN credits ON movie_id = id AND type = 'costume_designer';

Indeed, it does:

TITLE                                                PERSON_ID
---------------------------------------------------  ---------
12 Years a Slave (2013)                                3606978
4 luni, 3 saptamâni si 2 zile (2007)                   5037824
American Hustle (2013)                                 2558260
An Education (2009)                                    5029750
Argo (2012)                                            4044743
Arrival (2016/II)                                      5021377
Aruitemo aruitemo (2008)                               3357344
Baby Driver (2017)                                     4560947
Before Midnight (2013)                                 3776755
Bikur Ha-Tizmoret (2007)                                100356
Boyhood (2014/I)                                       5050258
Brooklyn (2015)                                        5029750
Casino Royale (2006)                                   3197500
Children of Men (2006)                                 3941839
Dallas Buyers Club (2013)                              1298511
District 9 (2009)                                      2888744
Drag Me to Hell (2009)                                 3573667
Dunkirk (2017)                                         1298031
El laberinto del fauno (2006)                          5037272
Frost/Nixon (2008)                                     1772486
Get Out (2017/I)                                       5035269
Gloria (2013/II)                                        385642
Gravity (2013)                                         3941839
Harry Potter and the Deathly Hallows: Part 2 (2011)    3941839
Hell or High Water (2016/II)                           5059702
Her (2013)                                             2284973
In the Loop (2009)                                     5042904
Inside Llewyn Davis (2013)                             4100410
Iron Man (2008)                                        5034610
Iron Man (2008)                                        3838406
It Follows (2014)                                      5042261
Juno (2007)                                            5051445
La La Land (2016/I)                                    4100410
Le Havre (2011)                                         348581
Leviafan (2014)                                        5022641
Logan (2017)                                           1772486
Looper (2012)                                          2950086
Love & Friendship (2016)                               3522990
Låt den rätte komma in (2008)                          3908407
Ma vie de Courgette (2016)                             5034463
Ma vie de Courgette (2016)                             5052712
Mad Max: Fury Road (2015)                              2746864
Manchester by the Sea (2016)                           3964523
Milk (2008/I)                                           862446
Mission: Impossible - Ghost Protocol (2011)            1190879
Moneyball (2011)                                       4024522
Moonlight (2016/I)                                     5031222
Moonrise Kingdom (2012)                                4024522
Mr. Turner (2014)                                      3008293
Mud (2012)                                             5050258
Nightcrawler (2014)                                    5061837
No Country for Old Men (2007)                          4100410
Once (2007)                                            5027880
Room (2015/I)                                          5025909
Selma (2014)                                           2856180
Shi (2010)                                             5042067
Short Term 12 (2013)                                   2923520
Short Term 12 (2013)                                   5034288
Silver Linings Playbook (2012)                          286624
Skyfall (2012)                                          705532
Skyfall (2012)                                         3941839
Slumdog Millionaire (2008)                             5041701
Snowpiercer (2013)                                     5033561
Spider-Man: Homecoming (2017)                          5032765
Spotlight (2015/I)                                     2886785
Star Trek (2009)                                       1190879
Star Wars: Episode VII - The Force Awakens (2015)      1190879
The Artist (2011/I)                                     286624
The Avengers (2012)                                    2832963
The Babadook (2014)                                    4026481
The Big Sick (2017)                                    5025291
The Bourne Ultimatum (2007)                            2930437
The Cabin in the Woods (2012)                          3971106
The Dark Knight (2008)                                 3197500
The Departed (2006)                                    3696229
The Hurt Locker (2008)                                 1394582
The Jungle Book (2016)                                 3838406
The King's Speech (2010)                               2746864
The Martian (2015)                                     4081927
The Muppets (2011)                                     5020346
The Queen (2006)                                       2799107
The Social Network (2010)                              4044743
The Town (2010)                                        5044971
The Wrestler (2008)                                    5061837
True Grit (2010)                                       4100410
Un prophète (2009)                                     3548257
Up in the Air (2009/I)                                  862446
War for the Planet of the Apes (2017)                  2819305
Whiplash (2014)                                        5048444
Wonder Woman (2017)                                    3197500

Since we have 120 movies in our database we know that some movies have no costume designers recorded in our database. Let's rewrite this query using LEFT JOIN in order to see all movies, even those without costume designers:

SELECT title, person_id
FROM movies
LEFT JOIN credits ON movie_id = id AND type = 'costume_designer';

The result is:

TITLE                                                PERSON_ID
---------------------------------------------------  ---------
12 Years a Slave (2013)                                3606978
4 luni, 3 saptamâni si 2 zile (2007)                   5037824
Afghan Star (2009)
American Hustle (2013)                                 2558260
An Education (2009)                                    5029750
Anvil: The Story of Anvil (2008)
Argo (2012)                                            4044743
Arrival (2016/II)                                      5021377
Aruitemo aruitemo (2008)                               3357344
Baby Driver (2017)                                     4560947
Before Midnight (2013)                                 3776755
Bikur Ha-Tizmoret (2007)                                100356
Blindsight (2006/I)
Boyhood (2014/I)                                       5050258
Brooklyn (2015)                                        5029750
Casino Royale (2006)                                   3197500
Children of Men (2006)                                 3941839
Dallas Buyers Club (2013)                              1298511
Deliver Us from Evil (2006)
District 9 (2009)                                      2888744
Drag Me to Hell (2009)                                 3573667
Dunkirk (2017)                                         1298031
El laberinto del fauno (2006)                          5037272
Finding Dory (2016)
Frost/Nixon (2008)                                     1772486
Get Out (2017/I)                                       5035269
Gloria (2013/II)                                        385642
Gravity (2013)                                         3941839
Gui tu lie che (2009)
Harry Potter and the Deathly Hallows: Part 2 (2011)    3941839
Hell or High Water (2016/II)                           5059702
Her (2013)                                             2284973
How to Survive a Plague (2012)
How to Train Your Dragon (2010)
I Am Not Your Negro (2016)
In the Loop (2009)                                     5042904
Inside Job (2010)
Inside Llewyn Davis (2013)                             4100410
Inside Out (2015/I)
Iron Man (2008)                                        5034610
Iron Man (2008)                                        3838406
It Follows (2014)                                      5042261
Jiro Dreams of Sushi (2011)
Jodaeiye Nader az Simin (2011)
Juno (2007)                                            5051445
Kekexili (2004)
La La Land (2016/I)                                    4100410
Le Havre (2011)                                         348581
Leviafan (2014)                                        5022641
Life Itself (2014)
Logan (2017)                                           1772486
Looper (2012)                                          2950086
Love & Friendship (2016)                               3522990
Låt den rätte komma in (2008)                          3908407
Ma vie de Courgette (2016)                             5034463
Ma vie de Courgette (2016)                             5052712
Mad Max: Fury Road (2015)                              2746864
Man on Wire (2008)
Manchester by the Sea (2016)                           3964523
Milk (2008/I)                                           862446
Mission: Impossible - Ghost Protocol (2011)            1190879
Moana (2016/I)
Moneyball (2011)                                       4024522
Moonlight (2016/I)                                     5031222
Moonrise Kingdom (2012)                                4024522
Mr. Turner (2014)                                      3008293
Mud (2012)                                             5050258
Nightcrawler (2014)                                    5061837
No Country for Old Men (2007)                          4100410
Once (2007)                                            5027880
Persepolis (2007)
Project Nim (2011)
Ratatouille (2007)
Room (2015/I)                                          5025909
Selma (2014)                                           2856180
Shaun the Sheep Movie (2015)
Shi (2010)                                             5042067
Short Term 12 (2013)                                   2923520
Short Term 12 (2013)                                   5034288
Silver Linings Playbook (2012)                          286624
Skyfall (2012)                                          705532
Skyfall (2012)                                         3941839
Slumdog Millionaire (2008)                             5041701
Snowpiercer (2013)                                     5033561
Spider-Man: Homecoming (2017)                          5032765
Spotlight (2015/I)                                     2886785
Star Trek (2009)                                       1190879
Star Wars: Episode VII - The Force Awakens (2015)      1190879
Taxi to the Dark Side (2007)
The Artist (2011/I)                                     286624
The Avengers (2012)                                    2832963
The Babadook (2014)                                    4026481
The Big Sick (2017)                                    5025291
The Bourne Ultimatum (2007)                            2930437
The Cabin in the Woods (2012)                          3971106
The Dark Knight (2008)                                 3197500
The Departed (2006)                                    3696229
The Hurt Locker (2008)                                 1394582
The Interrupters (2011)
The Invisible War (2012)
The Jungle Book (2016)                                 3838406
The King of Kong (2007)
The King's Speech (2010)                               2746864
The LEGO Movie (2014)
The Martian (2015)                                     4081927
The Muppets (2011)                                     5020346
The Queen (2006)                                       2799107
The Social Network (2010)                              4044743
The Town (2010)                                        5044971
The War Tapes (2006)
The Wrestler (2008)                                    5061837
Toy Story 3 (2010)
True Grit (2010)                                       4100410
Twenty Feet from Stardom (2013)
Un prophète (2009)                                     3548257
Up (2009)
Up in the Air (2009/I)                                  862446
WALL·E (2008)
War for the Planet of the Apes (2017)                  2819305
Waste Land (2010)
Whiplash (2014)                                        5048444
Wonder Woman (2017)                                    3197500
Wordplay (2006)
Zootopia (2016)

This query will return 124 rows (because some movies have more than one costume designer). For the 34 movies that do not have a costume designer (90 + 34 = 124), the person_id column contains a NULL value.

We might expect that the following query will produce the same result:

SELECT title, person_id
FROM movies
LEFT JOIN credits ON movie_id = id
WHERE type = 'costume_designer';

However, this is not the case! This query will produce only 90 rows! In fact, it is semantically identical to the first example in this section. Why is this so? The where-clause is evaluated after the left join has been performed but before the SELECT has pulled out the relevant columns. At that point the type columns of non-matching rows contain a NULL value, and so do not produce TRUE for the WHERE type = 'costume_designer' clause. (Very audible sigh.)

Now suppose we want to replace every person_id with the name of the associated costume designer. Here is one way to rewrite the query:

SELECT title, name
FROM movies
LEFT JOIN credits ON movie_id = movies.id AND type = 'costume_designer'
LEFT JOIN people ON person_id = people.id;

Note that this query using another LEFT JOIN, which is evaluated after the first join but before the SELECT removes some columns. Here is the result:

TITLE                                                NAME
---------------------------------------------------  ------------------------
12 Years a Slave (2013)                              Norris, Patricia (I)
4 luni, 3 saptamâni si 2 zile (2007)                 Istrate, Dana
Afghan Star (2009)                                   [null]
American Hustle (2013)                               Wilkinson, Michael (I)
An Education (2009)                                  Dicks-Mireaux, Odile
Anvil: The Story of Anvil (2008)                     [null]
Argo (2012)                                          West, Jacqueline
Arrival (2016/II)                                    April, Renée
Aruitemo aruitemo (2008)                             Kurosawa, Kazuko (I)
Baby Driver (2017)                                   Hoffman, Courtney (I)
Before Midnight (2013)                               Rozana, Vasileia
Bikur Ha-Tizmoret (2007)                             Ashkenazi, Doron
Blindsight (2006/I)                                  [null]
Boyhood (2014/I)                                     Perkins, Kari
Brooklyn (2015)                                      Dicks-Mireaux, Odile
Casino Royale (2006)                                 Hemming, Lindy
Children of Men (2006)                               Temime, Jany
Dallas Buyers Club (2013)                            Kurt and Bart
Deliver Us from Evil (2006)                          [null]
District 9 (2009)                                    Cilliers, Diana
Drag Me to Hell (2009)                               Mussenden, Isis
Dunkirk (2017)                                       Kurland, Jeffrey
El laberinto del fauno (2006)                        Huete, Lala
Finding Dory (2016)                                  [null]
Frost/Nixon (2008)                                   Orlandi, Daniel
Get Out (2017/I)                                     Haders, Nadine
Gloria (2013/II)                                     Castro, Eduardo (I)
Gravity (2013)                                       Temime, Jany
Gui tu lie che (2009)                                [null]
Harry Potter and the Deathly Hallows: Part 2 (2011)  Temime, Jany
Hell or High Water (2016/II)                         Turzanska, Malgosia
Her (2013)                                           Storm, Casey
How to Survive a Plague (2012)                       [null]
How to Train Your Dragon (2010)                      [null]
I Am Not Your Negro (2016)                           [null]
In the Loop (2009)                                   Little, Ros
Inside Job (2010)                                    [null]
Inside Llewyn Davis (2013)                           Zophres, Mary
Inside Out (2015/I)                                  [null]
Iron Man (2008)                                      Gregg, Rebecca
Iron Man (2008)                                      Shannon, Laura Jean
It Follows (2014)                                    Leitz, Kimberly
Jiro Dreams of Sushi (2011)                          [null]
Jodaeiye Nader az Simin (2011)                       [null]
Juno (2007)                                          Prudhomme, Monique
Kekexili (2004)                                      [null]
La La Land (2016/I)                                  Zophres, Mary
Le Havre (2011)                                      Cambier, Frédéric
Leviafan (2014)                                      Bartuli, Anna
Life Itself (2014)                                   [null]
Logan (2017)                                         Orlandi, Daniel
Looper (2012)                                        Davis, Sharen
Love & Friendship (2016)                             Mhaoldomhnaigh, Eimer Ni
Låt den rätte komma in (2008)                        Strid, Maria
Ma vie de Courgette (2016)                           Grandchamp, Christel
Ma vie de Courgette (2016)                           Riera, Vanessa
Mad Max: Fury Road (2015)                            Beavan, Jenny (I)
Man on Wire (2008)                                   [null]
Manchester by the Sea (2016)                         Toth, Melissa
Milk (2008/I)                                        Glicker, Danny
Mission: Impossible - Ghost Protocol (2011)          Kaplan, Michael (I)
Moana (2016/I)                                       [null]
Moneyball (2011)                                     Walicka-Maimone, Kasia
Moonlight (2016/I)                                   Eselin, Caroline
Moonrise Kingdom (2012)                              Walicka-Maimone, Kasia
Mr. Turner (2014)                                    Durran, Jacqueline
Mud (2012)                                           Perkins, Kari
Nightcrawler (2014)                                  Westcott, Amy (I)
No Country for Old Men (2007)                        Zophres, Mary
Once (2007)                                          Corvisieri, Tiziana
Persepolis (2007)                                    [null]
Project Nim (2011)                                   [null]
Ratatouille (2007)                                   [null]
Room (2015/I)                                        Carlson, Lea
Selma (2014)                                         Carter, Ruth E.
Shaun the Sheep Movie (2015)                         [null]
Shi (2010)                                           Lee, Choong-yeon
Short Term 12 (2013)                                 Cretton, Joy
Short Term 12 (2013)                                 Gordon-Crozier, Mirren
Silver Linings Playbook (2012)                       Bridges, Mark (I)
Skyfall (2012)                                       Everest, Timothy
Skyfall (2012)                                       Temime, Jany
Slumdog Millionaire (2008)                           Larlarb, Suttirat Anne
Snowpiercer (2013)                                   George, Catherine (II)
Spider-Man: Homecoming (2017)                        Frogley, Louise
Spotlight (2015/I)                                   Chuck, Wendy
Star Trek (2009)                                     Kaplan, Michael (I)
Star Wars: Episode VII - The Force Awakens (2015)    Kaplan, Michael (I)
Taxi to the Dark Side (2007)                         [null]
The Artist (2011/I)                                  Bridges, Mark (I)
The Avengers (2012)                                  Byrne, Alexandra (I)
The Babadook (2014)                                  Wallace, Heather (I)
The Big Sick (2017)                                  Burton, Sarah Mae
The Bourne Ultimatum (2007)                          Cunliffe, Shay
The Cabin in the Woods (2012)                        Trpcic, Shawna
The Dark Knight (2008)                               Hemming, Lindy
The Departed (2006)                                  Powell, Sandy (II)
The Hurt Locker (2008)                               Little, George L. (I)
The Interrupters (2011)                              [null]
The Invisible War (2012)                             [null]
The Jungle Book (2016)                               Shannon, Laura Jean
The King of Kong (2007)                              [null]
The King's Speech (2010)                             Beavan, Jenny (I)
The LEGO Movie (2014)                                [null]
The Martian (2015)                                   Yates, Janty
The Muppets (2011)                                   Afiley, Rahel
The Queen (2006)                                     Boyle, Consolata
The Social Network (2010)                            West, Jacqueline
The Town (2010)                                      Matheson, Susan
The War Tapes (2006)                                 [null]
The Wrestler (2008)                                  Westcott, Amy (I)
Toy Story 3 (2010)                                   [null]
True Grit (2010)                                     Zophres, Mary
Twenty Feet from Stardom (2013)                      [null]
Un prophète (2009)                                   Montel, Virginie (I)
Up (2009)                                            [null]
Up in the Air (2009/I)                               Glicker, Danny
WALL·E (2008)                                        [null]
War for the Planet of the Apes (2017)                Bruning, Melissa
Waste Land (2010)                                    [null]
Whiplash (2014)                                      Norcia, Lisa
Wonder Woman (2017)                                  Hemming, Lindy
Wordplay (2006)                                      [null]
Zootopia (2016)                                      [null]

Where there is a LEFT JOIN construct you might guess that there must be a RIGHT JOIN construct. Right! We can get the same result as from our LEFT JOIN query if we switch the positions of the the table names and use RIGHT JOIN:

SELECT title, person_id
FROM credits
RIGHT JOIN movies ON movie_id = id AND type = 'costume_designer';

Note that the LEFT or RIGHT keyword simply indicates which of the two relations is taking the leading role in the outer join.

For more information on outer joins see the Wikipedia entry.

Practical Exercises (ticks) from 2016

You might want to practice SQL with the practical exercises from last year. Try to solve these yourself before peeking at the solutions!

Exercise 1.a

Complete the following query template to produce a query that returns rows name, total where the name column is the name of a person, and the total column indicates the number of movies of the genre Drama for which that person is a writer.

   SELECT name, count(*) AS total
      YOUR-CODE-GOES-HERE
   GROUP BY name
   ORDER BY total desc, name
   LIMIT 10;

Show solution notes

Exercise 1.b

Now modify your solution for Exercise 1.a so that names and totals are associated with those writers that write ONLY for movies associated ONLY with the genre Drama.

Show solution notes

Exercise 1.c

The position column of the credits table indicates an actor's rank on the billing of a movie. So for example, position = 1 represents a billing of a top star. A movie may have several top stars sharing the top billing. Your task is to complete the following query template so that it returns rows of the form name1, name2, title where the names are of two (different) actors sharing the top billing in the movie having the associated title. The first name should be lexicographically less than the second.

   SELECT P1.name AS name1, P2.name AS name2, title
      YOUR-CODE-GOES-HERE
   ORDER BY name1, name2, title ;

Show solution notes

Next Steps

Now you should know enough to attempt the practical questions, which appear on a separate page.