This tutorial will quickly get you started with our relational database. You will learn more if you run each of these queries yourself and experiment with your own variations. Once you have done this you will be ready to solve the practical exercises.
The database used in this course contains information about movies and actors, taken from the Internet Movie Database (IMDb). We have structured the data into a relational form, which is described in detail in the schema documentation.
We use SQL as query language. SQL is a vast language, and this tutorial teaches you a small subset through examples. However, we do try to cover the most commonly used constructs in the query language.
The database management system is HyperSQL Database (HSQLDB), which has the advantage of being quite easy to set up. The things you will learn in these exercises apply equally to more common databases you might have heard of, such as MySQL or PostgreSQL.
If you run into any problems, please post questions on the course's Moodle page.
You can use your own computer, or a college or university computer with Java installed. The examples should work equally well on Linux, Windows or Mac OS.
You need to download two files: one containing the database program and another containing the data. Download the database program code (relational-db.jar) and the database (relational-db.zip), and put them in the same directory. Unpack the zip file, which should give you a subdirectory called relational-db. (There should be only one directory called relational-db — not a directory inside a directory with the same name.) Now you are ready to explore the data.
In your terminal, change to the directory containing the downloaded JAR file, and run it as follows:
java -jar relational-db.jar relational-db
The last part of that command, relational-db, is the name of the subdirectory containing the data files. When you run it, a HSQL Database Manager window should pop up. On the left is a list of tables (called public.certificates and so on). At the top is a box where you can type a query, and at the bottom is an area where the results will appear.
Each movie in the database is identified by a unique number, its id. For example, the id of the latest Star Wars movie is 4148060. If you know that ID, you can look up information about the movie.
Type the following query in the top-right box, and click the Execute SQL button to run it:
It should produce a result looking like this:
ID TITLE YEAR ------- ------------------------------------------------- ---- 4148060 Star Wars: Episode VII - The Force Awakens (2015) 2015
The * here specifies that every column from the table will be printed. If we wanted only the title and year columns we could write:
This query produces the output:
TITLE YEAR ------------------------------------------------- ---- Star Wars: Episode VII - The Force Awakens (2015) 2015
SQL has many keywords. We have used automation to highlight keywords in green. Our column name year is also a keyword and the database system rather permissively allows this without complaint. Other systems may behave differently. Sigh. Most systems today are not case sensitive but for some (strange) reason it has become a tradition to write SQL's keywords using CAPITALS. Longer sigh.
If you don't already know the ID and are not even sure of the title, then you might search using a string match:
This query produces the output:
ID TITLE YEAR ------- ------------------------------------------------- ---- 3149936 Afghan Star (2009) 2009 4147697 Star Trek (2009) 2009 4148060 Star Wars: Episode VII - The Force Awakens (2015) 2015 4377019 Twenty Feet from Stardom (2013) 2013
The LIKE operator searches for strings that match some pattern, and the percent sign % means "any sub-string can appear here". Thus, title LIKE '%Star%' means searching for movies whose title contains the word Star as a sub-string.
If you don't want to search, you can list movies in alphabetical order, for example show the first five:
which looks like this:
ID TITLE YEAR ------- ------------------------------------ ---- 3099893 12 Years a Slave (2013) 2013 3108852 4 luni, 3 saptamâni si 2 zile (2007) 2007 3149936 Afghan Star (2009) 2009 3171436 American Hustle (2013) 2013 3175199 An Education (2009) 2009
If you leave off the LIMIT 5 and don't have any WHERE clause, you get back all the movies in the database. If you leave off the ORDER BY title, you get back the results in some arbitrary, undefined order.
We can count movies per year using the GROUP BY construct:
This query produces the output:
YEAR TOTAL ---- ----- 2004 1 2006 9 2007 10 2008 12 2009 10 2010 9 2011 10 2012 10 2013 11 2014 10 2015 8 2016 12 2017 8
The GROUP BY year construct will group together all records in the movies table that have the same value for the year attribute. The count(*) AS total specifies that the records in each group are to be counted, and that this count is associated with the new column named total. Here ORDER BY year will see that year is an integer value and then use the standard numeric order. If we wanted to order the list in the reverse order we could use ORDER BY year DESC as follows:
This query produces the output:
YEAR TOTAL ---- ----- 2017 8 2016 12 2015 8 2014 10 2013 11 2012 10 2011 10 2010 9 2009 10 2008 12 2007 10 2006 9 2004 1
Our collection of 120 movies in the database corresponds to the top ten movies per year, from 2006 through 2017, according to the web site Rotten Tomatoes. We can see from this query that IMDb and Rotten Tomatoes do not always agree on the year associated with a movie!
Now experiment with your own queries on the people table.
There are many tables in our database that provide additional information about each movie. Those tables are called certificates, keywords, languages, genres, locations, and release_dates. (For more information see schema documentation.) Each of these tables contains records with a movie_id together with some additional values.
Let's take a peek at the languages table:
This query produces the output:
MOVIE_ID LANGUAGE NOTE -------- -------- ------------------ 3099893 English [null] 3108852 Romanian [null] 3149936 Dari [null] 3149936 English [null] 3149936 Pashtu [null] 3171436 Arabic [null] 3171436 English [null] 3175199 English [null] 3175199 French (only a few lines) 3187610 English [null]
(Note: This SELECT is not constrained with an ORDER BY construct, so no particular order is guaranteed. You may obtain a different collection of 10 rows because of this non-determinism.)
Now suppose we want to write a query that returns new records of the form title, year, language — that is, we want to join together information from the movies table and the languages table (the title and year come from the movies table, but the language comes from the languages table). We can link the records from the two tables based on the movie ID: if the number in the movies.id column matches the number in languages.movie_id, then the two records are about the same movie. We can link these records using the JOIN construct:
SELECT title, year, language
FROM movies
JOIN languages ON movies.id = languages.movie_id
ORDER BY year, title
LIMIT 20 ;
This query produces the output (if your system produced a slightly different output can you explain why?):
TITLE YEAR LANGUAGE ----------------------------- ---- --------- Kekexili (2004) 2004 Tibetan Kekexili (2004) 2004 Mandarin Blindsight (2006/I) 2006 English Blindsight (2006/I) 2006 German Blindsight (2006/I) 2006 Tibetan Casino Royale (2006) 2006 English Casino Royale (2006) 2006 French Children of Men (2006) 2006 English Children of Men (2006) 2006 Arabic Children of Men (2006) 2006 German Children of Men (2006) 2006 Italian Children of Men (2006) 2006 Romanian Children of Men (2006) 2006 Russian Children of Men (2006) 2006 Serbian Children of Men (2006) 2006 Spanish Children of Men (2006) 2006 Georgian Deliver Us from Evil (2006) 2006 English El laberinto del fauno (2006) 2006 Spanish The Departed (2006) 2006 English The Departed (2006) 2006 Cantonese
There is an alternative way of writing such a query, which you will see in many older textbooks:
SELECT title, year, language
FROM movies, languages
WHERE movies.id = languages.movie_id
ORDER BY year, title
LIMIT 20 ;
This query produces the same output. However, here the JOIN is implicit in the way we have written the SELECT ... FROM ... WHERE ... expression. It is generally better practice to make joins explicit since it often makes queries easier to understand — the join logic is disentangled from the filter logic of the WHERE clause. Another reason is that (as we see below) there are several variations of the join construct, each having subtly different semantics, and when joins are explicit we can experiment with these variants.
Now that we see that some movies are multi-lingual, let's rank movies by the number of languages associated with each movie. We can combine the JOIN and GROUP BY constructions:
SELECT title, year, count(*) AS language_count
FROM movies
JOIN languages ON movies.id = languages.movie_id
GROUP BY title, year
ORDER BY language_count desc
LIMIT 20 ;
This query produces the output:
TITLE YEAR LANGUAGE_COUNT ------------------------------------------- ---- -------------- Children of Men (2006) 2006 9 District 9 (2009) 2009 6 Mission: Impossible - Ghost Protocol (2011) 2011 5 The Bourne Ultimatum (2007) 2007 5 Iron Man (2008) 2008 5 Snowpiercer (2013) 2013 5 Persepolis (2007) 2007 4 Drag Me to Hell (2009) 2009 4 The Avengers (2012) 2012 3 The Artist (2011/I) 2011 3 Slumdog Millionaire (2008) 2008 3 Un prophète (2009) 2009 3 Dunkirk (2017) 2017 3 Brooklyn (2015) 2015 3 The Queen (2006) 2006 3 Argo (2012) 2012 3 Blindsight (2006/I) 2006 3 Bikur Ha-Tizmoret (2007) 2007 3 Before Midnight (2013) 2013 3 Afghan Star (2009) 2009 3
The credits table is special in our database. It represents a many-to-many relationship between movies and people. Each record contains a movie_id and a person_id attribute. Thus each record represents a relationship between the person with that person_id and the movie with that movie_id. The additional attributes of the credits table tell us something about that relationship. Perhaps the most interesting attribute is type which tells us what kind of relationship is being recorded. Let's look at the different types:
This query produces the output:
TYPE TOTAL ---------------- ----- actor 8467 producer 1108 writer 289 editor 161 director 139 cinematographer 131 composer 125 costume_designer 90
In other words, the credits table represents eight distinct types of relationships between movies and people. As will be discussed in the lectures, this table is not in normal form. An alternative schema might split it into eight distinct tables, one for each type.
Let's construct our first three-way join! We will construct a table with attributes movie_id, title, person_id, name, type, character for the movie with title Gui tu lie che (2009):
SELECT movie_id, title, person_id, name, type, character
FROM movies
JOIN credits ON movies.id = credits.movie_id
JOIN people ON people.id = credits.person_id
WHERE title = 'Gui tu lie che (2009)'
ORDER BY type ;
This query produces the output:
MOVIE_ID TITLE PERSON_ID NAME TYPE CHARACTER -------- --------------------- --------- -------------------- --------------- --------- 3558459 Gui tu lie che (2009) 2635369 Zhang, Changhua actor Himself 3558459 Gui tu lie che (2009) 2331770 Tang, Tingsui actor Himself 3558459 Gui tu lie che (2009) 2636131 Zhang, Yang (VII) actor Himself 3558459 Gui tu lie che (2009) 4096718 Zhang, Qin (I) actor Herself 3558459 Gui tu lie che (2009) 2877417 Chen, Suqin actor Herself 3558459 Gui tu lie che (2009) 714378 Fan, Lixin (II) cinematographer [null] 3558459 Gui tu lie che (2009) 4107663 Alary, Olivier composer [null] 3558459 Gui tu lie che (2009) 714378 Fan, Lixin (II) director [null] 3558459 Gui tu lie che (2009) 714378 Fan, Lixin (II) editor [null] 3558459 Gui tu lie che (2009) 3898468 Stephen, Mary editor [null] 3558459 Gui tu lie che (2009) 403772 Chang, Yung (I) editor [null] 3558459 Gui tu lie che (2009) 4144700 Cross, Daniel (II) producer [null] 3558459 Gui tu lie che (2009) 4113686 Aung-Thwin, Mila (I) producer [null] 3558459 Gui tu lie che (2009) 4705097 Zhao, Qi (VI) producer [null] 3558459 Gui tu lie che (2009) 1632179 Moore, Bob (IV) producer [null]
Note that the query qualifies id as movies.id and people.id. This is required since both tables share a column with the name id, and SQL needs to know which one we are referring to! On the other hand, the movie_id and person_id columns are unambiguous, so you can omit the credits. prefix if you want.
Now let's construct a similar query to collect all information on the person Ben Affleck:
SELECT person_id, name, movie_id, title, type, character
FROM movies
JOIN credits ON movies.id = credits.movie_id
JOIN people ON people.id = credits.person_id
WHERE name = 'Affleck, Ben'
ORDER BY type ;
which produces the output:
PERSON_ID NAME MOVIE_ID TITLE TYPE CHARACTER --------- ------------ -------- --------------- -------- ----------- 18577 Affleck, Ben 3192514 Argo (2012) actor Tony Mendez 18577 Affleck, Ben 4321985 The Town (2010) actor Doug MacRay 18577 Affleck, Ben 3192514 Argo (2012) director [null] 18577 Affleck, Ben 4321985 The Town (2010) director [null] 18577 Affleck, Ben 3192514 Argo (2012) producer [null] 18577 Affleck, Ben 4321985 The Town (2010) writer [null]
A joke: A SQL query walks into a bar and sees two tables. She walks up to them and asks "Can I join you?"
Suppose we want to write a query that produces triples of the form genre1, genre2, total that count the number of movies associated with a pair of distinct genres. Since we need to join the genres table with itself, some means of differentiating the two instances of this table is required. The keyword AS comes to our rescue:
SELECT G1.genre AS genre1, G2.genre AS genre2, count(*) AS total
FROM genres AS G1
JOIN genres AS G2 ON G1.movie_id = G2.movie_id
WHERE G1.genre <> G2.genre
GROUP BY genre1, genre2
ORDER BY total desc
LIMIT 10;
This produces the output:
GENRE1 GENRE2 TOTAL --------- --------- ----- Thriller Drama 16 Drama Thriller 16 Drama Romance 14 Romance Drama 14 Biography Drama 13 Action Thriller 13 Thriller Action 13 Drama Biography 13 Sci-Fi Action 12 Adventure Action 12
Notice that there are two instances of every pair of genres. We can easily fix this by replacing the <> operator (which means “not equal”) with the < operator (“less than”), and thus ensure that the first genre is lexicographically less than the second:
SELECT G1.genre AS genre1, G2.genre AS genre2, count(*) AS total
FROM genres AS G1
JOIN genres AS G2 ON G1.movie_id = G2.movie_id
WHERE G1.genre < G2.genre
GROUP BY genre1, genre2
ORDER BY total desc
LIMIT 10;
This produces the output:
GENRE1 GENRE2 TOTAL --------- --------- ----- Drama Thriller 16 Drama Romance 14 Biography Drama 13 Action Thriller 13 Action Adventure 12 Action Sci-Fi 12 Adventure Family 12 Comedy Drama 11 Adventure Comedy 11 Animation Comedy 11
As we saw previously, movies can be associated with multiple languages. Suppose that we want to list the titles of movies that are not associated with the English language. One way to do this is to use a nested query (also called a sub-query) and the IN construct:
SELECT title
FROM movies
WHERE id NOT IN (
SELECT movie_id
FROM languages
WHERE language = 'English' );
This produces the output:
4 luni, 3 saptamâni si 2 zile (2007) Aruitemo aruitemo (2008) El laberinto del fauno (2006) Gui tu lie che (2009) Jiro Dreams of Sushi (2011) Jodaeiye Nader az Simin (2011) Kekexili (2004) Le Havre (2011) Leviafan (2014) Låt den rätte komma in (2008) Ma vie de Courgette (2016) Shaun the Sheep Movie (2015) Shi (2010) Un prophète (2009)
The expression id NOT IN (...) is the same as NOT (id IN (...)) and will be true only when the value associated with id is not in the result of evaluating the (...) sub-query.
From above we can see that Ben Affleck's person_id is 18577. Let's look at all rows of credits with this person_id:
which produces the output:
PERSON_ID MOVIE_ID TYPE NOTE CHARACTER POSITION LINE_ORDER GROUP_ORDER SUBGROUP_ORDER --------- -------- -------- ------------ ----------- -------- ---------- ----------- -------------- 18577 3192514 actor [null] Tony Mendez 1 18577 3192514 director [null] [null] 18577 3192514 producer (producer) [null] 18577 4321985 actor [null] Doug MacRay 1 18577 4321985 director [null] [null] 18577 4321985 writer (screenplay) [null] 1 2 1
We can see that for some rows there are no values displayed for some columns. For example, for those rows where type is director there is no integer value displayed in the position column. However, there actually is a value associated with these blank spots, and that is the NULL value (a user interface may or may not display this value). In a similar way, when type is not actor the character column contains a null value. Note that in this case our user-interface displays [null] instead of whitespace. Why? That's because the type of this column is string, and this display wants us to be able to distinguish between a NULL value and the empty string!
Let's try to select just those rows position is NULL as follows:
This query will return no rows at all! Why? Because with the NULL we enter the world of 3-valued logic. For two expressions e1 and e2 the expression e1 = e2 can evaluate to TRUE or FALSE or NULL (which can be read as we don't know the answer). Equality returns NULL when either e1 or e2 evaluates to NULL. (NULL is the Beyoncé of databases – nothing can be compared to it.) For example, position = NULL always returns NULL. In addition, TRUE AND NULL returns NULL as well. Furthermore, the SELECT ... WHERE ... construct only returns results for cases when the where-clause evaluates to TRUE.
SQL has a special construct to check for NULL values:
which does produce the expected output:
PERSON_ID MOVIE_ID TYPE NOTE CHARACTER POSITION LINE_ORDER GROUP_ORDER SUBGROUP_ORDER --------- -------- -------- ------------ --------- -------- ---------- ----------- -------------- 18577 3192514 director [null] [null] 18577 3192514 producer (producer) [null] 18577 4321985 director [null] [null] 18577 4321985 writer (screenplay) [null] 1 2 1
Suppose we have a tables T1 and T2 and we want to compute a new kind of JOIN that contains all of the rows of T1, even if there is no matching row in the T2 table. But what value should the query return for all T2 columns when a T1 row has no joining rows in the T2 table? The LEFT JOIN operation fills these columns with NULL values.
LEFT JOIN is also known as LEFT OUTER JOIN. Generally, outer joins may return results for rows that were not matched, whereas normal joins (also known as inner joins) only return matching records.
Let's look at an example.
One of the queries above tells us that there are 90 entries in the credits table for costume designers. Therefore we can confidently predict that the following query will return 90 rows:
Indeed, it does:
TITLE PERSON_ID --------------------------------------------------- --------- 12 Years a Slave (2013) 3606978 4 luni, 3 saptamâni si 2 zile (2007) 5037824 American Hustle (2013) 2558260 An Education (2009) 5029750 Argo (2012) 4044743 Arrival (2016/II) 5021377 Aruitemo aruitemo (2008) 3357344 Baby Driver (2017) 4560947 Before Midnight (2013) 3776755 Bikur Ha-Tizmoret (2007) 100356 Boyhood (2014/I) 5050258 Brooklyn (2015) 5029750 Casino Royale (2006) 3197500 Children of Men (2006) 3941839 Dallas Buyers Club (2013) 1298511 District 9 (2009) 2888744 Drag Me to Hell (2009) 3573667 Dunkirk (2017) 1298031 El laberinto del fauno (2006) 5037272 Frost/Nixon (2008) 1772486 Get Out (2017/I) 5035269 Gloria (2013/II) 385642 Gravity (2013) 3941839 Harry Potter and the Deathly Hallows: Part 2 (2011) 3941839 Hell or High Water (2016/II) 5059702 Her (2013) 2284973 In the Loop (2009) 5042904 Inside Llewyn Davis (2013) 4100410 Iron Man (2008) 5034610 Iron Man (2008) 3838406 It Follows (2014) 5042261 Juno (2007) 5051445 La La Land (2016/I) 4100410 Le Havre (2011) 348581 Leviafan (2014) 5022641 Logan (2017) 1772486 Looper (2012) 2950086 Love & Friendship (2016) 3522990 Låt den rätte komma in (2008) 3908407 Ma vie de Courgette (2016) 5034463 Ma vie de Courgette (2016) 5052712 Mad Max: Fury Road (2015) 2746864 Manchester by the Sea (2016) 3964523 Milk (2008/I) 862446 Mission: Impossible - Ghost Protocol (2011) 1190879 Moneyball (2011) 4024522 Moonlight (2016/I) 5031222 Moonrise Kingdom (2012) 4024522 Mr. Turner (2014) 3008293 Mud (2012) 5050258 Nightcrawler (2014) 5061837 No Country for Old Men (2007) 4100410 Once (2007) 5027880 Room (2015/I) 5025909 Selma (2014) 2856180 Shi (2010) 5042067 Short Term 12 (2013) 2923520 Short Term 12 (2013) 5034288 Silver Linings Playbook (2012) 286624 Skyfall (2012) 705532 Skyfall (2012) 3941839 Slumdog Millionaire (2008) 5041701 Snowpiercer (2013) 5033561 Spider-Man: Homecoming (2017) 5032765 Spotlight (2015/I) 2886785 Star Trek (2009) 1190879 Star Wars: Episode VII - The Force Awakens (2015) 1190879 The Artist (2011/I) 286624 The Avengers (2012) 2832963 The Babadook (2014) 4026481 The Big Sick (2017) 5025291 The Bourne Ultimatum (2007) 2930437 The Cabin in the Woods (2012) 3971106 The Dark Knight (2008) 3197500 The Departed (2006) 3696229 The Hurt Locker (2008) 1394582 The Jungle Book (2016) 3838406 The King's Speech (2010) 2746864 The Martian (2015) 4081927 The Muppets (2011) 5020346 The Queen (2006) 2799107 The Social Network (2010) 4044743 The Town (2010) 5044971 The Wrestler (2008) 5061837 True Grit (2010) 4100410 Un prophète (2009) 3548257 Up in the Air (2009/I) 862446 War for the Planet of the Apes (2017) 2819305 Whiplash (2014) 5048444 Wonder Woman (2017) 3197500
Since we have 120 movies in our database we know that some movies have no costume designers recorded in our database. Let's rewrite this query using LEFT JOIN in order to see all movies, even those without costume designers:
SELECT title, person_id
FROM movies
LEFT JOIN credits ON movie_id = id AND type = 'costume_designer';
The result is:
TITLE PERSON_ID --------------------------------------------------- --------- 12 Years a Slave (2013) 3606978 4 luni, 3 saptamâni si 2 zile (2007) 5037824 Afghan Star (2009) American Hustle (2013) 2558260 An Education (2009) 5029750 Anvil: The Story of Anvil (2008) Argo (2012) 4044743 Arrival (2016/II) 5021377 Aruitemo aruitemo (2008) 3357344 Baby Driver (2017) 4560947 Before Midnight (2013) 3776755 Bikur Ha-Tizmoret (2007) 100356 Blindsight (2006/I) Boyhood (2014/I) 5050258 Brooklyn (2015) 5029750 Casino Royale (2006) 3197500 Children of Men (2006) 3941839 Dallas Buyers Club (2013) 1298511 Deliver Us from Evil (2006) District 9 (2009) 2888744 Drag Me to Hell (2009) 3573667 Dunkirk (2017) 1298031 El laberinto del fauno (2006) 5037272 Finding Dory (2016) Frost/Nixon (2008) 1772486 Get Out (2017/I) 5035269 Gloria (2013/II) 385642 Gravity (2013) 3941839 Gui tu lie che (2009) Harry Potter and the Deathly Hallows: Part 2 (2011) 3941839 Hell or High Water (2016/II) 5059702 Her (2013) 2284973 How to Survive a Plague (2012) How to Train Your Dragon (2010) I Am Not Your Negro (2016) In the Loop (2009) 5042904 Inside Job (2010) Inside Llewyn Davis (2013) 4100410 Inside Out (2015/I) Iron Man (2008) 5034610 Iron Man (2008) 3838406 It Follows (2014) 5042261 Jiro Dreams of Sushi (2011) Jodaeiye Nader az Simin (2011) Juno (2007) 5051445 Kekexili (2004) La La Land (2016/I) 4100410 Le Havre (2011) 348581 Leviafan (2014) 5022641 Life Itself (2014) Logan (2017) 1772486 Looper (2012) 2950086 Love & Friendship (2016) 3522990 Låt den rätte komma in (2008) 3908407 Ma vie de Courgette (2016) 5034463 Ma vie de Courgette (2016) 5052712 Mad Max: Fury Road (2015) 2746864 Man on Wire (2008) Manchester by the Sea (2016) 3964523 Milk (2008/I) 862446 Mission: Impossible - Ghost Protocol (2011) 1190879 Moana (2016/I) Moneyball (2011) 4024522 Moonlight (2016/I) 5031222 Moonrise Kingdom (2012) 4024522 Mr. Turner (2014) 3008293 Mud (2012) 5050258 Nightcrawler (2014) 5061837 No Country for Old Men (2007) 4100410 Once (2007) 5027880 Persepolis (2007) Project Nim (2011) Ratatouille (2007) Room (2015/I) 5025909 Selma (2014) 2856180 Shaun the Sheep Movie (2015) Shi (2010) 5042067 Short Term 12 (2013) 2923520 Short Term 12 (2013) 5034288 Silver Linings Playbook (2012) 286624 Skyfall (2012) 705532 Skyfall (2012) 3941839 Slumdog Millionaire (2008) 5041701 Snowpiercer (2013) 5033561 Spider-Man: Homecoming (2017) 5032765 Spotlight (2015/I) 2886785 Star Trek (2009) 1190879 Star Wars: Episode VII - The Force Awakens (2015) 1190879 Taxi to the Dark Side (2007) The Artist (2011/I) 286624 The Avengers (2012) 2832963 The Babadook (2014) 4026481 The Big Sick (2017) 5025291 The Bourne Ultimatum (2007) 2930437 The Cabin in the Woods (2012) 3971106 The Dark Knight (2008) 3197500 The Departed (2006) 3696229 The Hurt Locker (2008) 1394582 The Interrupters (2011) The Invisible War (2012) The Jungle Book (2016) 3838406 The King of Kong (2007) The King's Speech (2010) 2746864 The LEGO Movie (2014) The Martian (2015) 4081927 The Muppets (2011) 5020346 The Queen (2006) 2799107 The Social Network (2010) 4044743 The Town (2010) 5044971 The War Tapes (2006) The Wrestler (2008) 5061837 Toy Story 3 (2010) True Grit (2010) 4100410 Twenty Feet from Stardom (2013) Un prophète (2009) 3548257 Up (2009) Up in the Air (2009/I) 862446 WALL·E (2008) War for the Planet of the Apes (2017) 2819305 Waste Land (2010) Whiplash (2014) 5048444 Wonder Woman (2017) 3197500 Wordplay (2006) Zootopia (2016)
This query will return 124 rows (because some movies have more than one costume designer). For the 34 movies that do not have a costume designer (90 + 34 = 124), the person_id column contains a NULL value.
We might expect that the following query will produce the same result:
SELECT title, person_id
FROM movies
LEFT JOIN credits ON movie_id = id
WHERE type = 'costume_designer';
However, this is not the case! This query will produce only 90 rows! In fact, it is semantically identical to the first example in this section. Why is this so? The where-clause is evaluated after the left join has been performed but before the SELECT has pulled out the relevant columns. At that point the type columns of non-matching rows contain a NULL value, and so do not produce TRUE for the WHERE type = 'costume_designer' clause. (Very audible sigh.)
Now suppose we want to replace every person_id with the name of the associated costume designer. Here is one way to rewrite the query:
SELECT title, name
FROM movies
LEFT JOIN credits ON movie_id = movies.id AND type = 'costume_designer'
LEFT JOIN people ON person_id = people.id;
Note that this query using another LEFT JOIN, which is evaluated after the first join but before the SELECT removes some columns. Here is the result:
TITLE NAME --------------------------------------------------- ------------------------ 12 Years a Slave (2013) Norris, Patricia (I) 4 luni, 3 saptamâni si 2 zile (2007) Istrate, Dana Afghan Star (2009) [null] American Hustle (2013) Wilkinson, Michael (I) An Education (2009) Dicks-Mireaux, Odile Anvil: The Story of Anvil (2008) [null] Argo (2012) West, Jacqueline Arrival (2016/II) April, Renée Aruitemo aruitemo (2008) Kurosawa, Kazuko (I) Baby Driver (2017) Hoffman, Courtney (I) Before Midnight (2013) Rozana, Vasileia Bikur Ha-Tizmoret (2007) Ashkenazi, Doron Blindsight (2006/I) [null] Boyhood (2014/I) Perkins, Kari Brooklyn (2015) Dicks-Mireaux, Odile Casino Royale (2006) Hemming, Lindy Children of Men (2006) Temime, Jany Dallas Buyers Club (2013) Kurt and Bart Deliver Us from Evil (2006) [null] District 9 (2009) Cilliers, Diana Drag Me to Hell (2009) Mussenden, Isis Dunkirk (2017) Kurland, Jeffrey El laberinto del fauno (2006) Huete, Lala Finding Dory (2016) [null] Frost/Nixon (2008) Orlandi, Daniel Get Out (2017/I) Haders, Nadine Gloria (2013/II) Castro, Eduardo (I) Gravity (2013) Temime, Jany Gui tu lie che (2009) [null] Harry Potter and the Deathly Hallows: Part 2 (2011) Temime, Jany Hell or High Water (2016/II) Turzanska, Malgosia Her (2013) Storm, Casey How to Survive a Plague (2012) [null] How to Train Your Dragon (2010) [null] I Am Not Your Negro (2016) [null] In the Loop (2009) Little, Ros Inside Job (2010) [null] Inside Llewyn Davis (2013) Zophres, Mary Inside Out (2015/I) [null] Iron Man (2008) Gregg, Rebecca Iron Man (2008) Shannon, Laura Jean It Follows (2014) Leitz, Kimberly Jiro Dreams of Sushi (2011) [null] Jodaeiye Nader az Simin (2011) [null] Juno (2007) Prudhomme, Monique Kekexili (2004) [null] La La Land (2016/I) Zophres, Mary Le Havre (2011) Cambier, Frédéric Leviafan (2014) Bartuli, Anna Life Itself (2014) [null] Logan (2017) Orlandi, Daniel Looper (2012) Davis, Sharen Love & Friendship (2016) Mhaoldomhnaigh, Eimer Ni Låt den rätte komma in (2008) Strid, Maria Ma vie de Courgette (2016) Grandchamp, Christel Ma vie de Courgette (2016) Riera, Vanessa Mad Max: Fury Road (2015) Beavan, Jenny (I) Man on Wire (2008) [null] Manchester by the Sea (2016) Toth, Melissa Milk (2008/I) Glicker, Danny Mission: Impossible - Ghost Protocol (2011) Kaplan, Michael (I) Moana (2016/I) [null] Moneyball (2011) Walicka-Maimone, Kasia Moonlight (2016/I) Eselin, Caroline Moonrise Kingdom (2012) Walicka-Maimone, Kasia Mr. Turner (2014) Durran, Jacqueline Mud (2012) Perkins, Kari Nightcrawler (2014) Westcott, Amy (I) No Country for Old Men (2007) Zophres, Mary Once (2007) Corvisieri, Tiziana Persepolis (2007) [null] Project Nim (2011) [null] Ratatouille (2007) [null] Room (2015/I) Carlson, Lea Selma (2014) Carter, Ruth E. Shaun the Sheep Movie (2015) [null] Shi (2010) Lee, Choong-yeon Short Term 12 (2013) Cretton, Joy Short Term 12 (2013) Gordon-Crozier, Mirren Silver Linings Playbook (2012) Bridges, Mark (I) Skyfall (2012) Everest, Timothy Skyfall (2012) Temime, Jany Slumdog Millionaire (2008) Larlarb, Suttirat Anne Snowpiercer (2013) George, Catherine (II) Spider-Man: Homecoming (2017) Frogley, Louise Spotlight (2015/I) Chuck, Wendy Star Trek (2009) Kaplan, Michael (I) Star Wars: Episode VII - The Force Awakens (2015) Kaplan, Michael (I) Taxi to the Dark Side (2007) [null] The Artist (2011/I) Bridges, Mark (I) The Avengers (2012) Byrne, Alexandra (I) The Babadook (2014) Wallace, Heather (I) The Big Sick (2017) Burton, Sarah Mae The Bourne Ultimatum (2007) Cunliffe, Shay The Cabin in the Woods (2012) Trpcic, Shawna The Dark Knight (2008) Hemming, Lindy The Departed (2006) Powell, Sandy (II) The Hurt Locker (2008) Little, George L. (I) The Interrupters (2011) [null] The Invisible War (2012) [null] The Jungle Book (2016) Shannon, Laura Jean The King of Kong (2007) [null] The King's Speech (2010) Beavan, Jenny (I) The LEGO Movie (2014) [null] The Martian (2015) Yates, Janty The Muppets (2011) Afiley, Rahel The Queen (2006) Boyle, Consolata The Social Network (2010) West, Jacqueline The Town (2010) Matheson, Susan The War Tapes (2006) [null] The Wrestler (2008) Westcott, Amy (I) Toy Story 3 (2010) [null] True Grit (2010) Zophres, Mary Twenty Feet from Stardom (2013) [null] Un prophète (2009) Montel, Virginie (I) Up (2009) [null] Up in the Air (2009/I) Glicker, Danny WALL·E (2008) [null] War for the Planet of the Apes (2017) Bruning, Melissa Waste Land (2010) [null] Whiplash (2014) Norcia, Lisa Wonder Woman (2017) Hemming, Lindy Wordplay (2006) [null] Zootopia (2016) [null]
Where there is a LEFT JOIN construct you might guess that there must be a RIGHT JOIN construct. Right! We can get the same result as from our LEFT JOIN query if we switch the positions of the the table names and use RIGHT JOIN:
SELECT title, person_id
FROM credits
RIGHT JOIN movies ON movie_id = id AND type = 'costume_designer';
Note that the LEFT or RIGHT keyword simply indicates which of the two relations is taking the leading role in the outer join.
For more information on outer joins see the Wikipedia entry.
You might want to practice SQL with the practical exercises from last year. Try to solve these yourself before peeking at the solutions!
Complete the following query template to produce a query that returns rows name, total where the name column is the name of a person, and the total column indicates the number of movies of the genre Drama for which that person is a writer.
Now modify your solution for Exercise 1.a so that names and totals are associated with those writers that write ONLY for movies associated ONLY with the genre Drama.
The position column of the credits table indicates an actor's rank on the billing of a movie. So for example, position = 1 represents a billing of a top star. A movie may have several top stars sharing the top billing. Your task is to complete the following query template so that it returns rows of the form name1, name2, title where the names are of two (different) actors sharing the top billing in the movie having the associated title. The first name should be lexicographically less than the second.
Now you should know enough to attempt the practical questions, which appear on a separate page.