We have provided a sample database with information about movies and actors, taken from the Internet Movie Database (IMDb). We have structured the data in a relational schema, and this page describes the form and meaning of those structures.
The sample database contains 11 tables. The two main tables are movies, which lists every movie in the database, and people, which lists all the people involved in one or more movies. The credits table indicates who was involved with which movie (as actor, director, producer, etc). The remaining 8 tables contain various additional information about movies, such as the locations where it was shot, and the date it was released.
There are two different versions of the sample database: a large version and a small version. The table structure of those databases is the same; the biggest difference is that the small database contains only 100 movies, whereas the large one contains almost a million movies. As the large database covers a much broader variety of movies, it also contains more strange quirks and oddities!
Here are the SQL schema definitions of those tables:
CREATE TABLE movies( id integer PRIMARY KEY, title varchar(255) NOT NULL UNIQUE, year integer) CREATE TABLE people( id integer PRIMARY KEY, name varchar(255) NOT NULL UNIQUE, gender varchar(10)) CREATE TABLE credits( person_id integer NOT NULL REFERENCES people (id), movie_id integer NOT NULL REFERENCES movies (id), type varchar(20) NOT NULL, note varchar(255), character varchar(255), position integer, line_order integer, group_order integer, subgroup_order integer, UNIQUE (person_id, movie_id, type)) CREATE TABLE certificates( movie_id integer NOT NULL REFERENCES movies (id), country varchar(20) NOT NULL, certificate varchar(20) NOT NULL, note varchar(255)) CREATE TABLE genres( movie_id integer NOT NULL REFERENCES movies (id), genre varchar(25) NOT NULL) CREATE TABLE keywords( movie_id integer NOT NULL REFERENCES movies (id), keyword varchar(127) NOT NULL) CREATE TABLE languages( movie_id integer NOT NULL REFERENCES movies (id), language varchar(35) NOT NULL, note varchar(255)) CREATE TABLE locations( movie_id integer NOT NULL REFERENCES movies (id), location varchar(255) NOT NULL, note varchar(511)) CREATE TABLE release_dates( movie_id integer NOT NULL REFERENCES movies (id), country varchar(40) NOT NULL, release_date varchar(10) NOT NULL, note varchar(255))
Here is some more detail on how the data in each table should be interpreted, with links to the IMDb policies on how data should be structured.
Indicates which people were involved with which movie (for example as actor, director, or producer). The type column indicates the type of involvement:
actor: | The person was a member of the cast of this movie. The character column gives the name of the character they played, and the position column indicates the order in which the actors appear in the credits. For example, the actor who is listed first in the credits has a position of 1, the actor who is listed second has 2, and so on. Uncredited actors have a null position. |
---|---|
cinematographer: | |
The person was a cinematographer or director of photography on this movie. | |
composer: | The person composed the main background score of this movie. |
costume_designer: | |
The person is credited as costume designer on this movie. | |
director: | The person directed this movie. |
editor: | The person is credited as picture editor of this movie. |
producer: | The person is credited as producer, executive producer, line producer or similar of this movie. |
production_designer: | |
The person is credited as production designer on this movie. | |
writer: | The person was screenplay or story writer of this movie. Writers have numbers in the line_order, group_order and subgroup_order columns, indicating the collaborations and level of contribution according to quite specific and complex rules. |