Graph databases: Getting started

This tutorial will help you get started with the graph-oriented database Neo4j. In particular, you will learn constructs of the Cypher query language that you will need for solving the practical exercises. (In case you are wondering, the name Cypher has nothing to do with ciphers in cryptography. The language is named after a character in the movie The Matrix.)

Setting up Neo4j

As in the previous practical on relational databases, we have prepared a dataset of movies and people (actors, directors, etc.) based on IMDb.

  1. Download the database (graph-movie-db.zip), and unpack the zip file, which should give you a subdirectory called graph-movie-db.

  2. Download the free Neo4j Community Edition from the Neo4j download website. They provide installers for Windows, Mac and Linux.

    When you run this application it will allow you to select a directory containing Neo4j data files. You can choose the graph-movie-db directory from unpacking the zip file in order to access the IMDb example database. You can also leave it as the default, which will give you an empty database in which you can run through a tutorial.

    To access the graphical interface, first start up the Neo4j database, and then point your web browser at localhost:7474. We suggest that you first do this with the default empty database, and follow some of the built-in tutorials available in the browser.

    Unduer Linux, if you have unpacked the Neo4j tarball in the directory NEOHOME, then you may want to move the directory graph-movie-db to NEOHOME/data/databases/graph-movie-db and then edit the configuration file NEOHOME/conf/neo4j.conf by changing the line #dbms.active_database=graph.db to dbms.active_database=graph-movie-db (make sure you remove the comment symbol #). This will make graph-movie-db your default Neo4j database

Exploring the movie database

A Neo4j database contains nodes and directed binary relationships between nodes. Nodes can have multiple labels that act to classify nodes into different, perhaps overlapping, classes. Each node or relationship is associated with a set of properties.

Rather than giving schema documentation, as we did for the relational database, we will start by using Cypher to explore the contents of the database.

Here is a Cypher query the explores what kinds of node exist in our database:

match (n)
return labels(n) as labels, keys(n) as keys, count(*) as total
order by total desc;

The keyword match is followed by the pattern (n) that matches any node and assigns it to the variable n. The function call labels(n) returns a list of all labels associated with the node n. In our case, each node has exactly one label, but in general nodes can have any number of labels. The function call keys(n) returns a list of all of the names associated with properties — these act much like column names in SQL. (The term key here simply means the name of a node property, not a key that uniquely identifies the node.) The rest of the query looks very much like SQL, except for the fact that a group by construct is implicit in Cypher: count(*) is an aggregate function, so the query implicitly groups by labels and keys. The query returns:

+------------------------------------------------------+
| labels       | keys                          | total |
+------------------------------------------------------+
| ["Keyword"]  | ["keyword_id","keyword"]      | 9579  |
| ["Person"]   | ["person_id","name","gender"] | 9283  |
| ["Location"] | ["location_id","location"]    | 893   |
| ["Country"]  | ["country_id","country"]      | 122   |
| ["Movie"]    | ["movie_id","title","year"]   | 120   |
| ["Language"] | ["language_id","language"]    | 39    |
| ["Genre"]    | ["genre_id","genre"]          | 22    |
+------------------------------------------------------+

You can see that there are 100 movies and 7,711 people in the database — the same data as in the relational case. But the list of node labels is not quite the same as the list of tables in the relational model. Let's explore further.

In a similar way, we can extract information about the relationships in the database:

match (m)-[r]->(n)
return labels(m), type(r), labels(n), count(*) as total
order by total desc;

The pattern (m)-[r]->(n) matches two nodes (m and n) and a relationship (r) from m to n (all relationships in Neo4j have a direction). You can read it like an arrow m --> n where the relationship r is enclosed in square brackets. The function type(r) returns the type of the relationship r. The query returns:

+------------------------------------------------------------+
| labels(m)  | type(r)                | labels(n)    | total |
+------------------------------------------------------------+
| ["Movie"]  | "HAS_KEYWORD"          | ["Keyword"]  | 20131 |
| ["Person"] | "ACTS_IN"              | ["Movie"]    | 8467  |
| ["Movie"]  | "RELEASED_IN"          | ["Country"]  | 7129  |
| ["Movie"]  | "CERTIFIED_IN"         | ["Country"]  | 2752  |
| ["Person"] | "PRODUCED"             | ["Movie"]    | 1108  |
| ["Movie"]  | "HAS_LOCATION"         | ["Location"] | 1088  |
| ["Movie"]  | "HAS_GENRE"            | ["Genre"]    | 340   |
| ["Person"] | "WROTE"                | ["Movie"]    | 289   |
| ["Movie"]  | "HAS_LANGUAGE"         | ["Language"] | 214   |
| ["Person"] | "EDITED"               | ["Movie"]    | 161   |
| ["Person"] | "DIRECTED"             | ["Movie"]    | 139   |
| ["Person"] | "CINEMATOGRAPHER_FOR"  | ["Movie"]    | 131   |
| ["Person"] | "COMPOSER_FOR"         | ["Movie"]    | 125   |
| ["Person"] | "COSTUME_DESIGNER_FOR" | ["Movie"]    | 90    |
+------------------------------------------------------------+

This tells us, for example, that there are 6,913 relationships of type ACTS_IN between nodes with label Person and nodes of label Movie.

Each relationship between two nodes can also have a set of properties. The following query uses the pattern () that matches any node:

match ()-[r]->()
return type(r) as type, keys(r) as keys, count(*) as total
order by type;

The query returns:

+---------------------------------------------------------------------------------------+
| type                   | keys                                                 | total |
+---------------------------------------------------------------------------------------+
| "ACTS_IN"              | ["position","note"]                                  | 10    |
| "ACTS_IN"              | ["position"]                                         | 61    |
| "ACTS_IN"              | ["role","position","note"]                           | 4032  |
| "ACTS_IN"              | ["role","position"]                                  | 4364  |
| "CERTIFIED_IN"         | ["certificate","note"]                               | 538   |
| "CERTIFIED_IN"         | ["certificate"]                                      | 2214  |
| "CINEMATOGRAPHER_FOR"  | ["note"]                                             | 86    |
| "CINEMATOGRAPHER_FOR"  | []                                                   | 45    |
| "COMPOSER_FOR"         | ["note"]                                             | 30    |
| "COMPOSER_FOR"         | []                                                   | 95    |
| "COSTUME_DESIGNER_FOR" | ["note"]                                             | 10    |
| "COSTUME_DESIGNER_FOR" | []                                                   | 80    |
| "DIRECTED"             | ["note"]                                             | 25    |
| "DIRECTED"             | []                                                   | 114   |
| "EDITED"               | ["note"]                                             | 38    |
| "EDITED"               | []                                                   | 123   |
| "HAS_GENRE"            | []                                                   | 340   |
| "HAS_KEYWORD"          | []                                                   | 20131 |
| "HAS_LANGUAGE"         | []                                                   | 184   |
| "HAS_LANGUAGE"         | ["note"]                                             | 30    |
| "HAS_LOCATION"         | []                                                   | 577   |
| "HAS_LOCATION"         | ["note"]                                             | 511   |
| "PRODUCED"             | ["note"]                                             | 1108  |
| "RELEASED_IN"          | ["date","note"]                                      | 2194  |
| "RELEASED_IN"          | ["date"]                                             | 4935  |
| "WROTE"                | ["note","line_order","group_order","subgroup_order"] | 279   |
| "WROTE"                | ["line_order","group_order","subgroup_order"]        | 10    |
+---------------------------------------------------------------------------------------+

Consider the lists of properties of ACTS_IN relationships. We have modeled the ACTS_IN relationship as having properties role, position, and note. However, note that not all instances of this relationship type have all of these properties. This is how Neo4j deals with the absence of data: whereas SQL has a special NULL value that indicates there is no value, Neo4j simply leaves out a property that has no associated value, so it doesn't appear in keys().

SQL-like queries

Cypher has been designed so that many simple SQL queries can be translated easily into Cypher. Here we give translation examples for some of the queries from our SQL tutorial.

The SQL query:

SELECT title, year
FROM movies
WHERE id = 4148060;

can be written in Cypher as:

match (m:Movie)
where m.movie_id = '4148060'
return m.title as title, m.year as year;

The expression (m:Movie) matches a node that has the label Movie, and assigns it to a variable called m. You can then refer to m in other parts of the query, such as the where and return clauses. The query returns:

+------------------------------------------------------------+
| title                                               | year |
+------------------------------------------------------------+
| "Star Wars: Episode VII - The Force Awakens (2015)" | 2015 |
+------------------------------------------------------------+

SQL aggregation, such as

SELECT year, count(*) AS total
FROM movies
GROUP BY year
ORDER BY year DESC;

can be expressed directly in Cypher:

match (m:Movie)
return m.year as year, count(*) as total
order by year desc;

Note that in Cypher the GROUP BY construct is implicit when you use an aggregate function such as count(*). The query returns:

+--------------+
| year | total |
+--------------+
| 2017 | 8     |
| 2016 | 12    |
| 2015 | 8     |
| 2014 | 10    |
| 2013 | 11    |
| 2012 | 10    |
| 2011 | 10    |
| 2010 | 9     |
| 2009 | 10    |
| 2008 | 12    |
| 2007 | 10    |
| 2006 | 9     |
| 2004 | 1     |
+--------------+

Joins are easier to write in Cypher because we can use patterns. For example, the SQL query

SELECT title, year, count(*) AS language_count
FROM movies
JOIN languages ON movies.id = languages.movie_id
GROUP BY title, year
ORDER BY language_count desc
LIMIT 20 ;

can be written in Cypher as:

match (m:Movie)-[:HAS_LANGUAGE]->(l:Language)
return m.title as title, m.year as year, count(*) as language_count
order by language_count desc
limit 20;

The pattern (m:Movie)-[:HAS_LANGUAGE]->(l:Language) matches only relationships of type HAS_LANGUAGE. The variables m and l are bound such that l is a node of label Language, and l is a language that is spoken in movie m. The query returns:

+-----------------------------------------------------------------------+
| title                                         | year | language_count |
+-----------------------------------------------------------------------+
| "Children of Men (2006)"                      | 2006 | 9              |
| "District 9 (2009)"                           | 2009 | 6              |
| "Snowpiercer (2013)"                          | 2013 | 5              |
| "The Bourne Ultimatum (2007)"                 | 2007 | 5              |
| "Mission: Impossible - Ghost Protocol (2011)" | 2011 | 5              |
| "Iron Man (2008)"                             | 2008 | 5              |
| "Persepolis (2007)"                           | 2007 | 4              |
| "Drag Me to Hell (2009)"                      | 2009 | 4              |
| "Bikur Ha-Tizmoret (2007)"                    | 2007 | 3              |
| "Arrival (2016/II)"                           | 2016 | 3              |
| "Argo (2012)"                                 | 2012 | 3              |
| "Brooklyn (2015)"                             | 2015 | 3              |
| "The Artist (2011/I)"                         | 2011 | 3              |
| "Before Midnight (2013)"                      | 2013 | 3              |
| "The Queen (2006)"                            | 2006 | 3              |
| "The Avengers (2012)"                         | 2012 | 3              |
| "Afghan Star (2009)"                          | 2009 | 3              |
| "Dunkirk (2017)"                              | 2017 | 3              |
| "Un prophète (2009)"                          | 2009 | 3              |
| "Blindsight (2006/I)"                         | 2006 | 3              |
+-----------------------------------------------------------------------+

The SQL query:

SELECT movie_id, title, person_id, name, type, character
FROM movies
JOIN credits ON movies.id = credits.movie_id
JOIN people  ON people.id = credits.person_id
WHERE title = 'Gui tu lie che (2009)'
ORDER BY type ;

can be written in Cypher as:

match (m:Movie)<-[r]-(p:Person)
where m.title = 'Gui tu lie che (2009)'
return m.movie_id as movie_id,
       m.title as title,
       p.person_id as person_id,
       p.name as name,
       type(r) as type,
       r.role as character
order by type;

You can see from the exploration above that a node with label Person can have the properties person_id and name, and that a relationship of type ACTS_IN can have the property role. The query returns:

+--------------------------------------------------------------------------------------------------------------+
| movie_id  | title                   | person_id | name                   | type                  | character |
+--------------------------------------------------------------------------------------------------------------+
| "3558459" | "Gui tu lie che (2009)" | "2331770" | "Tang, Tingsui"        | "ACTS_IN"             | "Himself" |
| "3558459" | "Gui tu lie che (2009)" | "2636131" | "Zhang, Yang (VII)"    | "ACTS_IN"             | "Himself" |
| "3558459" | "Gui tu lie che (2009)" | "4096718" | "Zhang, Qin (I)"       | "ACTS_IN"             | "Herself" |
| "3558459" | "Gui tu lie che (2009)" | "2877417" | "Chen, Suqin"          | "ACTS_IN"             | "Herself" |
| "3558459" | "Gui tu lie che (2009)" | "2635369" | "Zhang, Changhua"      | "ACTS_IN"             | "Himself" |
| "3558459" | "Gui tu lie che (2009)" | "714378"  | "Fan, Lixin (II)"      | "CINEMATOGRAPHER_FOR" | <null>    |
| "3558459" | "Gui tu lie che (2009)" | "4107663" | "Alary, Olivier"       | "COMPOSER_FOR"        | <null>    |
| "3558459" | "Gui tu lie che (2009)" | "714378"  | "Fan, Lixin (II)"      | "DIRECTED"            | <null>    |
| "3558459" | "Gui tu lie che (2009)" | "3898468" | "Stephen, Mary"        | "EDITED"              | <null>    |
| "3558459" | "Gui tu lie che (2009)" | "714378"  | "Fan, Lixin (II)"      | "EDITED"              | <null>    |
| "3558459" | "Gui tu lie che (2009)" | "403772"  | "Chang, Yung (I)"      | "EDITED"              | <null>    |
| "3558459" | "Gui tu lie che (2009)" | "4705097" | "Zhao, Qi (VI)"        | "PRODUCED"            | <null>    |
| "3558459" | "Gui tu lie che (2009)" | "1632179" | "Moore, Bob (IV)"      | "PRODUCED"            | <null>    |
| "3558459" | "Gui tu lie che (2009)" | "4144700" | "Cross, Daniel (II)"   | "PRODUCED"            | <null>    |
| "3558459" | "Gui tu lie che (2009)" | "4113686" | "Aung-Thwin, Mila (I)" | "PRODUCED"            | <null>    |
+--------------------------------------------------------------------------------------------------------------+

Multiple patterns

Recall our SQL query for counting the number of movies associated with a pair of distinct genres:

SELECT G1.genre AS genre1, G2.genre AS genre2, count(*) AS total
FROM genres AS G1
JOIN genres AS G2 ON G1.movie_id = G2.movie_id
WHERE G1.genre < G2.genre
GROUP BY genre1, genre2
ORDER BY total desc
LIMIT 10;

This can be written in Cypher as:

match
  (m:Movie)-[:HAS_GENRE]->(g1:Genre),
  (m:Movie)-[:HAS_GENRE]->(g2:Genre)
where g1.genre < g2.genre
return g1.genre as genre1, g2.genre as genre2, count(*) as total
order by total desc
limit 10;

Here, there are two matching patterns: (m)-->(g1) and (m)-->(g2). The query searches for nodes m, g1, g2 that satisfy both patterns. Since the same variable m appears in both patterns, the query requires that the two genres g1 and g2 are associated with the same movie (this is equivalent to the G1.movie_id = G2.movie_id condition in the SQL query).

The same query can be written more concisely in Cypher:

match (g1:Genre)<-[:HAS_GENRE]-(:Movie)-[:HAS_GENRE]->(g2:Genre)
where g1.genre < g2.genre
return g1.genre as genre1, g2.genre as genre2, count(*) as total
order by total desc
limit 10;

The two matching patterns (m)-->(g1) and (m)-->(g2) have been combined into a single pattern (g1)<--(m)-->(g2). We left out the variable name m because we no longer need it, so we can refer to the middle node as (:Movie), which means "any node labelled Movie".

The query returns:

+-----------------------------------+
| genre1      | genre2      | total |
+-----------------------------------+
| "Drama"     | "Thriller"  | 16    |
| "Drama"     | "Romance"   | 14    |
| "Biography" | "Drama"     | 13    |
| "Action"    | "Thriller"  | 13    |
| "Action"    | "Adventure" | 12    |
| "Action"    | "Sci-Fi"    | 12    |
| "Adventure" | "Family"    | 12    |
| "Animation" | "Comedy"    | 11    |
| "Animation" | "Family"    | 11    |
| "Adventure" | "Comedy"    | 11    |
+-----------------------------------+

The with construct

Suppose we want to modify this last Cypher query so that it only returns those rows where total is greater than 9. The problem is that total is defined in the return clause and we have no place to put a where total > 9 clause. The with construct solves this problem. We can replace any return by with and then use what were returned values in further computation. In this way we can treat the output of a sub-query as input to another sub-query. We can actually string together as many with constructs as we like. Here is the query we want (using only one with):

match (g1:Genre)<-[:HAS_GENRE]-(:Movie)-[:HAS_GENRE]->(g2:Genre)
where g1.genre < g2.genre
with  g1.genre as genre1, g2.genre as genre2, count(*) as total
where total > 9
return genre1, genre2, total
order by total desc ;

The query returns:

+-----------------------------------+
| genre1      | genre2      | total |
+-----------------------------------+
| "Drama"     | "Thriller"  | 16    |
| "Drama"     | "Romance"   | 14    |
| "Biography" | "Drama"     | 13    |
| "Action"    | "Thriller"  | 13    |
| "Action"    | "Adventure" | 12    |
| "Adventure" | "Family"    | 12    |
| "Action"    | "Sci-Fi"    | 12    |
| "Animation" | "Comedy"    | 11    |
| "Comedy"    | "Drama"     | 11    |
| "Animation" | "Family"    | 11    |
| "Adventure" | "Comedy"    | 11    |
| "Comedy"    | "Family"    | 11    |
| "Crime"     | "Drama"     | 11    |
| "Adventure" | "Fantasy"   | 10    |
| "Adventure" | "Animation" | 10    |
+-----------------------------------+

The power of path-oriented queries

The real strength of Neo4j and Cypher is with path-oriented queries.

Here is a query that counts the number of co-actors for Jennifer Lawrence:

match (p1:Person)-[:ACTS_IN]->(:Movie)<-[:ACTS_IN]-(p2:Person)
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name <> 'Lawrence, Jennifer (III)'
return count(*) as total ;

The query returns:

+-------+
| total |
+-------+
| 345   |
+-------+

Wait one second! Is this query correct? No! This query will count the same co-actors multiple times if they have acted in multiple movies with Jennifer Lawrence. How can with fix this? As follows:

match (p1:Person)-[:ACTS_IN]->(:Movie)<-[:ACTS_IN]-(p2:Person)
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name <> 'Lawrence, Jennifer (III)'
return count(distinct p2) as total ;

The query returns:

+-------+
| total |
+-------+
| 335   |
+-------+

We can achieve the same results with the following query:

match (p1:Person)-[:ACTS_IN*2]-(p2:Person)
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name <> 'Lawrence, Jennifer (III)'
return count(distinct p2) as total ;

Here the pattern (p1:Person)-[:ACTS_IN*2]-(p2:Person) matches any path containing two occurrences of the ACTS_IN relationship. Note that we have dropped the directional arrow (->) here since the direction alternates along such paths. A pattern such as (...)-[...]-(...) indicates we are not concerned with direction.

Cypher has a built-in function allshortestpaths that when applied to a path pattern returns all shortest instances of the path:

match path = allshortestpaths( (p1:Person)-[:ACTS_IN*]-(p2:Person) )
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name = 'Damon, Matt'
return distinct length(path)/2 as length ;

Here the pattern (p1:Person)-[:ACTS_IN*]-(p2:Person) matches all paths between p1 and p2 and allshortestpaths returns a list of the shortest paths. The query returns:

+--------+
| length |
+--------+
| 2      |
+--------+

(Try the query without the keyword distinct --- can you understand the new result?)

Let's take a look at one of these paths:

match path = allshortestpaths( (p1:Person)-[:ACTS_IN*]-(p2:Person) )
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name = 'Damon, Matt'
return path
limit 1;

The query returns:

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| path                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [Node[4904]{person_id:"3382035",name:"Lawrence, Jennifer (III)",gender:"female"},:ACTS_IN[20279]{role:"Rosalyn Rosenfeld",position:5},Node[3]{movie_id:"3171436",title:"American Hustle (2013)",year:2013},:ACTS_IN[20447]{role:"Bar Patron",position:0,note:"(uncredited)"},Node[6401]{person_id:"1762275",name:"Oliveira, Joseph (III)",gender:"male"},:ACTS_IN[26750]{role:"Officer",position:0,note:"(uncredited)"},Node[92]{movie_id:"4229028",title:"The Departed (2006)",year:2006},:ACTS_IN[26621]{role:"Colin Sullivan",position:2},Node[2049]{person_id:"531736",name:"Damon, Matt",gender:"male"}] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The result contains every node along the path together with its property set.

In order to display paths in a more readable form we can process paths using the reduce construct:

match path = allshortestpaths( (p1:Person)-[:ACTS_IN*]-(p2:Person) )
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name = 'Damon, Matt'
return reduce(t = "", i in nodes(path) | t + (case when i:Movie then i.title else i.name end) + "---") as simple_path;

The reduce construct is much like a fold from functional programming (SML, OCaml, Haskell). In this case, for each path t is initialised to the empty string, and then i iterates through the list of nodes in the path. At each step we compute t + (case when i:Movie then i.title else i.name end) + "---" where here t represents the results of the previous iteration.

The query returns:

+----------------------------------------------------------------------------------------------------------------------------+
| simple_path                                                                                                                |
+----------------------------------------------------------------------------------------------------------------------------+
| "Lawrence, Jennifer (III)---American Hustle (2013)---Oliveira, Joseph (III)---The Departed (2006)---Damon, Matt---"        |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Flynn, Steve (III)---The Departed (2006)---Damon, Matt---"            |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Silvia, Billy---The Departed (2006)---Damon, Matt---"                 |
| "Lawrence, Jennifer (III)---Silver Linings Playbook (2012)---Stiles, Julia---The Bourne Ultimatum (2007)---Damon, Matt---" |
| "Lawrence, Jennifer (III)---Silver Linings Playbook (2012)---Falvo, Mark---The Departed (2006)---Damon, Matt---"           |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Falvo, Mark---The Departed (2006)---Damon, Matt---"                   |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Corazzini, Jeffrey---The Departed (2006)---Damon, Matt---"            |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Cleary, Bo---The Departed (2006)---Damon, Matt---"                    |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Boston, David (IV)---The Departed (2006)---Damon, Matt---"            |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Boston, David (IV)---The Bourne Ultimatum (2007)---Damon, Matt---"    |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Ford, Jim (III)---The Departed (2006)---Damon, Matt---"               |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Garo, Armen---The Departed (2006)---Damon, Matt---"                   |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Giannone, Jay---The Departed (2006)---Damon, Matt---"                 |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Franchi, John (I)---The Departed (2006)---Damon, Matt---"             |
| "Lawrence, Jennifer (III)---American Hustle (2013)---Peña, Michael (III)---The Martian (2015)---Damon, Matt---"            |
+----------------------------------------------------------------------------------------------------------------------------+

Practical Exercises (ticks) from 2016

Exercise 2.a

Recall from above that this query

match (p1:Person)-[:ACTS_IN]->(:Movie)<-[:ACTS_IN]-(p2:Person)
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name <> 'Lawrence, Jennifer (III)'
return count(*) as total ;

and this query

match (p1:Person)-[:ACTS_IN]->(:Movie)<-[:ACTS_IN]-(p2:Person)
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name <> 'Lawrence, Jennifer (III)'
return count(distinct p2) as total ;

will return different results. That is because some co-stars are counted twice in the first query and only once (correctly) in the second query. Who are the co-stars counted multiple times and how many movies did they act in with Jennifer Lawrence? Let's use Cypher to find out!

Complete the following Cypher query so that it returns rows name, total where the name column is the name of an actor, and the total column indicates the number of movies in which they co-starred with Jennifer Lawrence. Note, here we want the total to always be greater than 1. (HINT: Consider using the WITH construct.)

   YOUR-CODE-GOES-HERE
   return name, total
   order by name, total;

Solution notes

One possible solution:

match (p1:Person)-[:ACTS_IN]->(:Movie)<-[:ACTS_IN]-(p2:Person)
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name <> 'Lawrence, Jennifer (III)'
with p2.name as name, count(*) as total
where total > 1
return name, total
order by name, total;

This query produces the output:

+-----------------------------------+
| name                      | total |
+-----------------------------------+
| "Birnbaum, Elias"         | 2     |
| "Cooper, Bradley (I)"     | 2     |
| "De Niro, Robert"         | 2     |
| "Eklund Jr., Dicky"       | 2     |
| "Falvo, Mark"             | 2     |
| "Herman, Paul (I)"        | 2     |
| "Meck, Patsy"             | 2     |
| "Russell, Matthew (VIII)" | 2     |
| "Webb, Volieda"           | 2     |
| "Whigham, Shea"           | 2     |
+-----------------------------------+

Exercise 2.b

In the tutorial we computed the distance between Jennifer Lawrence and Matt Damon (using the ACTS_IN relationship). The distance between two actors A and B is 1 if they co-starred in the same movie; the distance is 2 if there is some other actor X such that A co-starred with X on some movie and X co-starred with B on some other movie; and so on.

For this exercise we will compute the distance between all of our genres. The definition of distance is similar: for example, the distance between two genres is 1 if there is at least one movie that is associated with those two genres. We can use this distance as a measure of how similar two genres are.

We will use the HAS_GENRE relationship rather than the ACTS_IN relationship. Your query should output the distance between every genre and every other genre that exists in the database. Make sure your query enforces the constraint g1.genre < g2.genre so that each pair of genres is only listed once (if you swap the order of the two genres, the distance is still the same).

   YOUR-CODE-GOES-HERE
   return distinct g1.genre as genre1, g2.genre as genre2, length(path)/2 as length
   order by length desc, genre1, genre2;

Solution notes

One possible solution:

match path = allshortestpaths( (g1:Genre)-[:HAS_GENRE*]-(g2:Genre) )
where g1.genre < g2.genre
return distinct g1.genre as genre1, g2.genre as genre2, length(path)/2 as length
order by length desc, genre1, genre2;

This query produces the output:

+----------------------------------------+
| genre1        | genre2        | length |
+----------------------------------------+
| "Animation"   | "News"        | 3      |
| "Comedy"      | "News"        | 3      |
| "Family"      | "News"        | 3      |
| "Fantasy"     | "News"        | 3      |
| "Horror"      | "News"        | 3      |
| "Musical"     | "News"        | 3      |
| "Mystery"     | "News"        | 3      |
| "News"        | "Sci-Fi"      | 3      |
| "News"        | "Western"     | 3      |
| "Action"      | "Biography"   | 2      |
| "Action"      | "Documentary" | 2      |
| "Action"      | "Horror"      | 2      |
| "Action"      | "Musical"     | 2      |
| "Action"      | "News"        | 2      |
| "Action"      | "Romance"     | 2      |
| "Action"      | "Sport"       | 2      |
| "Action"      | "Western"     | 2      |
| "Adventure"   | "Biography"   | 2      |
| "Adventure"   | "History"     | 2      |
| "Adventure"   | "Horror"      | 2      |
| "Adventure"   | "Music"       | 2      |
| "Adventure"   | "News"        | 2      |
| "Adventure"   | "Sport"       | 2      |
| "Animation"   | "Documentary" | 2      |
| "Animation"   | "History"     | 2      |
| "Animation"   | "Horror"      | 2      |
| "Animation"   | "Music"       | 2      |
| "Animation"   | "Romance"     | 2      |
| "Animation"   | "Sport"       | 2      |
| "Animation"   | "Thriller"    | 2      |
| "Animation"   | "Western"     | 2      |
| "Biography"   | "Comedy"      | 2      |
| "Biography"   | "Crime"       | 2      |
| "Biography"   | "Family"      | 2      |
| "Biography"   | "Fantasy"     | 2      |
| "Biography"   | "Horror"      | 2      |
| "Biography"   | "Musical"     | 2      |
| "Biography"   | "Mystery"     | 2      |
| "Biography"   | "News"        | 2      |
| "Biography"   | "Sci-Fi"      | 2      |
| "Biography"   | "Western"     | 2      |
| "Comedy"      | "Documentary" | 2      |
| "Comedy"      | "History"     | 2      |
| "Comedy"      | "Horror"      | 2      |
| "Comedy"      | "Sci-Fi"      | 2      |
| "Comedy"      | "Sport"       | 2      |
| "Comedy"      | "Thriller"    | 2      |
| "Comedy"      | "War"         | 2      |
| "Comedy"      | "Western"     | 2      |
| "Crime"       | "Fantasy"     | 2      |
| "Crime"       | "Horror"      | 2      |
| "Crime"       | "Musical"     | 2      |
| "Crime"       | "News"        | 2      |
| "Crime"       | "Romance"     | 2      |
| "Crime"       | "Sport"       | 2      |
| "Documentary" | "Family"      | 2      |
| "Documentary" | "Fantasy"     | 2      |
| "Documentary" | "Horror"      | 2      |
| "Documentary" | "Musical"     | 2      |
| "Documentary" | "Mystery"     | 2      |
| "Documentary" | "Romance"     | 2      |
| "Documentary" | "Sci-Fi"      | 2      |
| "Documentary" | "Thriller"    | 2      |
| "Documentary" | "Western"     | 2      |
| "Drama"       | "News"        | 2      |
| "Family"      | "History"     | 2      |
| "Family"      | "Horror"      | 2      |
| "Family"      | "Music"       | 2      |
| "Family"      | "Romance"     | 2      |
| "Family"      | "Sport"       | 2      |
| "Family"      | "Thriller"    | 2      |
| "Family"      | "War"         | 2      |
| "Family"      | "Western"     | 2      |
| "Fantasy"     | "History"     | 2      |
| "Fantasy"     | "Horror"      | 2      |
| "Fantasy"     | "Music"       | 2      |
| "Fantasy"     | "Romance"     | 2      |
| "Fantasy"     | "Sport"       | 2      |
| "Fantasy"     | "Thriller"    | 2      |
| "Fantasy"     | "Western"     | 2      |
| "History"     | "Horror"      | 2      |
| "History"     | "Music"       | 2      |
| "History"     | "Musical"     | 2      |
| "History"     | "Mystery"     | 2      |
| "History"     | "Sci-Fi"      | 2      |
| "History"     | "Sport"       | 2      |
| "History"     | "Western"     | 2      |
| "Horror"      | "Music"       | 2      |
| "Horror"      | "Musical"     | 2      |
| "Horror"      | "Sci-Fi"      | 2      |
| "Horror"      | "Sport"       | 2      |
| "Horror"      | "War"         | 2      |
| "Horror"      | "Western"     | 2      |
| "Music"       | "Mystery"     | 2      |
| "Music"       | "News"        | 2      |
| "Music"       | "Sci-Fi"      | 2      |
| "Music"       | "Sport"       | 2      |
| "Music"       | "War"         | 2      |
| "Music"       | "Western"     | 2      |
| "Musical"     | "Mystery"     | 2      |
| "Musical"     | "Sci-Fi"      | 2      |
| "Musical"     | "Sport"       | 2      |
| "Musical"     | "Thriller"    | 2      |
| "Musical"     | "War"         | 2      |
| "Musical"     | "Western"     | 2      |
| "Mystery"     | "Romance"     | 2      |
| "Mystery"     | "Sport"       | 2      |
| "Mystery"     | "War"         | 2      |
| "Mystery"     | "Western"     | 2      |
| "News"        | "Romance"     | 2      |
| "News"        | "Sport"       | 2      |
| "News"        | "Thriller"    | 2      |
| "News"        | "War"         | 2      |
| "Romance"     | "Sport"       | 2      |
| "Romance"     | "Thriller"    | 2      |
| "Romance"     | "War"         | 2      |
| "Romance"     | "Western"     | 2      |
| "Sci-Fi"      | "Sport"       | 2      |
| "Sci-Fi"      | "Western"     | 2      |
| "Sport"       | "Thriller"    | 2      |
| "Sport"       | "War"         | 2      |
| "Sport"       | "Western"     | 2      |
| "War"         | "Western"     | 2      |
| "Action"      | "Adventure"   | 1      |
| "Action"      | "Animation"   | 1      |
| "Action"      | "Comedy"      | 1      |
| "Action"      | "Crime"       | 1      |
| "Action"      | "Drama"       | 1      |
| "Action"      | "Family"      | 1      |
| "Action"      | "Fantasy"     | 1      |
| "Action"      | "History"     | 1      |
| "Action"      | "Music"       | 1      |
| "Action"      | "Mystery"     | 1      |
| "Action"      | "Sci-Fi"      | 1      |
| "Action"      | "Thriller"    | 1      |
| "Action"      | "War"         | 1      |
| "Adventure"   | "Animation"   | 1      |
| "Adventure"   | "Comedy"      | 1      |
| "Adventure"   | "Crime"       | 1      |
| "Adventure"   | "Documentary" | 1      |
| "Adventure"   | "Drama"       | 1      |
| "Adventure"   | "Family"      | 1      |
| "Adventure"   | "Fantasy"     | 1      |
| "Adventure"   | "Musical"     | 1      |
| "Adventure"   | "Mystery"     | 1      |
| "Adventure"   | "Romance"     | 1      |
| "Adventure"   | "Sci-Fi"      | 1      |
| "Adventure"   | "Thriller"    | 1      |
| "Adventure"   | "War"         | 1      |
| "Adventure"   | "Western"     | 1      |
| "Animation"   | "Biography"   | 1      |
| "Animation"   | "Comedy"      | 1      |
| "Animation"   | "Crime"       | 1      |
| "Animation"   | "Drama"       | 1      |
| "Animation"   | "Family"      | 1      |
| "Animation"   | "Fantasy"     | 1      |
| "Animation"   | "Musical"     | 1      |
| "Animation"   | "Mystery"     | 1      |
| "Animation"   | "Sci-Fi"      | 1      |
| "Animation"   | "War"         | 1      |
| "Biography"   | "Documentary" | 1      |
| "Biography"   | "Drama"       | 1      |
| "Biography"   | "History"     | 1      |
| "Biography"   | "Music"       | 1      |
| "Biography"   | "Romance"     | 1      |
| "Biography"   | "Sport"       | 1      |
| "Biography"   | "Thriller"    | 1      |
| "Biography"   | "War"         | 1      |
| "Comedy"      | "Crime"       | 1      |
| "Comedy"      | "Drama"       | 1      |
| "Comedy"      | "Family"      | 1      |
| "Comedy"      | "Fantasy"     | 1      |
| "Comedy"      | "Music"       | 1      |
| "Comedy"      | "Musical"     | 1      |
| "Comedy"      | "Mystery"     | 1      |
| "Comedy"      | "Romance"     | 1      |
| "Crime"       | "Documentary" | 1      |
| "Crime"       | "Drama"       | 1      |
| "Crime"       | "Family"      | 1      |
| "Crime"       | "History"     | 1      |
| "Crime"       | "Music"       | 1      |
| "Crime"       | "Mystery"     | 1      |
| "Crime"       | "Sci-Fi"      | 1      |
| "Crime"       | "Thriller"    | 1      |
| "Crime"       | "War"         | 1      |
| "Crime"       | "Western"     | 1      |
| "Documentary" | "Drama"       | 1      |
| "Documentary" | "History"     | 1      |
| "Documentary" | "Music"       | 1      |
| "Documentary" | "News"        | 1      |
| "Documentary" | "Sport"       | 1      |
| "Documentary" | "War"         | 1      |
| "Drama"       | "Family"      | 1      |
| "Drama"       | "Fantasy"     | 1      |
| "Drama"       | "History"     | 1      |
| "Drama"       | "Horror"      | 1      |
| "Drama"       | "Music"       | 1      |
| "Drama"       | "Musical"     | 1      |
| "Drama"       | "Mystery"     | 1      |
| "Drama"       | "Romance"     | 1      |
| "Drama"       | "Sci-Fi"      | 1      |
| "Drama"       | "Sport"       | 1      |
| "Drama"       | "Thriller"    | 1      |
| "Drama"       | "War"         | 1      |
| "Drama"       | "Western"     | 1      |
| "Family"      | "Fantasy"     | 1      |
| "Family"      | "Musical"     | 1      |
| "Family"      | "Mystery"     | 1      |
| "Family"      | "Sci-Fi"      | 1      |
| "Fantasy"     | "Musical"     | 1      |
| "Fantasy"     | "Mystery"     | 1      |
| "Fantasy"     | "Sci-Fi"      | 1      |
| "Fantasy"     | "War"         | 1      |
| "History"     | "News"        | 1      |
| "History"     | "Romance"     | 1      |
| "History"     | "Thriller"    | 1      |
| "History"     | "War"         | 1      |
| "Horror"      | "Mystery"     | 1      |
| "Horror"      | "Romance"     | 1      |
| "Horror"      | "Thriller"    | 1      |
| "Music"       | "Musical"     | 1      |
| "Music"       | "Romance"     | 1      |
| "Music"       | "Thriller"    | 1      |
| "Musical"     | "Romance"     | 1      |
| "Mystery"     | "Sci-Fi"      | 1      |
| "Mystery"     | "Thriller"    | 1      |
| "Romance"     | "Sci-Fi"      | 1      |
| "Sci-Fi"      | "Thriller"    | 1      |
| "Sci-Fi"      | "War"         | 1      |
| "Thriller"    | "War"         | 1      |
| "Thriller"    | "Western"     | 1      |
+----------------------------------------+

Exercise 2.c

Let's build a simple recommendation engine. Starting from a known movie that you liked, write a query to finds similar movies that you might also enjoy.

Let A be the movie that you liked. Some other movie B should be considered for recommendation if A and B have at least one genre in common, at least one keyword in common, and at least one actor in common. Furthermore, you can calculate a similarity score as follows: 1 point for every keyword that A and B have in common, and 10 points for every actor the movies have in common.

Write a query that produces recommendations for someone who liked Skyfall (2012). The results should give the title of each recommended movie and the similarity score, and be sorted by score:

    match
      YOUR-CODE-GOES-HERE
    return title, score
    order by score desc;

Solution notes

One possible solution:

match
  (liked:Movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(rec:Movie),
  (liked)<-[:ACTS_IN]-(actor:Person)-[:ACTS_IN]->(rec),
  (liked)-[:HAS_GENRE]->()<-[:HAS_GENRE]-(rec)
where liked.title = 'Skyfall (2012)'
with rec, count(distinct keyword) as common_keywords, count(distinct actor) as common_actors
return rec.title as title, 10*common_actors + common_keywords as score
order by score desc;

This query produces the output:

+---------------------------------------------------------------+
| title                                                 | score |
+---------------------------------------------------------------+
| "Star Wars: Episode VII - The Force Awakens (2015)"   | 159   |
| "The Dark Knight (2008)"                              | 126   |
| "Casino Royale (2006)"                                | 105   |
| "Wonder Woman (2017)"                                 | 99    |
| "Mission: Impossible - Ghost Protocol (2011)"         | 85    |
| "Harry Potter and the Deathly Hallows: Part 2 (2011)" | 56    |
| "The Bourne Ultimatum (2007)"                         | 52    |
| "The Hurt Locker (2008)"                              | 46    |
| "No Country for Old Men (2007)"                       | 40    |
| "Argo (2012)"                                         | 25    |
+---------------------------------------------------------------+