Graph Databases: Practical Exercises

We assume you have already completed the tutorial on a separate page.

Please use the Cypher query language to solve each of the following exercises.

In Moodle, please submit three files: exercise2a.cypher, exercise2b.cypher, and exercise2c.cypher. If you have not completed every part, submit a blank file for each part you are missing. We recommend developing and debugging your queries on your own machine before submitting them to Moodle.

Exercise 2.a

Recall from the tutorial that this query:

match (p1:Person)-[:ACTS_IN]->(:Movie)<-[:ACTS_IN]-(p2:Person)
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name <> 'Lawrence, Jennifer (III)'
return count(*) as total ;

returns a total of 331 (for the small database), while this query:

match (p1:Person)-[:ACTS_IN]->(:Movie)<-[:ACTS_IN]-(p2:Person)
where p1.name = 'Lawrence, Jennifer (III)'
  and p2.name <> 'Lawrence, Jennifer (III)'
return count(distinct p2) as total ;

returns a total of 321. The first query returns one row for every (movie, co-star) match, so a co-star who appeared in several movies with Jennifer Lawrence is counted once per movie; the second query counts each co-star only once (correctly). Who are the co-stars counted multiple times, and how many movies did they act in with Jennifer Lawrence? Let's use Cypher to find out!
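The gap between the two totals can be reproduced on a toy data set. The Python sketch below is only an illustration, not part of the exercise: the movie titles and actor names are invented, and it models each matched row as one (movie, co-star) pair.

```python
# Hypothetical toy data: movie title -> cast. All names are invented for illustration.
casts = {
    "Movie A": ["Star", "Alice", "Bob"],
    "Movie B": ["Star", "Alice", "Carol"],
    "Movie C": ["Star", "Dave"],
}

# One row per (movie, co-star) pair, mirroring the rows matched by the pattern.
rows = [
    (movie, actor)
    for movie, cast in casts.items()
    if "Star" in cast
    for actor in cast
    if actor != "Star"
]

total_rows = len(rows)                              # analogue of count(*)
total_distinct = len({actor for _, actor in rows})  # analogue of count(distinct p2)

print(total_rows, total_distinct)  # 5 4
```

Alice appears in two of the toy movies, so she contributes two rows but only one distinct co-star.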

Complete the following Cypher query so that it returns rows with two columns, name and total, where name is the name of an actor and total is the number of movies in which that actor co-starred with Jennifer Lawrence. Only include actors whose total is greater than 1. (HINT: Consider using the WITH construct.)
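The shape of the computation is "aggregate first, then filter on the aggregate". As a rough illustration in Python rather than Cypher, the sketch below uses invented (movie, co-star) rows and assumes each pair appears at most once, so counting rows per actor equals counting movies per actor.

```python
from collections import Counter

# Hypothetical (movie, co-star) rows; all names are invented for illustration.
rows = [
    ("Movie A", "Alice"),
    ("Movie A", "Bob"),
    ("Movie B", "Alice"),
    ("Movie B", "Carol"),
    ("Movie C", "Alice"),
]

# Step 1: aggregate, producing one total per co-star.
totals = Counter(actor for _, actor in rows)

# Step 2: filter on the aggregated value, then sort by name and total.
result = sorted((name, total) for name, total in totals.items() if total > 1)

print(result)  # [('Alice', 3)]
```

The filter cannot be applied before the totals exist, which is why the two steps are kept separate.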

   YOUR-CODE-GOES-HERE
   return name, total
   order by name, total;

Exercise 2.b

In the tutorial we computed the distance between Jennifer Lawrence and Matt Damon (using the ACTS_IN relationship). The distance between two actors A and B is 1 if they co-starred in the same movie; the distance is 2 if they never co-starred but there is some other actor X such that A co-starred with X in one movie and X co-starred with B in another movie; and so on.

For this exercise we will compute the distance between all of our genres. The definition of distance is similar: for example, the distance between two genres is 1 if there is at least one movie that is associated with those two genres. We can use this distance as a measure of how similar two genres are.

We will use the HAS_GENRE relationship rather than the ACTS_IN relationship. Your query should output the distance between every genre and every other genre that exists in the database. Make sure your query enforces the constraint g1.genre < g2.genre so that each pair of genres is only listed once (if you swap the order of the two genres, the distance is still the same).
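To make the definition concrete, here is a small Python sketch of the same idea on invented data (it is not the Cypher answer). It assumes an undirected graph in which movies and genres are nodes and each movie is linked to its genres; a shortest path between two genres alternates genre, movie, genre, so the number of edges is halved to obtain the distance.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data: movie -> genres. Not taken from the course database.
movie_genres = {
    "M1": ["Action", "Comedy"],
    "M2": ["Comedy", "Drama"],
    "M3": ["Drama", "Horror"],
}

# Build an undirected bipartite graph; node keys are tagged so that
# a movie and a genre with the same text cannot collide.
graph = {}
for movie, genres in movie_genres.items():
    for genre in genres:
        graph.setdefault(("movie", movie), set()).add(("genre", genre))
        graph.setdefault(("genre", genre), set()).add(("movie", movie))

def hops_from(start):
    """Breadth-first search: edges on a shortest path from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

all_genres = sorted({g for genres in movie_genres.values() for g in genres})
distances = {}
for g1, g2 in combinations(all_genres, 2):  # sorted input guarantees g1 < g2
    hops = hops_from(("genre", g1)).get(("genre", g2))
    if hops is not None:  # skip pairs with no connecting path
        distances[(g1, g2)] = hops // 2  # genre -> movie -> genre is 2 edges, distance 1

# Largest distance first, then alphabetical by genre pair.
for pair, d in sorted(distances.items(), key=lambda item: (-item[1], item[0])):
    print(pair, d)
```

In this toy graph Action and Horror share no movie and are linked only through Comedy and Drama, so their distance is 3.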

   match (g:Genre)
   with g.genre as genre1
      YOUR-CODE-GOES-HERE
   return distinct genre1, g2.genre as genre2, length(path)/2 as length
   order by length desc, genre1, genre2;

Exercise 2.c

Let's build a simple recommendation engine. Starting from a known movie that you liked, write a query that finds similar movies that you might also enjoy.

Let A be the movie that you liked. Some other movie B should be considered for recommendation if A and B have at least one genre in common, at least one keyword in common, and at least one actor in common. Furthermore, you can calculate a similarity score as follows: 1 point for every keyword that A and B have in common, and 10 points for every actor the movies have in common.
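The scoring rule can be checked by hand on a toy example. The Python sketch below uses invented titles, genres, keywords, and actors, and the function name recommend is hypothetical; it only illustrates the rule as stated above, where shared genres are required but earn no points.

```python
# Hypothetical toy data; every title, genre, keyword and actor is invented.
movies = {
    "Liked": {"genres": {"Action", "Thriller"},
              "keywords": {"spy", "chase", "london"},
              "actors": {"A1", "A2"}},
    "Cand1": {"genres": {"Action"}, "keywords": {"spy", "chase"}, "actors": {"A1", "A3"}},
    "Cand2": {"genres": {"Action"}, "keywords": {"spy"}, "actors": {"A4"}},        # no shared actor
    "Cand3": {"genres": {"Comedy"}, "keywords": {"spy"}, "actors": {"A1", "A2"}},  # no shared genre
}

def recommend(liked_title):
    """Return (title, score) pairs for qualifying movies, highest score first."""
    liked = movies[liked_title]
    results = []
    for title, other in movies.items():
        if title == liked_title:
            continue
        shared_genres = liked["genres"] & other["genres"]
        shared_keywords = liked["keywords"] & other["keywords"]
        shared_actors = liked["actors"] & other["actors"]
        # All three overlaps must be non-empty for a movie to qualify.
        if shared_genres and shared_keywords and shared_actors:
            score = len(shared_keywords) + 10 * len(shared_actors)
            results.append((title, score))
    return sorted(results, key=lambda r: r[1], reverse=True)

recommendations = recommend("Liked")
print(recommendations)  # [('Cand1', 12)]
```

Cand1 shares two keywords and one actor with the liked movie, giving 2 + 10 = 12; the other two candidates are excluded because one of the three required overlaps is empty.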

Write a query that produces recommendations for someone who liked Skyfall (2012). The results should give the title of each recommended movie and its similarity score, sorted by score in descending order:

    match
      YOUR-CODE-GOES-HERE
    return title, score
    order by score desc;