Supervision 3 Solution Notes

Below are model answers to help supervisors. The solutions here are not the only possible solutions, but merely suggestions.

Questions to discuss with your supervisor

  1. How would you add a new movie to the document database? (You can write code if you want, or just describe it in words.) Compare this with the relational database.

    Supervisor notes: First, you'd need to pick a new ID for the movie that has not been taken before. Then you would construct a JSON document containing all the information about the movie, and store it in the database. You also need to go over the list of actors and crew members who worked on the movie: for any person who is not yet in the database, you'd need to add them (again picking a new ID), and for any person who is already in the database, you'd need to add the movie to their JSON object.

    You need to make changes in several different places because the database is not "normalized": that is, information is duplicated in different places. For example, the title of a movie appears on the movie object itself, but it is also copied into the biography of every person who worked on that movie. In contrast, adding a new movie in the relational case is much easier. Why? Because each fact is stored only once: you insert one row for the movie, one row for each person not yet in the database, and one row per credit in the relationship table linking them, and no existing rows need to be updated. The point: if you need to update the document database in this way, then perhaps it is better to use a relational database as your primary store (source of truth) and extract the document database periodically.
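    The update steps described above can be sketched in Python. This is only an illustration under assumptions: the document database is modelled as two dictionaries keyed by integer ID, and the field names (movie_id, person_id, actors, acted_in, role) and the function add_movie are hypothetical rather than the actual schema used in the course. People are matched by name purely for brevity; real names are not unique.

```python
def add_movie(movies, people, title, year, cast):
    """Add a movie to a toy document store (hypothetical schema).

    movies, people: dicts mapping integer IDs to JSON-like dicts.
    cast: list of (person_name, role) pairs.
    """
    # Step 1: pick an ID that has not been used before.
    movie_id = max(movies, default=0) + 1

    # Assumption: people are looked up by name (not unique in real data).
    name_to_id = {p["name"]: pid for pid, p in people.items()}

    actors = []
    for name, role in cast:
        pid = name_to_id.get(name)
        if pid is None:
            # Step 3a: person not yet in the database, so add them.
            pid = max(people, default=0) + 1
            people[pid] = {"person_id": pid, "name": name, "acted_in": []}
            name_to_id[name] = pid
        # Step 3b: copy the movie details into the person's document.
        people[pid]["acted_in"].append(
            {"movie_id": movie_id, "title": title, "year": year, "role": role}
        )
        actors.append({"person_id": pid, "name": name, "role": role})

    # Step 2: store the movie document itself.
    movies[movie_id] = {
        "movie_id": movie_id,
        "title": title,
        "year": year,
        "actors": actors,
    }
    return movie_id
```

    Note that the title and year are written once into the movie document and again into every cast member's document; this duplication is what makes updates laborious.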

  2. How would you compute Bacon Numbers using the document database? Compare this with the graph database Neo4j.

    Supervisor notes: We would have to write code to implement a traversal of the (virtual) graph formed by movie and person objects, starting from the set of movies in which Kevin Bacon appears as an actor. Because a Bacon number is a shortest-path distance, a breadth-first traversal is the natural choice; a depth-first traversal would need extra bookkeeping to find the minimum. In contrast, the Neo4j Cypher queries presented in lectures let us express such path queries at a very high level. The point: if you need to query the document database in this way, then perhaps it is better to use a graph database as your primary store (source of truth) and extract the document database periodically.
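    A breadth-first traversal of this kind might look as follows. It is a minimal sketch, assuming the document database is held as two dictionaries keyed by integer ID; the field names (actors, acted_in, movie_id, person_id) and the function bacon_numbers are hypothetical, not the course's actual schema.

```python
from collections import deque


def bacon_numbers(movies, people, start_id):
    """Return {person_id: Bacon number} for everyone reachable from start_id.

    Breadth-first search visits people in order of increasing distance,
    so the first time a person is reached gives their shortest distance.
    """
    dist = {start_id: 0}
    queue = deque([start_id])
    while queue:
        pid = queue.popleft()
        # Follow person -> movie -> co-actor links.
        for credit in people[pid]["acted_in"]:
            for actor in movies[credit["movie_id"]]["actors"]:
                co_id = actor["person_id"]
                if co_id not in dist:
                    dist[co_id] = dist[pid] + 1
                    queue.append(co_id)
    return dist
```

    People who are not connected to the starting person never enter the result, which corresponds to an undefined Bacon number.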

  3. Suppose that we are presented with only the document database containing movies and people. Could we use this database to help us reconstruct an Entity-Relationship model of the data? Could you even consider automating this process?

    Supervisor notes: Probably. Movies and people appear to be entity types, and the relationships are probably those represented by the lists contained in movie and person objects. However, one would have to check that the movie-oriented and people-oriented representations of the same relationship are consistent with each other.
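    The consistency check mentioned above is one part of the process that could plausibly be automated. The sketch below compares the two representations of a single relationship; it assumes two dictionaries keyed by integer ID, and the field names (actors, acted_in, movie_id, person_id) and the function acted_in_mismatches are hypothetical.

```python
def acted_in_mismatches(movies, people):
    """Compare the movie-side and person-side views of one relationship.

    Returns two sets of (person_id, movie_id) pairs:
    those recorded only on movie documents, and
    those recorded only on person documents.
    """
    from_movies = {
        (actor["person_id"], movie_id)
        for movie_id, movie in movies.items()
        for actor in movie["actors"]
    }
    from_people = {
        (person_id, credit["movie_id"])
        for person_id, person in people.items()
        for credit in person["acted_in"]
    }
    return from_movies - from_people, from_people - from_movies
```

    If both returned sets are empty, the two lists are two views of the same relationship, which supports modelling it as a single relationship in the Entity-Relationship diagram.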