1 - Data Analysis Report

Every student will analyze an assigned dataset and report the analysis in a document of no more than 4,000 words, in which the results are discussed and justified. The report is worth 70% of the final mark.

Each student is encouraged to come up with an original project proposal inspired by the material in the lectures.

The report should be submitted via Moodle.

Deadline for choice: 10th February 2017

Deadline for submission: 13th March 2017


Datasets will be assigned on a first-come, first-served basis. Multiple students might be assigned the same dataset (or a portion of the same dataset).

  1. Airbnb: Airbnb data on listings, reviews, and geography for various cities. Information here.
  2. Facebook: friendship connections within one regional network and information about interaction events between users, such as likes and comments (657,681 nodes and 1,302,764 undirected edges).
  3. Foursquare: Foursquare venues with timestamped transitions for New York.
  4. Human Smugglers Network: network of phone calls made by suspects operating along the Eastern Mediterranean route. It includes 8,943 nodes and 82,979 ties (of which 48 nodes and 10,319 ties constitute the sub-network of suspects). Each phone call is timestamped (Summer 2011). There is also information on the role played by each smuggler and their country of residence.
  5. Protein-protein interaction: protein-to-protein interaction network in budding yeast. Information here.
  6. Taxi Rides: New York taxi trip records of pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. Information here.
  7. Stack Overflow temporal network: temporal network of interactions on the Stack Exchange website Stack Overflow. Network information here.
  8. Wikipedia: discussion network between Wikipedia users (2,394,385 nodes and 5,021,410 directed edges).

2 - Presentation

Every student will prepare a brief oral presentation of the project findings. Each presentation MUST last 8 minutes, including questions: thus, a reasonable combination would be a 6-minute talk followed by 2 minutes for questions.

The presentations will take place on 14th March 2017.

Slides must be sent by email to cm542 in PDF format before 23:59 on 13th March 2017.

The presentation is worth 30% of the final mark and will be evaluated in the following categories:

  1. Presentation skills [20%]
  2. General knowledge of the subject [20%]
  3. Discussion of findings [40%]
  4. Questions and answers [20%]
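
As an illustration of how the weights above combine, here is a minimal sketch of the arithmetic. It assumes every component is scored on a 0-100 scale and that the final mark is a simple weighted average; neither is stated in this brief, and the function and variable names are hypothetical.

```python
# Weights (in percent) taken from the brief above.
REPORT_WEIGHT = 70
PRESENTATION_WEIGHT = 30
PRESENTATION_CATEGORY_WEIGHTS = {
    "presentation_skills": 20,
    "general_knowledge": 20,
    "discussion_of_findings": 40,
    "questions_and_answers": 20,
}


def presentation_mark(category_scores):
    """Weighted average of the four presentation categories.

    Assumption: each category score is on a 0-100 scale.
    """
    total = sum(
        weight * category_scores[name]
        for name, weight in PRESENTATION_CATEGORY_WEIGHTS.items()
    )
    return total / 100


def final_mark(report_score, category_scores):
    """Combine report (70%) and presentation (30%) into one mark.

    Assumption: a plain weighted average with no rounding or capping rules.
    """
    presentation = presentation_mark(category_scores)
    return (REPORT_WEIGHT * report_score + PRESENTATION_WEIGHT * presentation) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "presentation_skills": 70,
    "general_knowledge": 60,
    "discussion_of_findings": 50,
    "questions_and_answers": 80,
}
example_final = final_mark(60, example_scores)
```

Under these assumptions, the discussion of findings contributes 40% of 30%, that is 12%, of the final mark, while each of the other three presentation categories contributes 6%.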