1 - Data Analysis Report

Every student will analyze an assigned dataset. The analysis should be reported in a document of no more than 4,000 words in which the results are discussed and justified. The report is worth 70% of the final mark.

Each student is encouraged to come up with original project proposals which should be inspired by the material in the lectures.

A suggested (but not prescriptive) structure for the report is:

  • Introduction and background to the idea under investigation, including related work if any.
  • Introduction to the dataset and basic/classic analysis.
  • Analysis report of the idea under investigation.
  • Discussion of limitations and possible other ideas.

Marks will be distributed across these components as follows:

  • Background and motivation reporting [10%]
  • Basic analysis [30%]
  • Analysis results report [50%]
  • Discussion of limitations and future work [10%]

The report should be submitted via Moodle.

Deadline for choice: 10th February 2017

Deadline for submission: 13th March 2017


Datasets will be assigned on a first-come, first-served basis. Multiple students might be assigned to the same dataset (or a portion of the same dataset).

  1. Airbnb: Airbnb data on listings, reviews, and geography for various cities. Information here.
  2. Facebook: friendship connections within one regional network and information about interaction events between users, such as likes and comments (657,681 nodes and 1,302,764 undirected edges).
  3. Foursquare: Foursquare venues with timestamped transitions for New York.
  4. Human Smugglers Network: network of phone calls made by suspects operating along the Eastern Mediterranean route. It includes 8,943 nodes and 82,979 ties (of which 48 nodes and 10,319 ties constitute the sub-network of suspects). There is a time-stamp for each phone call (Summer 2011). There is also information on the role played by each smuggler and the country of residence.
  5. Protein-protein interaction: protein-to-protein interaction network in budding yeast. Information here.
  6. Taxi Rides: New York taxi trip records of pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. Information here.
  7. Stack Overflow temporal network: temporal network of interactions on the Stack Exchange website Stack Overflow. Network information here.
  8. Wikipedia: discussion network between Wikipedia users. (2,394,385 nodes and 5,021,410 directed edges)

2 - Presentation

Every student will prepare a brief oral presentation of the project findings. Each presentation MUST last 8 minutes, including questions: thus, a reasonable combination would be a 6-minute talk followed by 2 minutes for questions.

The presentations will take place on 14th March 2017.

Slides must be sent by email to cm542 in PDF format before 23:59 on 13th March 2017.

The presentation is worth 30% of the final mark and will be evaluated on the following categories:

  1. Presentation skills [20%]
  2. General knowledge of the subject [20%]
  3. Discussion of findings [40%]
  4. Questions and answers [20%]
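
As a rough illustration of how the weights listed above could combine, the sketch below computes a final mark from per-component scores. It rests on assumptions this brief does not state: that each component is scored from 0 to 100 and that the weights are applied as a simple linear sum. The function names and the example scores are hypothetical.

```python
# Weights are taken from the brief; everything else is an assumption.
REPORT_WEIGHTS = {
    "background_and_motivation": 0.10,
    "basic_analysis": 0.30,
    "analysis_results": 0.50,
    "limitations_and_future_work": 0.10,
}
PRESENTATION_WEIGHTS = {
    "presentation_skills": 0.20,
    "general_knowledge": 0.20,
    "discussion_of_findings": 0.40,
    "questions_and_answers": 0.20,
}
REPORT_SHARE = 0.70        # report is worth 70% of the final mark
PRESENTATION_SHARE = 0.30  # presentation is worth 30% of the final mark


def weighted_mark(scores, weights):
    """Combine per-component scores (assumed 0-100) using the given weights."""
    return sum(weights[name] * scores[name] for name in weights)


def final_mark(report_scores, presentation_scores):
    """Assumed linear combination of the report and presentation marks."""
    report = weighted_mark(report_scores, REPORT_WEIGHTS)
    presentation = weighted_mark(presentation_scores, PRESENTATION_WEIGHTS)
    return REPORT_SHARE * report + PRESENTATION_SHARE * presentation


# Hypothetical scores, for illustration only.
example_report = {
    "background_and_motivation": 60,
    "basic_analysis": 70,
    "analysis_results": 80,
    "limitations_and_future_work": 50,
}
example_presentation = {
    "presentation_skills": 70,
    "general_knowledge": 60,
    "discussion_of_findings": 80,
    "questions_and_answers": 50,
}
print(round(final_mark(example_report, example_presentation), 1))
```

Under these assumptions the analysis results component alone accounts for 0.70 × 0.50 = 35% of the final mark, which is more than the whole presentation.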