Computer Laboratory

Course pages 2014–15

Social and Technological Network Analysis

1 - Research paper report

Every student will prepare one report (of approximately 1,500 words) on one assigned research paper. The report is worth 30% of the final mark.

The report will contain two parts of about 750 words each:

  • Critical analysis of the paper, including, where relevant, comparisons with and references to other material presented in the course or found by the student, and comments on how solid the results obtained are (e.g., on the evaluation methods or the analysis applied).
  • Discussion of possible future research ideas in the area.

This is a selection of mainly recent research papers on social and technological networks. Choose any paper that is still available and email your choice to cm542. Reports must be submitted by email to cm542 in PDF format *in addition to handing a hard copy to the office*.

Deadline for choice: 23rd January 2015

Deadline for submission: 13th February 2015 at noon

  1. Hang-Hyun Jo, Jari Saramäki, Robin I. M. Dunbar and Kimmo Kaski. Spatial patterns of close relationships across the lifespan. Scientific Reports 4, Article number: 6988. Nov. 2014.
  2. Janos Szule, Daniel Kondor, Laszlo Dobos, Istvan Csabai, Gabor Vattay. Lost in the City: Revisiting Milgram's Experiment in the Age of Social Networks. PLoS ONE 2014.
  3. Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. 2009. Social influence analysis in large-scale networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, 807-816.
  4. Saramäki, J., Leicht, E. A., López, E., Roberts, S. G., Reed-Tsochas, F., & Dunbar, R. I. (2014). Persistence of social signatures in human communication. Proceedings of the National Academy of Sciences, 111(3), 942-947.
  5. Liu, Yang-Yu, Jean-Jacques Slotine, and Albert-László Barabási. Controllability of complex networks. Nature 473.7346 (2011): 167-173.
  6. Myers, Seth A., Chenguang Zhu, and Jure Leskovec. Information diffusion and external influence in networks. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 33-41. ACM, 2012.
  7. Backstrom, Lars, and Jon Kleinberg. Romantic partnerships and the dispersion of social ties: A network analysis of relationship status on Facebook. Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. ACM, 2014.
  8. Eagle, Nathan, Alex Sandy Pentland, and David Lazer. Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences 106, no. 36 (2009): 15274-15278.
  9. Gomez Rodriguez, Manuel, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2010.
  10. Friggeri, Adrien, Lada Adamic, Dean Eckles, and Justin Cheng. Rumor Cascades. Proceedings of the eighth international AAAI conference on Weblogs and social media. AAAI, 2014.
  11. Eytan Bakshy, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. 2011. Everyone's an influencer: quantifying influence on twitter. In Proceedings of the fourth ACM international conference on Web search and data mining (WSDM '11). ACM, New York, NY, USA, 65-74.
  12. Romero, Daniel M., Brendan Meeder, and Jon Kleinberg. Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter. Proceedings of the 20th international conference on World wide web. ACM, 2011.
  13. David J. Crandall, Lars Backstrom, Dan Cosley, Siddharth Suri, Daniel Huttenlocher, and Jon Kleinberg. Inferring social ties from geographic coincidences. Proceedings of the National Academy of Sciences 107.52 (2010): 22436-22441.
  14. Paul Expert, Tim S. Evans, Vincent D. Blondel, and Renaud Lambiotte. Uncovering space-independent communities in spatial networks. Proceedings of the National Academy of Sciences 108.19 (2011): 7663-7668.
  15. Clauset, Aaron, Cristopher Moore, and Mark EJ Newman. Hierarchical structure and the prediction of missing links in networks. Nature 453.7191 (2008): 98-101.
  16. A. Goyal, F. Bonchi, L.V.S. Lakshmanan. Learning influence probabilities in social networks. In Proc. WSDM, 2010.
  17. Rongjing Xiang, Jennifer Neville, Monica Rogati. Modeling Relationship Strength in Online Social Networks. In Proc. WWW, 2010.
  18. D.M. Romero and J. Kleinberg. The Directed Closure Process in Hybrid Social-Information Networks, with an Analysis of Link Formation on Twitter. Proc. 4th International AAAI Conference on Weblogs and Social Media, 2010.
  19. E. Sadikov, M. Medina, J. Leskovec, H. Garcia-Molina. Correcting for Missing Data in Information Cascades. ACM International Conference on Web Search and Data Mining (WSDM), 2011.
  20. Michele Coscia, Giulio Rossetti, Fosca Giannotti, Dino Pedreschi. DEMON: a local-first discovery method for overlapping communities. Proceedings of KDD 2012.
  21. Justin Cranshaw, Raz Schwartz, Jason I. Hong and Norman Sadeh. The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City. Proceedings of ICWSM 2012.

2 - Project

Every student will complete a project that consists of analysing an assigned dataset with NetworkX, using a set of specified network measures. The analysis should be reported in a document of about 2,500 words in which the results are discussed and justified. The project is worth 50% of the final mark.
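As a possible starting point, the sketch below loads a dataset and computes a few basic measures with NetworkX. It assumes the dataset is supplied as a plain-text edge list (one `u v` pair per line, `#` for comment lines) with integer node ids. That file format is an assumption, the helper names `load_edge_list` and `summarise` are hypothetical, and the measures shown are examples rather than the ones a given project will be assigned.

```python
import networkx as nx


def load_edge_list(path, directed=False):
    """Read a whitespace-separated edge list; lines starting with '#' are skipped."""
    graph_type = nx.DiGraph if directed else nx.Graph
    return nx.read_edgelist(path, comments="#", nodetype=int, create_using=graph_type)


def summarise(G):
    """Return a few basic measures of an undirected graph as a dict."""
    degrees = [d for _, d in G.degree()]
    largest_cc = max(nx.connected_components(G), key=len)
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "mean_degree": sum(degrees) / len(degrees),
        # Mean local clustering coefficient; degree-1 nodes contribute 0.
        "avg_clustering": nx.average_clustering(G),
        # Share of nodes that belong to the largest connected component.
        "largest_cc_fraction": len(largest_cc) / G.number_of_nodes(),
    }


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 plus a separate edge 3-4.
    G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4)])
    print(summarise(G))
```

For a real file one would call something like `summarise(load_edge_list("edges.txt"))`, where the file name is hypothetical. For a directed dataset, pass `directed=True` and replace `nx.connected_components` with `nx.weakly_connected_components`, since the former is only defined for undirected graphs.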

Students are encouraged to come up with original project proposals, which may be inspired by the ideas listed below. Reports must be submitted by email to cm542 in PDF format *in addition to handing a hard copy to the office*.

Deadline for choice: 13th February 2015

Deadline for submission: 9th March 2015 at noon

Datasets (list to be confirmed)

  1. Amazon: co-purchase product network and all product information. (548,552 products)
    Project ideas:
    • extraction of network communities and analysis of their homogeneity with respect to product categories;
    • analysis of potential correlations between network node metrics and product sales/reviews;
    • prediction model for product sales given network properties and product characteristics.
  2. HEP-PH: citation graph among papers in high-energy physics with temporal publication data of each paper. (34,546 nodes and 421,578 directed edges)
    Project ideas:
    • investigation of power-law network structure evolution over time with a generative model;
    • analysis of the first-mover advantage for scientific publications;
    • analysis/design of ranking algorithms to facilitate search among publications.
  3. Epinions: trust and distrust signed social network among users on Epinions.com. (131,828 nodes and 841,372 directed edges)
    Project ideas:
    • analysis of the structure of the social network arising from positive, negative and aggregated edges, with an investigation of the correlations among them;
    • analysis of social triangles and verification of structural balance theory ("the enemy of my enemy is my friend");
    • prediction models of the sign of a social link.
  4. Facebook: friendship connections within one regional network, together with interaction events between users such as likes and comments. (657,681 nodes and 1,302,764 undirected edges)
    Project ideas:
    • analysis and comparison of the social network among Facebook users and the network arising from their explicit interactions;
    • prediction models of interaction between users;
    • analysis of user activity as a function of ego-network properties.
  5. Cond-Mat: collaboration network extracted from the e-print arXiv, covering co-authorship ties from the papers submitted to the Condensed Matter category. (23,133 nodes and 186,936 undirected edges)
    Project ideas:
    • analysis of the social properties of the co-authorship network
    • study of the correlation between scientific productivity and network position
  6. Enron: email communication network between Enron employees and with external email addresses. (36,692 nodes and 367,662 undirected edges)
    Project ideas:
    • study of the statistical properties of an email network
    • analysis of the communication patterns
  7. Roads: road network of Pennsylvania, in which each node is an intersection and each edge is a road. (1,088,092 nodes and 3,083,796 undirected edges)
    Project ideas:
    • characteristics of a planar network
    • community detection in a planar network
  8. Web: web hyperlinks between pages in the stanford.edu domain. (281,903 nodes and 2,312,497 directed edges)
    Project ideas:
    • centrality measures in a Web graph
    • ranking of Web pages
  9. Wikipedia: discussion network between Wikipedia users. (2,394,385 nodes and 5,021,410 directed edges)
    Project ideas:
    • statistical properties of a large-scale discussion network
    • social network properties of Wikipedia users
  10. Brightkite: social network among Brightkite users, together with a total of 4,491,143 check-ins by these users over the period April 2008 to October 2010. (58,228 nodes and 214,078 undirected edges)
    Project ideas:
    • spatial properties of the social network
    • analysis of the relationship between user mobility and social properties
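For the Amazon idea of checking whether network communities are homogeneous with respect to product categories, one simple measure is purity: the fraction of nodes whose category matches the majority category of their community. The sketch below uses NetworkX's greedy modularity maximisation as one possible community detection method (a choice made here, not one prescribed by the course). It assumes, as a simplification, that each product carries a single category label, whereas real Amazon products may have several. The function name `community_purity` is hypothetical.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def community_purity(G, category):
    """Detect communities by greedy modularity maximisation and measure purity.

    `category` maps each node to a single label. Purity is the fraction of
    nodes whose label equals the most common label in their community
    (1.0 means every community is homogeneous).
    """
    communities = greedy_modularity_communities(G)
    matched = 0
    for community in communities:
        labels = Counter(category[v] for v in community)
        # Number of nodes carrying the majority label of this community.
        matched += labels.most_common(1)[0][1]
    return matched / G.number_of_nodes(), communities


if __name__ == "__main__":
    # Two disconnected triangles standing in for two product clusters.
    G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])
    category = {0: "Book", 1: "Book", 2: "Book", 3: "Music", 4: "Music", 5: "Music"}
    purity, communities = community_purity(G, category)
    print(purity, [sorted(c) for c in communities])
```

Purity alone rewards many tiny communities, so it is usually worth reporting it alongside the number and size distribution of the communities found.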
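For the Epinions idea of verifying structural balance, a triangle is balanced when the product of its three edge signs is positive, i.e. when it has zero or two negative edges ("the enemy of my enemy is my friend" is the two-negative case). The sketch below counts triangles by their number of negative edges. It assumes the directed Epinions graph has already been collapsed into an undirected graph with a `sign` attribute of +1 or -1 on every edge; how to resolve reciprocal edges with conflicting signs is left as a modelling choice. The function names are hypothetical.

```python
import itertools

import networkx as nx


def triangle_sign_counts(G):
    """Count triangles in an undirected signed graph by number of negative edges.

    Every edge must carry a 'sign' attribute of +1 or -1, and node ids must be
    mutually comparable (e.g. integers) so each triangle is visited once.
    """
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for u in G:
        # Only pair up neighbours with a larger id than u: each triangle
        # u < v < w is then found exactly once, from its smallest node.
        higher = sorted(v for v in G[u] if v > u)
        for v, w in itertools.combinations(higher, 2):
            if G.has_edge(v, w):
                signs = (G[u][v]["sign"], G[u][w]["sign"], G[v][w]["sign"])
                counts[signs.count(-1)] += 1
    return counts


def balanced_fraction(counts):
    """Structural balance: a triangle is balanced iff it has 0 or 2 negative edges."""
    total = sum(counts.values())
    return (counts[0] + counts[2]) / total if total else float("nan")


if __name__ == "__main__":
    G = nx.Graph()
    G.add_weighted_edges_from(
        [(1, 2, 1), (2, 3, 1), (1, 3, -1), (3, 4, -1), (2, 4, -1)], weight="sign"
    )
    counts = triangle_sign_counts(G)
    print(counts, balanced_fraction(counts))
```

In the toy graph, triangle 1-2-3 has one negative edge (unbalanced) and triangle 2-3-4 has two (balanced). On real data, the observed balanced fraction is more informative when compared against the same statistic after randomly shuffling the edge signs.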
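For the ranking ideas (Web pages in the stanford.edu graph, or publications in HEP-PH), PageRank is one standard choice; it is used here as an example, not as the method the course requires. The sketch below computes it by plain power iteration so the steps are visible: with probability `alpha` a random surfer follows an out-link, otherwise it jumps to a uniformly random node, and nodes without out-links spread their score uniformly. NetworkX also provides `nx.pagerank`, which for an unweighted graph with default settings should give the same values up to numerical tolerance (recent NetworkX versions rely on SciPy for it). For a citation graph, this assumes edges point from the citing paper to the cited one. The function names are hypothetical.

```python
import networkx as nx


def pagerank_power(G, alpha=0.85, tol=1e-10, max_iter=200):
    """PageRank of a directed graph by plain power iteration."""
    n = G.number_of_nodes()
    rank = {v: 1.0 / n for v in G}
    for _ in range(max_iter):
        # Score held by dangling nodes (no out-links) is spread over all nodes.
        dangling = sum(rank[v] for v in G if G.out_degree(v) == 0)
        # Teleportation plus redistributed dangling mass, identical for every node.
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in G}
        for u in G:
            out = G.out_degree(u)
            if out:
                # Each out-link receives an equal share of u's damped score.
                share = alpha * rank[u] / out
                for v in G.successors(u):
                    new[v] += share
        converged = sum(abs(new[v] - rank[v]) for v in G) < tol
        rank = new
        if converged:
            break
    return rank


def top_k(scores, k):
    """Return the k highest-scoring nodes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Pages 1, 2 and 3 all link to page 0; page 0 links back to page 1.
    G = nx.DiGraph([(1, 0), (2, 0), (3, 0), (0, 1)])
    scores = pagerank_power(G)
    print(top_k(scores, 2))  # -> [0, 1]
```

Comparing the PageRank ordering with a simpler baseline such as in-degree is a natural way to discuss what the random-surfer model adds.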

3 - Presentation

Every student will prepare a brief oral presentation of the project findings. Each presentation MUST last 8 minutes, including questions: thus, a reasonable combination would be a 6-minute talk followed by 2 minutes for questions.

The presentations will take place on 10th March 2015.

Slides must be sent by email to cm542 in PDF format before 23:59 on 9th March 2015.

The presentation is worth 20% of the final mark and will be evaluated in the following categories:

  1. Presentation skills [20%]
  2. General knowledge of the subject [20%]
  3. Discussion of findings [40%]
  4. Questions and answers [20%]
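The weights on this page combine as nested weighted sums: the presentation mark is 20% presentation skills, 20% general knowledge, 40% discussion of findings and 20% questions and answers, and the final mark is 30% report, 50% project and 20% presentation. The sketch below illustrates only the arithmetic. It assumes every mark is on a 0 to 100 scale (the page does not state the scale), and the key names and marks in the example are hypothetical.

```python
# Weights as stated on this page.
COMPONENT_WEIGHTS = {"report": 0.30, "project": 0.50, "presentation": 0.20}
PRESENTATION_WEIGHTS = {
    "presentation_skills": 0.20,
    "general_knowledge": 0.20,
    "discussion_of_findings": 0.40,
    "questions_and_answers": 0.20,
}


def weighted_mark(marks, weights):
    """Weighted sum of marks; weights must cover every key and sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weight * marks[name] for name, weight in weights.items())


if __name__ == "__main__":
    # Hypothetical marks on an assumed 0 to 100 scale.
    presentation = weighted_mark(
        {"presentation_skills": 70, "general_knowledge": 60,
         "discussion_of_findings": 80, "questions_and_answers": 50},
        PRESENTATION_WEIGHTS,
    )
    final = weighted_mark(
        {"report": 65, "project": 72, "presentation": presentation},
        COMPONENT_WEIGHTS,
    )
    print(round(presentation, 2), round(final, 2))
```

With these marks the presentation component is 0.2·70 + 0.2·60 + 0.4·80 + 0.2·50 = 68, and the final mark is 0.3·65 + 0.5·72 + 0.2·68 = 69.1.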