Evaluation of decentralized email architecture and social network analysis based on email attachment sharing

Gregory Y. Tsipenyuk

March 2018, 153 pages

This technical report is based on a dissertation submitted August 2017 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Sidney Sussex College.


Present-day e-mail is provided by centralized services running in the cloud. These services transparently connect users behind middleboxes and provide backup, redundancy, and high availability at the expense of user privacy. In present-day mobile environments, users can access and modify e-mail from multiple devices, with updates reconciled on the central server. Prioritizing updates is difficult and may be undesirable. Moreover, legacy email protocols do not provide optimal e-mail synchronization and access. With the rise of the Internet of Things (IoT), the number of interconnected devices is projected to grow to 27 billion by 2021. In the first part of my dissertation I propose a decentralized email architecture which takes advantage of a user’s IoT devices to maintain a complete email history. This addresses the e-mail reconciliation issue and places data under user control. I replace legacy email protocols with a synchronization protocol to achieve eventual consistency of email and to optimize bandwidth and energy usage. The architecture is evaluated on a Raspberry Pi computer.
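
To illustrate what eventual consistency between a user's devices can mean in practice, the following is a minimal sketch and not the dissertation's actual protocol. It assumes that each message has a stable, unique identifier and that messages are never modified after creation; the names `Replica`, `ids_missing_from`, and `sync` are hypothetical. Each pairwise exchange transfers only the messages the other side lacks, which is one simple way to limit bandwidth.

```python
class Replica:
    """One device holding a (possibly partial) copy of the mailbox."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # message id -> message body (assumed immutable)

    def add(self, msg_id, body):
        self.store[msg_id] = body

    def ids_missing_from(self, other):
        """Ids held by this replica that the other replica lacks."""
        return set(self.store) - set(other.store)


def sync(a, b):
    """Two-way sync; returns the number of messages transferred."""
    # Compute both differences before copying anything.
    a_to_b = a.ids_missing_from(b)
    b_to_a = b.ids_missing_from(a)
    for msg_id in a_to_b:
        b.store[msg_id] = a.store[msg_id]
    for msg_id in b_to_a:
        a.store[msg_id] = b.store[msg_id]
    return len(a_to_b) + len(b_to_a)


phone = Replica("phone")
pi = Replica("raspberry-pi")
phone.add("m1", "hello")
phone.add("m2", "agenda")
pi.add("m2", "agenda")
pi.add("m3", "photos")

transferred = sync(phone, pi)  # only m1 and m3 cross the link
```

Because merging is a set union, it is commutative and idempotent: replicas converge regardless of the order in which devices meet, and repeating a sync transfers nothing. Handling deletions and flag changes would need additional machinery that this sketch omits.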

There is an extensive body of research on Social Network Analysis (SNA) based on email archives. Typically, the analyzed network reflects either communication between users or a relationship between the e-mail and the information found in the e-mail’s header and body. This approach discards either all email attachments or those that cannot be converted to text, such as images. Yet attachments may account for up to 90% of an e-mail archive’s size. In the second part of my dissertation I suggest extracting the network from e-mail attachments shared between users. I hypothesize that the network extracted from shared e-mail attachments might provide more insight into the social structure of the email archive. I evaluate the communication and shared e-mail attachment networks by analyzing common centrality measures and classification and clustering algorithms. I further demonstrate how the analysis of the shared attachments network can be used to optimize the proposed decentralized e-mail architecture.
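
As an illustration of extracting a network from shared attachments, the sketch below shows one possible construction; it is an assumption made for this example, not necessarily the definition used in the dissertation. It assumes that attachments are identified by a SHA-256 digest of their content, that two users are linked when both sent or received the same attachment, and that messages are given as dictionaries with hypothetical keys `sender`, `recipients`, and `attachments`. Degree centrality stands in for the "common centrality measures" mentioned above.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def attachment_network(messages):
    """Map each user pair to the number of distinct attachments they share."""
    holders = defaultdict(set)  # attachment digest -> users who sent or received it
    for msg in messages:
        users = {msg["sender"], *msg["recipients"]}
        for blob in msg["attachments"]:
            digest = hashlib.sha256(blob).hexdigest()
            holders[digest] |= users
    edges = defaultdict(int)
    for users in holders.values():
        for u, v in combinations(sorted(users), 2):
            edges[frozenset((u, v))] += 1
    return dict(edges)


def degree_centrality(edges):
    """Fraction of the other nodes each node is directly connected to."""
    nodes = set().union(*edges) if edges else set()
    degree = defaultdict(int)
    for pair in edges:
        for u in pair:
            degree[u] += 1
    n = len(nodes)
    if n < 2:
        return {u: 0.0 for u in nodes}
    return {u: degree[u] / (n - 1) for u in nodes}


messages = [
    {"sender": "alice", "recipients": ["bob"], "attachments": [b"report"]},
    {"sender": "bob", "recipients": ["carol"], "attachments": [b"report"]},
    {"sender": "carol", "recipients": ["dave"], "attachments": [b"photo"]},
]
edges = attachment_network(messages)
centrality = degree_centrality(edges)
```

In this toy archive alice and carol never exchange a message directly, yet they are linked because both handled the same attachment. That kind of edge is absent from a pure communication network.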

