TRVE Data: Placing a bit less trust in the cloud

Cloud-based collaboration tools such as Google Docs, Evernote, iCloud and Dropbox are very convenient for users, but problematic in terms of security and resilience.

At present, most such services are provided by companies through a centralised server infrastructure, which carries risks: if the company providing the service goes out of business, or decides to discontinue a product, the software stops working, and users are locked out of the documents and data created with that software. Moreover, since those servers typically process user data in unencrypted form, a rogue employee, or a hacker who gains access to the servers, can read and tamper with vast amounts of sensitive data.

The goal of the TRVE Data project (pronounced “true data”) is to build the foundation for the next generation of collaboration software, providing stronger security and resilience than current practice. This approach will benefit professionals who deal with highly sensitive data and who often cannot use cloud services because they are subject to strict compliance regulations and confidentiality obligations. For example:

  • doctors deal with electronic medical records,
  • legal advisors communicate with their clients under legal privilege,
  • journalists collaborate on stories and communicate with their sources,
  • engineers work on designs for power stations and chemical plants,
  • financial services handle sensitive client details,
  • diplomats privately negotiate solutions to international disagreements, and
  • law enforcement services share details of ongoing investigations.

We are developing algorithms, protocols, and code that allow real-time collaboration and data synchronisation across several devices without relying on central servers. We call this approach local-first software. Our research is based on the following principles:

  • End-to-end encryption.

    Today's Internet services typically process data in unencrypted form on their servers, and employ encryption (e.g. TLS) only for communication between servers and end-user devices (such as laptops or smartphones). Hence, users depend on the cloud provider to prevent unauthorised access and to maintain integrity of the data.

    In contrast, we are designing systems to use end-to-end encryption, which secures data all the way from one user's device to another user's device. In this approach, servers only ever handle encrypted data that they cannot decrypt. Thus, even if communication networks or servers are compromised, the confidentiality and integrity of sensitive data are protected, giving users better ownership and control over their data.

  • Making servers optional.

    At present, services typically transmit all data via a central server. Even if the communicating devices are in the same room, their data might be sent via a server on another continent. This approach is not only slow and wasteful but also makes the system susceptible to disruption: if the server is blocked or subjected to a cyberattack (e.g. a DDoS attack), or if the operator goes out of business, the software stops working.

    To improve the resilience of applications, we are using peer-to-peer communication where possible, sending data directly between collaborating devices, and utilising fast local networks when applicable. Servers may still be used, but the software continues working if servers are unreachable. Using local storage and local networks further improves users' control over their own data.

  • From formal verification to practical implementation.

    Our work runs the gamut from theory to practice. On the theoretical end of the spectrum, we have produced formal proofs of correctness for our key algorithms, some of them mechanised using the Isabelle/HOL proof assistant. On the practical end, we are aiming for implementations that are stable, well-designed, well-tested, and performant, and thus usable by real production applications.

  • Open source and open standards.

    All software developed in this project is made freely available as open source under a permissive license. As our data formats and network protocols become stable, we aim to develop them into open standards, enabling interoperability between different implementations.
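To make the end-to-end encryption principle above concrete, here is a minimal sketch for Node.js. It is an illustration only, not the project's protocol: it assumes the two devices already share a secret key (real systems would establish one with a key agreement protocol), and the names `sharedKey`, `encryptOnDevice`, `decryptOnDevice` and `RelayServer` are hypothetical. The relay only ever stores ciphertext, and any tampering is detected when the receiving device decrypts.

```javascript
const crypto = require("node:crypto");

// Assumption: both devices already hold this key; the relay never sees it.
const sharedKey = crypto.randomBytes(32);

// Runs on the sending device. AES-256-GCM provides both confidentiality
// and integrity (via the authentication tag).
function encryptOnDevice(key, plaintext) {
  const nonce = crypto.randomBytes(12); // must never repeat for the same key
  const cipher = crypto.createCipheriv("aes-256-gcm", key, nonce);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { nonce, ciphertext, tag: cipher.getAuthTag() };
}

// Runs on the receiving device. final() throws if the ciphertext or tag
// was modified anywhere between the two devices.
function decryptOnDevice(key, message) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, message.nonce);
  decipher.setAuthTag(message.tag);
  const plaintext = Buffer.concat([decipher.update(message.ciphertext), decipher.final()]);
  return plaintext.toString("utf8");
}

// Hypothetical relay server: it stores and forwards opaque messages and
// has no access to the key, so it cannot read or undetectably alter them.
class RelayServer {
  constructor() {
    this.mailbox = [];
  }
  store(message) {
    this.mailbox.push(message);
  }
  fetchAll() {
    return this.mailbox.slice();
  }
}

const relay = new RelayServer();
relay.store(encryptOnDevice(sharedKey, "patient record: confidential"));

const [received] = relay.fetchAll();
const recovered = decryptOnDevice(sharedKey, received);
console.log(recovered);
```

The design point is that the server's role shrinks to storage and delivery: compromising it yields only ciphertext, and a modified message fails authentication on the recipient's device instead of being silently accepted.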

We have implemented this approach in Automerge, a JavaScript library for building collaborative applications. Automerge allows users to read and modify data even while their device is offline, and it performs data synchronisation and automatic conflict resolution when a network connection is available. Unlike most existing data synchronisation systems, Automerge does not require data to be sent via a centralised server, but rather allows local and peer-to-peer networks to be used, and it is compatible with end-to-end encryption protocols.
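The idea of editing offline and resolving conflicts automatically on synchronisation can be sketched with a very simple replicated data structure. The example below does not use Automerge's API and is not the algorithm Automerge implements; it is a deliberately simplified last-writer-wins map, and the names `Replica` and `wins` are hypothetical. Each device edits its own copy without a network connection, and merging in either order leads both devices to the same state.

```javascript
// Decide which of two conflicting entries survives: the higher logical
// counter wins, and ties are broken by comparing device identifiers.
function wins(a, b) {
  if (a.counter !== b.counter) return a.counter > b.counter;
  return a.actorId > b.actorId;
}

class Replica {
  constructor(actorId) {
    this.actorId = actorId;
    this.counter = 0;         // logical (Lamport) clock, not wall-clock time
    this.entries = new Map(); // key -> { value, counter, actorId }
  }

  // Local edits are applied immediately and need no network access.
  set(key, value) {
    this.counter += 1;
    this.entries.set(key, { value, counter: this.counter, actorId: this.actorId });
  }

  get(key) {
    const entry = this.entries.get(key);
    return entry === undefined ? undefined : entry.value;
  }

  // Merge the state of another replica, received over any channel
  // (local network, peer-to-peer link, or a server).
  merge(other) {
    for (const [key, theirs] of other.entries) {
      const ours = this.entries.get(key);
      if (ours === undefined || wins(theirs, ours)) {
        this.entries.set(key, theirs);
      }
    }
    this.counter = Math.max(this.counter, other.counter);
  }
}

const laptop = new Replica("laptop");
const phone = new Replica("phone");

laptop.set("title", "Draft");
phone.merge(laptop); // initial sync

// Both devices go offline and edit concurrently.
laptop.set("title", "Final report");
phone.set("title", "Final draft");
phone.set("author", "Alice");

// Back online: exchange state in both directions.
laptop.merge(phone);
phone.merge(laptop);

console.log(laptop.get("title"), phone.get("title"));
```

A last-writer-wins rule discards one of the two concurrent titles; it is used here only because it is the shortest way to show convergence without a central server.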

Open Source Software

Most of the software being developed as part of this project is open source. It is available under the trvedata and automerge organizations on GitHub, and the TRVE Data group on the Cambridge GitLab.

The Ink&Switch research lab has also developed several open source prototype applications based on Automerge.

Publications

  1. Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan. Local-first software: You own your data, in spite of the cloud. ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward! ’19), October 2019. doi:10.1145/3359591.3359737
  2. Stephan A. Kollmann, Martin Kleppmann, and Alastair R. Beresford. Snapdoc: Authenticated snapshots with history privacy in peer-to-peer collaborative editing. Proceedings on Privacy Enhancing Technologies (PoPETS), Vol. 2019, Issue 3, July 2019. doi:10.2478/popets-2019-0044
  3. Matthew Weidner. Group messaging for secure asynchronous collaboration. MPhil Dissertation, University of Cambridge, June 2019. Supervisors: Alastair R. Beresford and Martin Kleppmann.
  4. Martin Kleppmann, Alastair R. Beresford, and Boerge Svingen. Online Event Processing: Achieving consistency where distributed transactions have failed. Communications of the ACM, Vol. 62, Issue 5, Pages 43–49, May 2019. doi:10.1145/3312527
  5. Diana A. Vasile, Martin Kleppmann, Daniel R. Thomas, and Alastair R. Beresford. Ghost trace on the wire? Using key evidence for informed decisions. 27th International Workshop on Security Protocols, April 2019.
  6. Martin Kleppmann, Victor B. F. Gomes, Dominic P. Mulligan, and Alastair R. Beresford. Interleaving anomalies in collaborative text editors. 6th Workshop on Principles and Practice of Consistency for Distributed Data (PaPoC), March 2019. doi:10.1145/3301419.3323972
  7. Martin Kleppmann, Victor B. F. Gomes, Dominic P. Mulligan, and Alastair R. Beresford. OpSets: Sequential Specifications for Replicated Datatypes. arXiv:1805.04263 [cs.DC], May 2018.
  8. Martin Kleppmann, Victor B. F. Gomes, Dominic P. Mulligan, and Alastair R. Beresford. OpSets: Sequential Specifications for Replicated Datatypes (Proof Document). Isabelle Archive of Formal Proofs, May 2018.
  9. Martin Kleppmann, Stephan A. Kollmann, Diana A. Vasile, and Alastair R. Beresford. From Secure Messaging to Secure Collaboration. 26th International Workshop on Security Protocols, March 2018. doi:10.1007/978-3-030-03251-7_21
  10. Victor B. F. Gomes, Martin Kleppmann, Dominic P. Mulligan, and Alastair R. Beresford. Verifying Strong Eventual Consistency in Distributed Systems. Proceedings of the ACM on Programming Languages (PACMPL), Vol. 1, OOPSLA, Article 109, October 2017. doi:10.1145/3133933
  11. Victor B. F. Gomes, Martin Kleppmann, Dominic P. Mulligan, and Alastair R. Beresford. A framework for establishing Strong Eventual Consistency for Conflict-free Replicated Datatypes. Isabelle Archive of Formal Proofs, July 2017.
  12. Stephan A. Kollmann and Alastair R. Beresford. The Cost of Push Notifications for Smartphones using Tor Hidden Services. Workshop on Innovations in Mobile Privacy and Security (IMPS), April 2017. doi:10.1109/EuroSPW.2017.55
  13. Martin Kleppmann and Alastair R. Beresford. A Conflict-free Replicated JSON Datatype. IEEE Transactions on Parallel and Distributed Systems, Vol. 28, Issue 10, Pages 2733–2746, April 2017. doi:10.1109/TPDS.2017.2697382
  14. Martin Kleppmann and Jay Kreps. Kafka, Samza and the Unix Philosophy of Distributed Data. IEEE Data Engineering Bulletin, Vol. 38, Issue 4, Pages 4–14, December 2015.
  15. Martin Kleppmann and Conrad Irwin. Strengthening Public Key Authentication against Key Theft. 9th International Conference on Passwords, December 2015. doi:10.1007/978-3-319-29938-9_9

People

Alastair Beresford

Martin Kleppmann

Stephan Kollmann

Diana Vasile

Victor Gomes

Dominic Mulligan

Matthew Weidner

Daniel Hugenroth

Funding

We gratefully acknowledge the support of the Leverhulme Trust, the Isaac Newton Trust, The Boeing Company, Nokia Bell Labs through the Centre for Mobile, Wearable Systems and Augmented Intelligence, EPSRC, and Microsoft Research Cambridge. We also collaborate with the Ink&Switch research lab.