This project is about using a trusted execution environment (in this case Intel's SGX, although one could also look at Arm's TrustZone) to protect the code and data of a confidential computation, e.g. learning some statistics from personal healthcare or financial records.
The idea starts from work on secure containers and confidential map/reduce-style computational frameworks (Hadoop, Spark, etc.), but in this instance the task is to take a different framework, namely dataflow, built in a different language (Rust rather than Java/Scala).
SCONE: Secure Linux Containers with Intel SGX
VC3: Trustworthy Data Analytics in the Cloud using SGX
Timely Dataflow (see also differential dataflow) and SGX: this needs a Rust runtime in SGX, which is done:
Maru: Spark in SGX. This is basically similar to Hadoop (VC3), but it needs a JVM (for Scala) in the SGX enclave, so it is built like SCONE, using SGX LKL. SGX LKL is a Linux kernel library (i.e. Linux ported to run as a library rather than as a monolithic OS) that is then run inside the SGX environment.
One of the challenges is coping with SGX's limited enclave memory (about 96 MB). Another is how to do input/output (files and networking) securely.
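One common way to stay inside a small memory budget is to stream records through the computation and keep only a constant-size summary, rather than loading the whole dataset. The sketch below illustrates that idea in plain Rust with Welford's online mean/variance update. It is an assumption made for illustration, not something taken from the projects above: the names `RunningStats` and `summarize` are hypothetical, the input is one number per line, and nothing here touches SGX APIs or addresses secure I/O (in a real enclave the input would additionally need to be decrypted and authenticated).

```rust
use std::io::{self, BufRead};

/// Constant-size running summary: its memory footprint does not grow
/// with the number of records streamed through it.
#[derive(Debug, Default, Clone, Copy)]
struct RunningStats {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the current mean
}

impl RunningStats {
    /// Welford's online update for mean and variance.
    fn push(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Sample variance; `None` until at least two values have been seen.
    fn variance(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some(self.m2 / (self.count - 1) as f64)
        }
    }
}

/// Reads one numeric value per line, holding only one line in memory at a time.
/// Lines that do not parse as a number are skipped.
fn summarize<R: BufRead>(reader: R) -> io::Result<RunningStats> {
    let mut stats = RunningStats::default();
    for line in reader.lines() {
        let line = line?;
        if let Ok(x) = line.trim().parse::<f64>() {
            stats.push(x);
        }
    }
    Ok(stats)
}

fn main() -> io::Result<()> {
    // Stand-in for a (decrypted) record stream.
    let input = "2.0\n4.0\n4.0\n4.0\n5.0\n5.0\n7.0\n9.0\n".as_bytes();
    let stats = summarize(input)?;
    println!(
        "n = {}, mean = {}, variance = {:?}",
        stats.count,
        stats.mean,
        stats.variance()
    );
    Ok(())
}
```

Because `summarize` accepts any `BufRead`, the same function could be fed from a file or a network stream; only the summary struct has to live in protected memory.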
The Maru project (see above) has ported Spark to run in SGX, but only some of the core primitives: the basic ability to run RDDs and apply functions. This project would take the more complex structuring tools from Spark, i.e. DataFrames, and bring them to Spark in SGX.
Private data on public cloud (see above).
The resilient distributed dataset (RDD) is the basic building block, but there are lots more features in Spark, including:
DataFrames (add yours here); see above under project 2 for more links.
Immutable labels & information flow control enforcement in SGX (and CHERI)
So, a completely clean slate: take Thomas Pasquier's work on IFC and immutable labels in the kernel, and move enforcement from the Linux kernel to an SGX enclave/TEE.
Core starting point: Towards practical information flow control and audit
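To make the enforcement idea concrete, here is a minimal sketch of the kind of check an enclave-side monitor might perform. It assumes the usual decentralised IFC rule: a flow from a source to a destination is allowed only if the source's secrecy tags are a subset of the destination's, and the destination's integrity tags are a subset of the source's. The types `Label` and `Tag`, the function `can_flow`, and the tag names are hypothetical and chosen for illustration; they are not taken from the cited work or from any SGX SDK.

```rust
use std::collections::BTreeSet;

/// A tag is just a name in this sketch (hypothetical representation).
type Tag = &'static str;

/// A label is fixed at creation: no mutating methods are exposed,
/// which models the "immutable labels" idea.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Label {
    secrecy: BTreeSet<Tag>,
    integrity: BTreeSet<Tag>,
}

impl Label {
    fn new(secrecy: &[Tag], integrity: &[Tag]) -> Self {
        Label {
            secrecy: secrecy.iter().copied().collect(),
            integrity: integrity.iter().copied().collect(),
        }
    }
}

/// Assumed flow rule: secrecy may only grow along a flow,
/// and integrity may only shrink.
fn can_flow(src: &Label, dst: &Label) -> bool {
    src.secrecy.is_subset(&dst.secrecy) && dst.integrity.is_subset(&src.integrity)
}

fn main() {
    let record = Label::new(&["medical"], &["hospital"]);
    let analytics = Label::new(&["medical", "stats"], &[]);
    let public_log = Label::new(&[], &[]);

    // A medical record may flow into a component cleared for medical data.
    assert!(can_flow(&record, &analytics));
    // It must not flow to an unlabelled (public) sink.
    assert!(!can_flow(&record, &public_log));
    // Unendorsed public data must not flow into hospital-endorsed data.
    assert!(!can_flow(&public_log, &record));

    println!("all flow checks behaved as expected");
}
```

In the project, a check of this shape would run inside the enclave at each point where data crosses between labelled components, instead of in Linux kernel hooks.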