Sherif Akoush



After spending 9 wonderful years at the University of Cambridge Computer Lab, I will be leaving soon. You can read about my experiences at the Computer Lab here. My personal email is sherif [dot] akoush [at] gmail [dot] com.


I am a research associate at the University of Cambridge Computer Laboratory (Digital Technology Group), where I also did my PhD. I mainly work as part of the Computing for the Future of the Planet research agenda, set by Prof. Andy Hopper. My research interests lie in energy-efficient and trusted computing, with some focus on developing regions.
Supervisor/PI: Prof. Andy Hopper

Previously, I completed both my undergraduate and master's degrees in Computer Science at the American University in Cairo. I worked for 5 years in industry, implementing several location-based services.


Big data provenance

Big data analytics deals with large volumes of data drawn from diverse input sources. As computation chains become longer, ad-hoc and machine-led, it is imperative that consumers of derived big data outputs can identify precisely which input data sources were used to produce a result. This data provenance feature can be used for various applications such as automatic data inspection, audit, and compliance with data sharing policies. Consequently, users will be able to establish greater confidence in the result as well as comply with data legality requirements.

Towards this goal, we have built HadoopProv, a system that adds fine-grained provenance (lineage) support to Hadoop MapReduce. It imposes low runtime overhead (~10%) on the actual computation by separating provenance capture and query from the main execution path. Moreover, computations run seamlessly because (1) we modify the framework core to capture data provenance and (2) we automatically add instrumentation to the user job to track stateful data flow. Capturing fine-grained provenance gives HadoopProv precise relationships between input and output, which enables new applications not previously possible with existing approaches.
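To make the idea of fine-grained lineage concrete, here is a minimal in-memory sketch of a word count job that records which input records contributed to each output key. It is only an illustration of the concept and not how HadoopProv is implemented: it does not use Hadoop, it does not separate provenance capture from the main execution path, and all function names (map_with_lineage, reduce_with_lineage and so on) are hypothetical.

```python
from collections import defaultdict


def map_with_lineage(records, map_fn):
    """Run map_fn on each (record_id, value) pair and tag every
    emitted (key, value) with the id of the input record it came from."""
    for record_id, value in records:
        for key, out in map_fn(value):
            yield key, out, record_id


def reduce_with_lineage(tagged_pairs, reduce_fn):
    """Group tagged pairs by key, reduce the values, and collect the
    set of input record ids that contributed to each output key."""
    groups = defaultdict(list)
    lineage = defaultdict(set)
    for key, out, record_id in tagged_pairs:
        groups[key].append(out)
        lineage[key].add(record_id)
    results = {key: reduce_fn(key, values) for key, values in groups.items()}
    return results, {key: sorted(ids) for key, ids in lineage.items()}


def word_count_map(line):
    # Emit (word, 1) for every whitespace-separated word.
    for word in line.split():
        yield word, 1


def word_count_reduce(key, values):
    # Sum the per-word counts.
    return sum(values)


# Hypothetical input: (record_id, line) pairs.
records = [(0, "big data"), (1, "data provenance"), (2, "big systems")]

counts, lineage = reduce_with_lineage(
    map_with_lineage(records, word_count_map), word_count_reduce
)

for word in sorted(counts):
    print(word, counts[word], "from input records", lineage[word])
```

With lineage recorded per output key, a consumer can ask which input records produced a given result, which is the kind of question needed for audit or data sharing compliance.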

-Sherif Akoush, Ripduman Sohan and Andy Hopper. Recomputation-based Data Reliability for MapReduce using Lineage, Technical Report, UCAM-CL-TR-888, May 2016. [link]
-Sherif Akoush, Lucian Carata, Ripduman Sohan and Andy Hopper. MrLazy: Lazy Runtime Label Propagation for MapReduce, HOTCLOUD'14, Jun 2014. [pdf]
-Lucian Carata, Sherif Akoush, Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, Margo Seltzer and Andy Hopper. A Primer on Provenance, Communications of the ACM, May 2014. [pdf]
-Sherif Akoush, Ripduman Sohan and Andy Hopper. HadoopProv: Towards Provenance as a First Class Citizen in MapReduce, TAPP'13, Apr 2013. [pdf]

Internet usage in Africa

There has recently been growing interest in understanding Internet usage in developing regions such as Africa. Towards this goal, we captured an anonymised trace from a cellular operator in Rwanda representing the data traffic of 200,000 users for a week in February 2015. We highlight the key insights that we discovered, focusing on device types, the content being accessed and its geographical proximity to users.

-Sherif Akoush, Ahmed ElMezeini, Ripduman Sohan, Lucian Carata and Andy Hopper. Cellular Data Usage in Africa: a Case Study from Rwanda, under review. [pdf]
-Until the full report is produced, I have put together this video [youtube] showing the main insights from our data exploration and analysis.
-In this study we accurately geolocate destination/server IPs (geoip); this link explains how we do it.

Special thanks to for providing TAC mappings used in this study.

Renewable energy in datacentre computing

Datacentres are energy-hungry consumers and by extension have a big impact on our environment. This research argues that datacentre computing can use renewable (clean) energy which would otherwise be lost, and hence reduce its impact on the environment while utilising this “free” energy. The proposal is to locate datacentres near renewable energy sources, interconnect them using high-speed, low-latency dedicated links, and migrate computations and their associated data to where energy is currently available. In this way, the seamless execution of applications can be sustained despite power intermittency. There are a number of technical challenges inherent in this design, which are addressed throughout this research.

-Sherif Akoush, Ripduman Sohan, Andrew Rice and Andy Hopper. Evaluating the Viability of Remote Renewable Energy in Datacentre Computing, Technical Report, UCAM-CL-TR-889, May 2016. [link]
-Sherif Akoush, Ripduman Sohan, Bogdan Roman, Andrew Rice and Andy Hopper. Activity Based Sector Synchronisation: Efficient Transfer of Disk-State for WAN Live Migration, MASCOTS'11, Jul 2011. [pdf]
-Sherif Akoush, Ripduman Sohan, Andrew Rice, Andrew W. Moore and Andy Hopper. Free Lunch: Exploiting Renewable Energy for Computing, HOTOS'XIII, May 2011. [pdf]
-Sherif Akoush, Ripduman Sohan, Andrew Rice, Andrew W. Moore and Andy Hopper. Predicting the Performance of Virtual Machine Migration, MASCOTS'10, Aug 2010. [pdf][Best Student Paper]
-Andrew Rice, Sherif Akoush and Andy Hopper. Failure is an option. Microsoft Research Technical Report MSR-TR-2008-61, 2008. [pdf]

Movement prediction

During my master's degree, I worked on predicting movement patterns for cellular network users.
-Sherif Akoush and Ahmed Sameh. Mobile User Movement Prediction using Bayesian Learning for Neural Networks, ICWCMC'07, Aug 2007.

Talks and presentations

-Talk at HOTCLOUD 2011 about Fine-Grained Audit in MapReduce. [link]
-Talk at MASCOTS 2011 about the Efficient Transfer of Disk-State for WAN Live Migration. [pdf]
-Talk at HOTOS 2011 about Exploiting Renewable Energy for Computing. [pdf]
-MIT Technology Review: Really Remote Data, 2011. [link]
-2-minute video about my PhD research (renewable energy in datacentres), 2008. [youtube]


Email: sherif [dot] akoush [at] gmail [dot] com
Last modified in July 2016.