David Evans: Research

Increasingly, decisions of public policy depend on the provision of reliable data based on observations of the real world, captured by sensors measuring the activities of people or of the environment. Frequently, as part of these policy initiatives, people are advised to change their behaviour. Making a substantial impact requires convincing people that their actions matter, which necessitates providing them with timely, pertinent, and accurate information so that they can make informed decisions. Consequently, behaviour must be monitored to gather data, algorithms that fuse these data into meaningful conclusions and make recommendations must be deployed, and systems must disseminate the results to policy makers and the public. Doing this whilst assuaging privacy concerns is crucial for widespread voluntary cooperation. Over the next five years I intend to design systems that meet this goal, using policy to specify and control the flow of personal information, thereby curtailing dangerous exploitation.

The three operations—data collection through monitoring, data processing, and dissemination of conclusions—each have implications for privacy. Monitoring to gather data means that detailed models of behaviour can be built. Frequently, the data collected will be from individuals, meaning that one can model what a particular person does; it may be easy or difficult to link such models with individuals' identities. Processing by analysis algorithms may entail connecting parties having specialised abilities. Meaningful policies are needed to prescribe flows of personal information, and writing them requires knowledge of how those parties will treat data that are potentially private and what can be inferred as a result. Finally, dissemination of the output from processing may allow exploitation of the individuals contributing data by those not involved in the system itself. For example, the pattern of electricity use by a house can be a high-quality predictor of how many people are at home. The occupants of the house may be comfortable with the data showing this, as made apparent by the monitoring service's privacy policy, provided these inferences are not available to those who would plan burglaries or verify occupancy for the purposes of insurance or private investigation.

Addressing privacy within these three contexts requires using design principles that are, at present, poorly understood, coupled with concrete technology to help apply them. I intend to tackle both of these problems. My approach to enhancing the safety of monitoring, applied already within the TIME-EACM research project at Cambridge, starts with removing as much identity and location information as possible from the data that are collected. The design strategy should be to collect such information only as policy requires, as opposed to protecting it with clumsy, error-prone access control mechanisms that may not prevent data leakage or breaches. Furthermore, conveying data from sensors to where they are processed may in itself reveal identity. I intend to apply techniques allowing anonymous communication to the delivery of sensor data, examining issues such as providing data provenance when communication makes device identification impossible. This will mean understanding the new models of trust that lie between ignorance of identity (complete anonymity) and proof of identity with non-repudiation (via, for example, digital signatures). A sensing apparatus that functions correctly, responding as appropriate to authorised commands with data that are delivered only to the intended recipients, is best placed to offer a foundation for these services. Work on privacy thus complements sensor network security.
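The strategy of collecting only what policy requires, rather than gathering everything and guarding it afterwards, can be sketched briefly. The field names and policy below are hypothetical, purely for illustration:

```python
# A minimal sketch of data minimisation at the sensor: fields the policy
# does not name are dropped before a sample ever leaves the device.
# POLICY_FIELDS and the sample's field names are illustrative assumptions.
POLICY_FIELDS = {"timestamp", "temperature"}  # what this deployment may collect

def minimise(raw_sample: dict) -> dict:
    """Keep only the fields the collection policy permits."""
    return {k: v for k, v in raw_sample.items() if k in POLICY_FIELDS}

sample = {"timestamp": 1234567890, "temperature": 21.5,
          "device_id": "sensor-17", "gps": (52.2053, 0.1218)}
print(minimise(sample))  # {'timestamp': 1234567890, 'temperature': 21.5}
```

Because the identifying fields are never transmitted, no downstream access control mechanism can leak them.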

At Dalhousie University, my work focussed on designing an infrastructure enabling applications that improve the treatment of, and quality of life for, patients in the advanced stages of cognitive decline. Central to this infrastructure was middleware supporting distributed transactions in a manner that respects patient confidentiality, using formalised contracts to link permitted flows of data to processing actions. This is the sort of tool that is needed for data processing. I have explored similar expressions of policy within the TIME-EACM project with an emphasis on succinctness and on-line compliance checking, ensuring that data identify the privacy properties of the real-world phenomena that they represent. I intend to continue this work, formalising it by building on data flow modelling. This will lead to a mechanism allowing description of algorithms' data requirements, expression of the properties of their results, and composition of these into a description of data processors' behaviour. Together these form assertions of a system's properties with respect to privacy and can be checked for suitability against policy requirements.
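The idea of composing descriptions of processors and checking the result against policy can be sketched as follows. The property names, processors, and policy are illustrative assumptions, not artefacts of TIME-EACM or the Dalhousie middleware:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Processor:
    name: str
    requires: frozenset  # properties the input data must carry
    produces: frozenset  # properties asserted of the output

def compose(a: Processor, b: Processor) -> Processor:
    """Compose two processors, with b consuming a's output.

    Composition fails if a's output does not satisfy b's requirements."""
    if not b.requires <= a.produces:
        raise ValueError(f"{b.name} requires {b.requires - a.produces}, "
                         f"which {a.name} does not provide")
    return Processor(f"{a.name};{b.name}", a.requires, b.produces)

def complies(pipeline: Processor, policy_forbids: frozenset) -> bool:
    """A pipeline complies if none of its asserted output properties
    are forbidden by policy."""
    return not (pipeline.produces & policy_forbids)

# Hypothetical example: aggregate meter readings, then anonymise the totals.
aggregate = Processor("aggregate",
                      requires=frozenset({"meter-reading"}),
                      produces=frozenset({"meter-reading", "aggregated"}))
anonymise = Processor("anonymise",
                      requires=frozenset({"aggregated"}),
                      produces=frozenset({"aggregated", "anonymised"}))

pipeline = compose(aggregate, anonymise)
print(complies(pipeline, policy_forbids=frozenset({"identifiable"})))  # True
```

The point of the sketch is that compliance is decided from the composed description alone, without inspecting the processors' implementations.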

Finally, in the area of dissemination of conclusions, access control can play a part. More broadly, applications should be designed with the idea of statistical disclosure control. This involves understanding the ways in which application output can be collated to make inferences and the implications of those inferences for the individuals who provide data.
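A classic illustration of why exact aggregates need disclosure control is the differencing attack, sketched below with the electricity example from above. The readings, the sensitivity bound, and the epsilon are illustrative assumptions; the noise mechanism is the standard Laplace mechanism from differential privacy:

```python
import random

# Hypothetical meter readings (kWh) for five houses; values are illustrative.
readings = {"house_a": 12.0, "house_b": 3.5, "house_c": 9.1,
            "house_d": 0.2, "house_e": 7.8}

def exact_total(houses):
    return sum(readings[h] for h in houses)

# Differencing attack: two "harmless" aggregate queries pinpoint one house.
everyone = set(readings)
leak = exact_total(everyone) - exact_total(everyone - {"house_d"})
# leak is exactly house_d's reading (0.2 kWh), enough to infer an empty house.

def laplace(scale):
    # A Laplace(0, scale) variate as the difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_total(houses, epsilon, sensitivity=15.0):
    # sensitivity: assumed upper bound on any single household's reading.
    return exact_total(houses) + laplace(sensitivity / epsilon)

noisy_leak = noisy_total(everyone, 0.5) - noisy_total(everyone - {"house_d"}, 0.5)
# noisy_leak no longer reveals house_d's reading with any certainty.
```

Calibrated noise bounds what any collation of outputs can reveal about one contributor, whereas access control alone cannot prevent this attack once both exact totals are released.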

Concrete, specific technologies are needed to underpin these design principles. Some, such as a facility for anonymous communication and differential privacy, are implicit; others are centred on Information Flow Control (IFC) and means of conveying policy specification to software. To provide a coherent anchor for these I intend to fashion system software and middleware focussed on privacy-sensitive data collection and processing (robust dissemination appears to be feasible at the application level). Replacing software used by people on desktop computers and mobile phones is not realistic and is not a goal. However, systems supporting sensors and consequent data fusion are more amenable to change. The first step to doing this is to understand the most useful contributions from Decentralised IFC (as explored in projects such as Jif and Fabric), its incorporation into operating systems (Flume and Asbestos), conveyance of application requirements (following on from work on quality-of-service in, for example, Nemesis), and current work on infrastructure for federated sensors (FRESNEL). These results will be combined with my current work on data labelling, the effect of the physical world on policy, and multi-organisation policy interaction. The goal is to produce system software capable of managing distributed data collection through sensors in a way that is secure and respects the privacy intentions of the individuals being monitored. In so doing, assertions may be made about the resulting system's privacy properties, allowing formal policy reasoning to increase the safety and acceptability of systems that affect individuals.
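The core check behind the IFC systems mentioned above can be sketched with secrecy labels modelled as sets of tags, in the style of Flume: data may flow only to a context whose label dominates the data's own. The tag names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Labelled:
    value: object
    secrecy: frozenset  # tags naming whose secrets this datum contains

def can_flow(src: Labelled, dst_label: frozenset) -> bool:
    # Flow is safe only if the destination carries every secrecy tag the
    # data does, i.e. the destination label dominates the source label.
    return src.secrecy <= dst_label

# A hypothetical meter reading tagged with its occupant's secrecy tag.
reading = Labelled(7.8, secrecy=frozenset({"occupant:house_e"}))
fusion_process = frozenset({"occupant:house_e", "operator"})
public_feed = frozenset()

print(can_flow(reading, fusion_process))  # True: the label dominates
print(can_flow(reading, public_feed))     # False: the flow would leak the tag
```

In a real decentralised IFC system the owner of a tag may also declassify, removing the tag from a label after, say, aggregation; that capability is what ties the label model back to the policies discussed earlier.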

I am uncomfortable advocating research results if I haven't proven their efficacy. Therefore I intend to build functioning research prototypes, in the tradition of systems research, augmented by formal analytic proofs of correctness where appropriate. I intend to aggressively recruit high-quality postgraduate students and engage bright undergraduates for projects and research assistantships. The resulting research group will be well positioned to have substantial impact, ensuring that systems necessary to provide the data that inform public policy and help people make lifestyle choices are embraced by those who might otherwise be worried by them.
