Visual data science for investigating a data center
The operator of a data center, or of any other large network, would like a representation of their network and the tools to investigate it. This covers live anomaly alerting, forensic troubleshooting when something goes wrong, root cause analysis to understand the “disease” that causes the symptoms, and business intelligence to analyse how well existing policies are working. All of these tools should be accessible to operators and analysts, who might not be experts in data science. The unique feature of a data center (going beyond Dr Wischik’s existing research program in interactive visual data science) is that it operates at many levels (links and routers, packets, flows, jobs, applications, customers), and it is possible to obtain telemetry and logs that link together all these levels of the data hierarchy. This PhD project will develop a visual data science toolkit for explaining and analysing hierarchical data. It will develop concrete use cases; build a testbed platform for integrated logging at all levels of the hierarchy; devise visualisations that work at multiple scales and at multiple levels of the hierarchy; and investigate ways to interact with them. The goal is to help an analyst understand what is happening in the network, both by using machine learning to give as much automatic assistance as possible and by emphasising the analyst’s own visualisation of, and interaction with, the data.
Visual data science for understanding user behaviour
A user starts using a platform (driving route, behavioural incentive scheme, app, website), has some interactions, and stops using it. A different user has different experiences and stays hooked. Why? Is it because they had different expectations and needs, or because of their experiences with the platform? At what points can a user be persuaded to change their behaviour? On the one hand, these are age-old questions that are fundamental to any sort of science (“find an equation that governs how this system evolves”). On the other hand, there are powerful new tools available (deep Kalman filtering, causality theory), and there are vastly bigger datasets to work with (longer timespans, more fine-grained measurements, richer structures of measurements of user behaviour). This PhD project will develop a visual data science toolkit that allows analysts to manipulate and reason about the drivers of user behaviour. The goal is to help an analyst who has Excel-level quantitative skills and deep domain knowledge, by making the data easy to manipulate, making the tools easy to apply, and making their outputs easy to visualise and to manipulate further. The distinctive feature of user behaviour modelling (going beyond Dr Wischik’s existing research program in interactive visual data science) is its emphasis on richly structured longitudinal data and on causal modelling. The project will investigate how an analyst might indicate the structure of “natural experiments” in the dataset, so that machine learning algorithms can infer causal models of behaviour.