BO-MUSE: A HUMAN EXPERT AND AI TEAMING FRAMEWORK FOR ACCELERATED EXPERIMENTAL DESIGN

Abstract

In this paper we introduce BO-Muse, a new approach to human-AI teaming for the optimisation of expensive black-box functions. Inspired by the intrinsic difficulty of extracting expert knowledge and distilling it into AI models, and by observations of human behaviour in real-world experimental design, our algorithm lets the human expert take the lead in the experimental process. The human expert can use their domain expertise to its full potential, while the AI plays the role of a muse, injecting novelty and probing for areas of weakness to break the human out of the over-exploitation induced by cognitive entrenchment. Under mild assumptions, we show that our algorithm achieves sub-linear regret, converging faster than the AI or human alone. We validate our algorithm using synthetic data and with human experts performing real-world experiments.

1. INTRODUCTION

Bayesian Optimisation (BO) (Shahriari et al., 2015) is a popular sample-efficient technique for optimising expensive black-box objectives. It has been applied successfully in diverse areas (Greenhill et al., 2020) including material discovery (Li et al., 2017), alloy design (Barnett et al., 2020) and molecular design (Gómez-Bombarelli et al., 2018). However, standard BO typically operates tabula rasa, building its model of the objective from minimal priors that include no domain-specific detail. While some progress has been made in incorporating domain-specific knowledge to accelerate BO (Li et al., 2018; Hvarfner et al., 2022) or in transferring learnings from previous experiments (Shilton et al., 2017), a significant corpus of knowledge and expertise that could accelerate BO even further remains largely untapped due to the inherent complexity of knowledge extraction and exploitation. In particular, experts tend to organise their knowledge in complex schemas containing concepts, attributes and relationships (Rousseau, 2001), making the elicitation of relevant expert knowledge, both quantitative and qualitative, a difficult task.

Experimental design underpins the discovery of new materials, processes and products. However, experiments are costly, the target function is unknown and the search space is unclear, so to be sample-efficient as few experiments as possible must be performed. Traditionally, experimental design is guided by (human) experts who use their domain expertise and intuition to formulate an experimental design, test it, and iterate based on observations. Living beings from fungi (Watkinson et al., 2005) to ants (Pratt & Sumpter, 2006) and humans (Daw et al., 2006; Cohen et al., 2007) face a dilemma when making these decisions: exploit the information they have, or explore to gather new information.
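The standard BO loop mentioned above fits a probabilistic surrogate to all evaluations so far and chooses the next experiment by maximising an acquisition function that trades off exploitation against exploration. As a concrete illustration only (not this paper's method), the sketch below runs a minimal 1-D BO loop with a Gaussian-process surrogate and a GP-UCB acquisition; the RBF kernel, lengthscale, noise level and beta are illustrative choices:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel between two 1-D point sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # GP posterior mean and variance at the query points (zero prior mean).
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_query, x_train)
    mean = K_s @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, K_s.T)
    var = 1.0 - np.sum(K_s * v.T, axis=1)
    return mean, np.maximum(var, 0.0)

def bo_loop(objective, n_iter=10, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0, 1, 201)
    x_train = rng.uniform(0, 1, size=3)          # small random initial design
    y_train = np.array([objective(x) for x in x_train])
    for _ in range(n_iter):
        mean, var = gp_posterior(x_train, y_train, x_grid)
        ucb = mean + beta * np.sqrt(var)         # optimism under uncertainty
        x_next = x_grid[np.argmax(ucb)]          # next (expensive) experiment
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, objective(x_next))
    return x_train[np.argmax(y_train)], y_train.max()

# Cheap stand-in for an expensive black box, maximised at x = 0.7.
f = lambda x: -(x - 0.7) ** 2
best_x, best_y = bo_loop(f)
```

The acquisition step is where domain knowledge would normally have to be encoded as a prior; BO-Muse instead sidesteps this by letting the expert propose designs directly.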
How humans balance this dilemma was studied in Daw et al. (2006): examining human choices in an n-armed bandit problem, they showed that humans are highly skewed towards exploitation. Moreover, when a task requires specialised experts, cognitive entrenchment is heightened and the balance between expertise and flexibility swings further towards remaining within known paradigms. To break out of this, dynamic environments of engagement are needed to force experts to incorporate new points of view (Dane, 2010). For such lateral thinking to catalyse creativity, Beaney (2005) has further confirmed that random stimuli are crucial. For example, using random stimuli to boost creativity has been attempted in the context of games (Yannakakis et al., 2014): in Sentient Sketchbook, a machine creates sketches that the human can refine, the sketches being readily generated by machine learning models trained on ample data. Other approaches use machine representations to learn models of human knowledge, narrowing down the options the human must consider. Recently, Vasylenko et al. (2021) constructed a variational auto-encoder from underlying patterns of chemistry based on structure/composition, with a human-generated hypothesis guiding possible solutions. An entirely different approach refines a target function by allowing machine learning to discover relations between mathematical objects, guiding humans to make new conjectures (Davies et al., 2021). Note, however, that large datasets are still required to formulate representations of mathematical objects, which is antithetical to sample-efficiency: in experimental design we typically have a budget on the number of experiments, data from past designs is lean, and formulating hypotheses in this lean-data regime is difficult.
The use of BO for experimental design overcomes the problems of over-exploitation and cognitive entrenchment and provides mathematically rigorous guarantees of convergence to the optimal design. However, as noted previously, this often means that domain-specific knowledge and expertise are lost. In this paper, motivated by these observations, rather than attempting to enrich AI models with expert knowledge to accelerate BO, we propose the BO-Muse algorithm, which lets the human expert take the lead in experimental design with the aid of an AI "muse" whose job is to augment the expert's intuition with AI suggestions. Thus the AI's role is to provide the dynamism needed to break an expert's cognitive entrenchment and go beyond the state-of-the-art on new problems, while the expert's role is to harness their vast knowledge and extensive experience to produce state-of-the-art designs. Combining these roles in a formal framework is the main contribution of this paper. BO-Muse is a formal framework that inserts BO into the expert's workflow (see Figure 1), adjusting the AI's exploit/explore strategy in response to the human expert's suggestions. Each iteration produces a batch of suggestions from the human expert and the AI; this batch of designs is experimentally evaluated, the results are shared with the human and the AI, the AI model is updated, and the process iterates until the target is reached. We analyse the sample-efficiency of BO-Muse and provide a sub-linear regret bound. We validate BO-Muse using optimisation benchmarks and by teaming with experts to perform complex real-world tasks.
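The iterative workflow just described — expert leads, AI injects exploration, the batch is evaluated and shared — can be sketched schematically. The toy Python sketch below is our illustration, not the paper's exact algorithm: `expert_suggest` is a hypothetical stand-in for the human, and the AI's exploration boost is crudely modelled by a farthest-point (maximum-uncertainty) rule rather than a full acquisition function:

```python
import numpy as np

def bo_muse(objective, expert_suggest, n_rounds=5, seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n_rounds):
        # 1. The human expert leads with a design from domain intuition.
        x_h = expert_suggest(X, y)
        # 2. The AI "muse" counters with an exploration-boosted suggestion:
        #    here, the grid point farthest from all data so far (a crude
        #    stand-in for a variance-maximising acquisition).
        grid = np.linspace(0, 1, 101)
        if X:
            dists = np.min(np.abs(grid[:, None] - np.array(X)[None, :]), axis=1)
            x_a = grid[np.argmax(dists)]
        else:
            x_a = rng.uniform()
        # 3. Evaluate the batch and share the results with both parties.
        for x in (x_h, x_a):
            X.append(x)
            y.append(objective(x))
    return X[int(np.argmax(y))], max(y)

# Toy run: an over-exploitative "expert" who only refines near their
# current best guess, while the true optimum lies elsewhere.
f = lambda x: -(x - 0.8) ** 2
expert = lambda X, y: 0.2 if not X else np.clip(X[int(np.argmax(y))] + 0.02, 0, 1)
best_x, best_y = bo_muse(f, expert)
```

Even with this deliberately entrenched expert, the AI's exploratory suggestions pull the batch towards unvisited regions, after which the expert's local refinement takes over — the division of labour the framework formalises.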
Our contributions are:
• Design of a framework (BO-Muse) for a human expert and an AI to work in concert to accelerate experimental design, taking advantage of the human's deeper insight into the problem and the AI's rigorous models that complement the expert to achieve sample-efficiency;
• Design of an algorithm that compensates for the human tendency towards over-exploitation by appropriately boosting the AI's exploration;
• A sub-linear regret bound for BO-Muse demonstrating the accelerated convergence due to human-AI teaming in the optimisation process; and
• Experimental validation both on optimisation benchmark functions and with human experts performing complex real-world tasks.

2.1. HUMAN MACHINE PARTNERSHIPS

Mixed-initiative creative interfaces propose a tight coupling of human and machine to foster creativity. Thus far, however, research has been largely restricted to game design (Deterding et al., 2017), where the authors identified open challenges including "what kinds of human-AI co-creativity can we envision across and beyond creative practice?". Our work is the first use of such a paradigm to accelerate experimental design. Also of importance, though beyond the scope of this study, are the design of interfaces for such systems (Rezwana & Maher, 2022) and how the differing ways humans and machines express confidence affect performance (Steyvers et al., 2022).



Figure 1: BO-Muse Workflow.



