IMPROVING THE STRENGTH OF HUMAN-LIKE MODELS IN CHESS

Abstract

Designing AI systems that capture human-like behavior has attracted growing attention in applications where humans may want to learn from, or need to collaborate with, these AI systems. Many existing works in designing human-like AI have taken a supervised learning approach that learns from data of human behavior, with the goal of creating models that can accurately predict human behavior. While this approach has shown success in capturing human behavior at different skill levels and even identifying individual behavioral styles, it also suffers from the drawback of mimicking human mistakes. Moreover, existing models only capture a snapshot of human behavior, leaving the question of how to improve them (e.g., from one human skill level to a stronger one) largely unanswered. Using chess as an experimental domain, we investigate the question of teaching an existing human-like model to be stronger using a data-efficient curriculum, while maintaining the model's human similarity. To achieve this goal, we extend the concept of curriculum learning to settings with multiple labeling strategies, allowing us to vary both the curriculum (dataset) and the teacher (labeling strategy). We find that the choice of teacher has a strong impact on both playing strength and human similarity; for example, a teacher that is too strong can be less effective at improving playing strength and can degrade human similarity more rapidly. We also find that the choice of curriculum can impact these metrics, but to a smaller extent; for example, training on a curriculum of human mistakes provides only a marginal benefit over training on a random curriculum. Finally, we show that our strengthened models achieve human similarity on datasets corresponding to their strengthened level of play, suggesting that our curriculum training methodology is improving them in human-like steps.

1. INTRODUCTION

AI systems are growing increasingly capable at solving tasks, making decisions, and assisting humans in a wide number of domains. In games such as chess, Go, and poker, AI has demonstrated clear superiority over human performance (Silver et al., 2018; Brown & Sandholm, 2019). In domains such as marketing, transportation, medicine, law, hiring, and finance, AI decision making is capable enough to be deployed alongside humans in many real-life scenarios, and in some cases, AI performance is sufficient to take over entirely. More recently, researchers have looked into developing AI with the explicit goal of replicating human decision making, in contrast to simply optimizing for AI performance. There are several reasons for doing this. In domains such as autonomous driving, developing human-like AI can result in increased safety for both humans and AI, by providing an understanding of human driving tendencies (Hecker et al., 2019). In situations where AI is assisting or educating humans, humans exhibit a higher level of trust when the AI is human-like (Wang et al., 2019; Li et al., 2021; Kim et al., 2022), and higher levels of satisfaction when interacting with human-like AI (Amigó et al., 2006; Ragot et al., 2020; Jenneboer et al., 2022). In gaming, human-like AI can offer opportunities to practice against models of real opponents (McIlroy-Young et al., 2022a;b), and is generally more enjoyable to play with in competitive or cooperative play (Zhang et al., 2021; Soni & Hingston, 2008). Modern efforts to create human-like AI are improving rapidly, but these systems naturally suffer reduced task performance as a result of focusing on human similarity. For example in chess,

