LEARNING HUMAN-COMPATIBLE REPRESENTATIONS FOR CASE-BASED DECISION SUPPORT

Abstract

Algorithmic case-based decision support aids human decision making by retrieving examples that provide context for a test case. Despite the promising performance of supervised learning, representations learned by supervised models may not align well with human intuitions: examples that a model considers similar can be perceived as distinct by humans. As a result, such models have limited effectiveness in case-based decision support. In this work, we combine ideas from metric learning with supervised learning to examine the importance of alignment for effective decision support. In addition to instance-level labels, we use human-provided triplet judgments to learn human-compatible decision-focused representations. Using both synthetic data and human subject experiments on multiple classification tasks, we demonstrate that such representations are better aligned with human perception than representations optimized solely for classification. Human-compatible representations identify nearest neighbors that humans perceive as more similar and enable humans to make more accurate predictions, leading to substantial improvements in human decision accuracy (17.8% in butterfly vs. moth classification and 13.2% in pneumonia classification).

1. INTRODUCTION

Despite the impressive performance of machine learning (ML) models, humans are often the final decision makers in high-stakes domains due to ethical and legal concerns (Lai & Tan, 2019; Green & Chen, 2019), so using ML models as decision support is preferred over full automation. To provide meaningful information to human decision makers, the model cannot be illiterate in the underlying problem; e.g., a model for assisting breast cancer radiologists should have high diagnostic accuracy by itself. However, a model with high autonomous performance may not provide the most effective decision support, because it could solve the problem in a way that is not comprehensible or even perceptible to humans, e.g., AlphaGo's famous move 37 (Silver et al., 2016; 2017; Metz et al., 2016). Our work studies the relation between these two objectives that effective decision support must balance: achieving high autonomous performance and aligning with human intuitions. We focus on case-based decision support for classification problems (Kolodner, 1991; Begum et al., 2009; Liao, 2000; Lai & Tan, 2019). For each test example, in addition to showing the model's predicted label, case-based decision support shows one or more related examples retrieved from the training set. These examples can be used to justify the model's prediction, e.g., by showing similar-looking examples with the predicted label, or to help human decision makers calibrate the model's uncertainty, e.g., by showing similar-looking examples from other classes. Both use cases require the model to know what looks similar to the human decision maker. In other words, an important consideration in aligning with human intuition is approximating human judgments of similarity.
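Concretely, the retrieval step underlying case-based decision support can be sketched as nearest-neighbor search in a model's representation space. The snippet below is a minimal illustration under simplifying assumptions, not the implementation used in this work; the embeddings, labels, and function name are hypothetical.

```python
import numpy as np

def nearest_neighbors(test_emb, train_embs, train_labels, k=3):
    """Retrieve the k training examples closest to a test embedding.

    Distances are computed in the model's representation space, so the
    usefulness of the retrieved cases depends on how well that space
    aligns with human similarity judgments.
    """
    dists = np.linalg.norm(train_embs - test_emb, axis=1)  # Euclidean distance
    idx = np.argsort(dists)[:k]  # indices of the k closest training examples
    return idx, train_labels[idx]

# Toy usage: 2-D embeddings, binary labels (0 = moth, 1 = butterfly)
train_embs = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [5.1, 4.9]])
train_labels = np.array([0, 0, 1, 1])
test_emb = np.array([4.8, 5.2])

idx, labels = nearest_neighbors(test_emb, train_embs, train_labels, k=2)
# The two nearest cases are the butterfly-labeled points, which could be
# shown to justify a "butterfly" prediction; neighbors from the other
# class could instead be shown to convey the model's uncertainty.
```

Showing the retrieved cases with the predicted label supports the justification use case; retrieving near neighbors from other classes supports uncertainty calibration.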

Figure 1 illustrates the importance of such alignment on a classification problem of distinguishing butterflies from moths. A high-accuracy ResNet (He et al., 2016) produces a highly linearly separable representation space, which leads to high classification accuracy. But the nearest neighbor cannot provide effective justification for the model's prediction because, to humans, it looks dissimilar to the test example: the similarity measured in the model's representation space does not align with human visual similarity. If we instead use representations from a second model trained specifically to mimic human visual similarity rather than to classify images, the nearest neighbor would provide strong

