EXPLAINING BY IMITATING: UNDERSTANDING DECISIONS BY INTERPRETABLE POLICY LEARNING

Abstract

Understanding human behavior from observed data is critical for transparency and accountability in decision-making. Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging: there is no access to underlying states, no knowledge of environment dynamics, and no allowance for live experimentation. We seek to learn a data-driven representation of decision-making behavior that (1) inheres transparency by design, (2) accommodates partial observability, and (3) operates completely offline. To satisfy these key criteria, we propose a novel model-based Bayesian method for interpretable policy learning ("INTERPOLE") that jointly estimates an agent's (possibly biased) belief-update process together with their (possibly suboptimal) belief-action mapping. Through experiments on both simulated and real-world data for the problem of Alzheimer's disease diagnosis, we illustrate the potential of our approach as an investigative device for auditing, quantifying, and understanding human decision-making behavior.

1. INTRODUCTION

A principal challenge in modeling human behavior is obtaining a transparent understanding of decision-making. In medical diagnosis, for instance, there is often significant regional and institutional variation in clinical practice [1], much of which contributes to rising healthcare costs [2]. The ability to quantify different decision processes is the first step towards a more systematic understanding of medical practice. Purely by observing demonstrated behavior, our principal objective is to answer the question: Under any given state of affairs, what actions are (more/less) likely to be taken, and why? We address this challenge by setting our sights on three key criteria.

First, we desire a method that is transparent by design. Specifically, a transparent description of behavior should locate the factors that contribute to individual decisions, in a language readily understood by domain experts [3, 4]. This will be clearer per our subsequent formalism, but we can already note some contrasts: Classical imitation learning, popularly by reduction to supervised classification, does not fit the bill, since the black-box hidden states of RNNs are rarely amenable to meaningful interpretation. Similarly, apprenticeship learning algorithms, popularly through inverse reinforcement learning, do not satisfy this criterion either, since the high-level nature of reward mappings is not informative as to the individual actions observed in the data. Rather than focusing purely on replicating actions (imitation learning) or on matching expert performance (apprenticeship learning), our chief pursuit lies in understanding demonstrated behavior.

Second, real-world environments such as healthcare are often partially observable in nature. This requires modeling the accumulation of information from entire sequences of past observations, an endeavor that is prima facie at odds with the goal of transparency.
For instance, in a fully-observable setting, (model-free) behavioral cloning is arguably 'transparent' in providing simple mappings of states to actions; however, coping with partial observability using any form of recurrent function approximation once again yields black-box hidden states that resist interpretation.

* Authors contributed equally
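To make the contrast concrete, the 'transparent' fully-observable baseline can be sketched as follows. This is an illustrative toy, not the paper's method: it fits a behavioral-cloning policy by simple frequency counting over demonstrated (state, action) pairs, so every decision is directly traceable to observed counts. The state and action names are hypothetical.

```python
from collections import Counter, defaultdict

def behavioral_cloning(demonstrations):
    """Fit a transparent state-to-action policy from demonstrations.

    demonstrations: iterable of (state, action) pairs observed in the data.
    Returns a dict mapping each state to its most frequently demonstrated
    action. (A real implementation would use a supervised classifier; the
    counting version keeps every decision auditable.)
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # Pick the empirically most common action per state.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Hypothetical clinical demonstrations.
demos = [("fever", "order_test"), ("fever", "order_test"),
         ("fever", "prescribe"), ("healthy", "discharge")]
policy = behavioral_cloning(demos)
print(policy["fever"])    # -> order_test
print(policy["healthy"])  # -> discharge
```

Note that such a mapping has no memory: once observations only partially reveal the state, the policy would need to condition on the entire history, which is exactly where recurrent (black-box) function approximation is usually introduced.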

