EXPLAINING BY IMITATING: UNDERSTANDING DECISIONS BY INTERPRETABLE POLICY LEARNING

Abstract

Understanding human behavior from observed data is critical for transparency and accountability in decision-making. Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging: there is no access to underlying states, no knowledge of environment dynamics, and no allowance for live experimentation. We desire a data-driven representation of decision-making behavior that (1) inheres transparency by design, (2) accommodates partial observability, and (3) operates completely offline. To satisfy these key criteria, we propose a novel model-based Bayesian method for interpretable policy learning ("INTERPOLE") that jointly estimates an agent's (possibly biased) belief-update process together with their (possibly suboptimal) belief-action mapping. Through experiments on both simulated and real-world data for the problem of Alzheimer's disease diagnosis, we illustrate the potential of our approach as an investigative device for auditing, quantifying, and understanding human decision-making behavior.

1. INTRODUCTION

A principal challenge in modeling human behavior is obtaining a transparent understanding of decision-making. In medical diagnosis, for instance, there is often significant regional and institutional variation in clinical practice [1], much of it contributing to rising healthcare costs [2]. The ability to quantify different decision processes is the first step towards a more systematic understanding of medical practice. Purely by observing demonstrated behavior, our principal objective is to answer the question: under any given state of affairs, what actions are (more/less) likely to be taken, and why?

We address this challenge by setting our sights on three key criteria. First, we desire a method that is transparent by design. Specifically, a transparent description of behavior should locate the factors that contribute to individual decisions, in a language readily understood by domain experts [3, 4]. This will be clearer per our subsequent formalism, but we can already note some contrasts. Classical imitation learning, popularly by reduction to supervised classification, does not fit the bill, since the black-box hidden states of RNNs are rarely amenable to meaningful interpretation. Similarly, apprenticeship learning algorithms, popularly through inverse reinforcement learning, do not satisfy this criterion either, since the high-level nature of learned reward mappings is not informative as to the individual actions observed in the data. Rather than focusing purely on replicating actions (imitation learning) or on matching expert performance (apprenticeship learning), our chief pursuit lies in understanding demonstrated behavior.

Second, real-world environments such as healthcare are often partially observable in nature. This requires modeling the accumulation of information from entire sequences of past observations, an endeavor that is prima facie at odds with the goal of transparency.
For instance, in a fully-observable setting, (model-free) behavioral cloning is arguably 'transparent' in providing simple mappings of states to actions; however, coping with partial observability using any form of recurrent function approximation immediately lands in black-box territory. Likewise, while (model-based) methods have been developed for robotic control, their transparency crucially hinges on fully-observable kinematics.

Finally, in realistic settings it is often impossible to experiment online, especially in high-stakes environments with real products and patients. The vast majority of recent work in (inverse) reinforcement learning has focused on games, simulations, and gym environments where access to live interaction is unrestricted. By contrast, in healthcare settings the environment dynamics are neither known a priori nor estimable by repeated exploration. We want a data-driven representation of behavior that is learnable in a completely offline fashion, yet does not rely on knowing or modeling any true dynamics.

Contributions. Our contributions are three-fold. First, we propose a model for interpretable policy learning ("INTERPOLE"), in which sequential observations are aggregated through a decision agent's decision dynamics (viz. subjective belief-update process), and sequential actions are determined by the agent's decision boundaries (viz. probabilistic belief-action mapping). Second, we propose a Bayesian learning algorithm for estimating the model, simultaneously satisfying the key criteria of transparency, partial observability, and offline learning.
Third, through experiments on both simulated and real-world data for Alzheimer's disease diagnosis, we illustrate the potential of our method as an investigative device for auditing, quantifying, and understanding human decision-making behavior.
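To make the notion of a belief-update process concrete, the following is a minimal sketch of a discrete Bayes filter of the kind INTERPOLE estimates for an agent. It is purely illustrative and not the paper's actual model: the two-state transition matrix `T`, observation matrix `O`, and all probabilities are hypothetical numbers chosen for the example.

```python
import numpy as np

# Hypothetical two-state, two-observation setting (illustration only).
T = np.array([[0.9, 0.1],    # T[s, s'] = P(s' | s, a) for a fixed action
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],    # O[s', z] = P(z | s')
              [0.4, 0.6]])

def belief_update(b, z):
    """One step of Bayesian belief updating: predict, then correct."""
    predicted = b @ T                 # propagate belief through dynamics
    posterior = predicted * O[:, z]   # weight by observation likelihood
    return posterior / posterior.sum()

b = np.array([0.5, 0.5])             # uniform prior over hidden states
b = belief_update(b, z=0)
print(b)  # posterior belief after observing z=0
```

In INTERPOLE, the analogous quantities (the agent's subjective dynamics and observation model) are learned from demonstrations rather than given, which is what allows the recovered beliefs to be possibly biased.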

2. RELATED WORK

We seek to learn an interpretable parameterization of observed behavior to understand an agent's actions. Fundamentally, this contrasts with imitation learning (which seeks to best replicate demonstrated policies) and apprenticeship learning (which seeks to match some notion of performance).

Imitation Learning. In fully-observable settings, behavioral cloning (BC) readily reduces the imitation problem to one of supervised classification [5, 11-13]; i.e. actions are simply regressed on observations. While this can be extended to account for partial observability by parameterizing policies via recurrent function approximation [14], it immediately gives up on ease of interpretability per the black-box nature of RNN hidden states. A plethora of model-free techniques have recently been developed that account for information in the rollout dynamics of the environment during policy learning (see e.g. [15-20]), most famously generative adversarial imitation learning (GAIL) based on state-distribution matching [6, 21]. However, such methods require repeated online rollouts of intermediate policies during training, and in partially-observable settings they face the same black-box problem as BC. Clearly, in model-free imitation it is difficult to admit both transparency and partial observability.

Specifically with an eye on explainability, Info-GAIL [22, 23] proposes an orthogonal notion of "interpretability" that hinges on clustering similar demonstrations to explain variations in behavior. However, as with GAIL, it suffers from the need for live interaction during learning. Finally, several model-based techniques for imitation learning (MB-IL) have been studied in the domain of robotics: [24] consider kinematic models designed for robot dynamics, while [25] and [7] consider (non-)linear autoregressive exogenous models. However, such approaches invariably operate in fully-observable settings, and rely on restricted models hand-crafted for the specific robotic applications under consideration.

Table: Comparison with Related Work. INTERPOLE satisfies our key criteria of (1) transparency by design, (2) partial observability, and (3) offline learning, and makes no assumptions w.r.t. the unbiasedness of beliefs or the optimality of policies. Observations, beliefs, (optimal) q-values, actions, and policies are denoted z, b, q*, a, and π; bold denotes learned quantities, italics denote known (or queryable) quantities, and "†" denotes jointly-learned quantities.
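For concreteness, the reduction of fully-observable imitation to supervised classification can be sketched as follows. This is a generic illustration of behavioral cloning, not any cited method: the synthetic expert rule, the logistic-regression policy class, and all hyperparameters are assumptions made for the example.

```python
import numpy as np

# Synthetic demonstrations: (state, action) pairs from a hypothetical
# expert whose policy is a simple linear rule over a 2-D state.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 2))                       # observed states
actions = (states[:, 0] + states[:, 1] > 0).astype(int)  # expert's actions

# Behavioral cloning: fit a logistic-regression policy pi(a=1 | s)
# to the demonstrations by plain gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(states @ w + b)))   # predicted action probs
    w -= 1.0 * states.T @ (p - actions) / len(actions)
    b -= 1.0 * np.mean(p - actions)

cloned = (1.0 / (1.0 + np.exp(-(states @ w + b))) > 0.5).astype(int)
accuracy = np.mean(cloned == actions)
print(accuracy)  # fraction of demonstrated actions the clone reproduces
```

The contrast drawn in the text is that once the state is hidden, this direct state-to-action mapping must be replaced by a recurrent function of the entire observation history, at which point the learned policy is no longer readily interpretable.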

