PRIOR PREFERENCE LEARNING FROM EXPERTS: DESIGNING A REWARD WITH ACTIVE INFERENCE

Anonymous

Abstract

Active inference may be defined as Bayesian modeling of the brain built on a biologically plausible model of the agent. Its central ideas are the free energy principle and the agent's prior preference: an agent chooses actions that drive future observations toward its prior preference. In this paper, we claim that active inference can be interpreted in terms of reinforcement learning (RL) algorithms, and we establish a theoretical connection between them. We extend the concept of expected free energy (EFE), a core quantity in active inference, and show that the EFE can be treated as a negative value function. Motivated by the concept of prior preference and by this theoretical connection, we propose a simple but novel method for learning a prior preference from experts. This illustrates that the inverse RL problem can be approached from the new perspective of active inference. Experimental results on prior preference learning demonstrate the feasibility of active inference with EFE-based rewards and its application to an inverse RL problem.

1. INTRODUCTION

Active inference (Friston et al., 2009) is a theory emerging from cognitive science that combines Bayesian modeling of brain function (Friston et al., 2006; Friston, 2010; Friston et al., 2015; 2013), predictive coding (Friston et al., 2011; Lopez-Persem et al., 2016), and the free energy principle (Friston, 2012; Parr & Friston, 2019; Friston, 2019). It states that an agent chooses actions to minimize expected future surprise (Friston et al., 2012; 2017a; b), a measure of the difference between the agent's prior preference and its expected future. Minimizing expected future surprise can be achieved by minimizing the expected free energy (EFE), the core quantity of active inference. Although active inference and the EFE were inspired by and derived from cognitive science using a biologically plausible model of brain function, their use in RL tasks remains limited owing to computational issues and prior-preference design (Millidge, 2020; Fountas et al., 2020). First, the EFE carries a heavy computational cost: an exact computation theoretically averages over all possible policies, which becomes intractable as the action space A and the time horizon T grow. Several attempts have been made to compute the EFE tractably, such as limiting the future time horizon from t to t + H (Tschantz et al., 2019) and applying Monte Carlo sampling methods to the policy search (Fountas et al., 2020; Çatal et al., 2020). Second, it is unclear how the prior preference should be set; this is the same question as how to design the reward in an RL algorithm. In recent studies (Fountas et al., 2020; Çatal et al., 2020; Ueltzhöffer, 2018), the agent's prior preference is simply set to the final goal of the given environment at every time step. There are indeed some environments in which the prior preference can be set in a time-independent way.
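As a toy illustration of this intractability, the sketch below counts the deterministic action sequences an exact EFE average would have to score and contrasts exhaustive enumeration with a truncated-horizon approximation. The per-step score `efe_term` is a hypothetical stand-in, not the actual EFE, which would depend on the agent's generative model and prior preference; the function names are ours, not from the paper.

```python
from itertools import product

def num_policies(n_actions: int, horizon: int) -> int:
    # An exact EFE average scores every deterministic action sequence:
    # |A|**T of them, which explodes with the horizon T.
    return n_actions ** horizon

def efe_term(action: int, t: int) -> float:
    # Hypothetical placeholder for the EFE contribution of taking
    # `action` at step t (the real term involves the generative model).
    return (action + 1) * 0.1 * (t + 1)

def exhaustive_efe_min(n_actions: int, horizon: int):
    """Exact minimization: score every length-`horizon` action sequence."""
    return min(
        product(range(n_actions), repeat=horizon),
        key=lambda pi: sum(efe_term(a, t) for t, a in enumerate(pi)),
    )

def truncated_efe_min(n_actions: int, horizon: int, H: int):
    """Receding-horizon approximation: only look H steps ahead."""
    return exhaustive_efe_min(n_actions, min(horizon, H))

print(num_policies(4, 15))       # 4**15 = 1073741824 sequences: intractable
print(exhaustive_efe_min(2, 3))  # feasible only for tiny |A| and T
```

Even modest action spaces and horizons make the exhaustive search hopeless, which is why truncated horizons and Monte Carlo policy sampling are used in practice.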
However, most prior preferences in RL problems are neither simple nor easy to design, because preferences over short- and long-term futures should generally be treated in different ways. In this paper, we first claim that there is a theoretical connection between active inference and RL algorithms. We then propose prior preference learning (PPL), a simple and novel method for learning the prior preference of an active inference agent from expert demonstrations. In Section 2, we briefly introduce the concept of active inference. Starting from the previous definition of the EFE of a deterministic policy, in Section 3 we extend the existing concepts of active inference and theoretically demonstrate that they can be analyzed from the viewpoint of RL algorithms. We extend this quantity to a stochastic

