JOINT PERCEPTION AND CONTROL AS INFERENCE WITH AN OBJECT-BASED IMPLEMENTATION

Abstract

Existing model-based reinforcement learning methods often study perception modeling and decision making separately. We introduce joint Perception and Control as Inference (PCI), a general framework that combines perception and control for partially observable environments through Bayesian inference. Since object-level inductive biases are critical in human perceptual learning and reasoning, we propose Object-based Perception Control (OPC), an instantiation of PCI that facilitates control using automatically discovered object-based representations. We develop an unsupervised end-to-end solution and analyze the convergence of the perception model update. Experiments in a high-dimensional pixel environment demonstrate the learning effectiveness of our object-based perception control approach. Specifically, we show that OPC achieves high-quality perceptual grouping and outperforms several strong baselines in accumulated rewards.

1. INTRODUCTION

Human-like computing, which aims at endowing machines with human-like perceptual, reasoning and learning abilities, has recently drawn considerable attention (Lake, 2014; Lake et al., 2015; Baker et al., 2017). In order to operate within a dynamic environment while preserving homeostasis (Kauffman, 1993), humans maintain an internal model that lets them learn new concepts efficiently from a few examples (Friston, 2005). This idea has inspired many model-based reinforcement learning (MBRL) approaches that learn a concise perception model of the world (Kaelbling et al., 1998); MBRL agents then use the perception model to choose effective actions. However, most existing MBRL methods separate perception modeling from decision making, leaving the potential connection between the objectives of these two processes unexplored. A notable work by Hafner et al. (2020) provides a unified framework for perception and control. Built upon a general principle, this framework covers a wide range of objectives in representation learning and reinforcement learning. However, it omits a discussion of combining perception and control for partially observable Markov decision processes (POMDPs), which formalize many real-world decision-making problems. In this paper, therefore, we focus on joint perception and control as inference for POMDPs and provide a specialized joint objective as well as a practical implementation. Many prior MBRL methods also fail to facilitate common-sense physical reasoning (Battaglia et al., 2013), which is typically achieved by utilizing object-level inductive biases, e.g., priors over observed objects' properties, such as their type, number, and locations.
In contrast, humans obtain these inductive biases by interacting with the environment and receiving feedback throughout their lifetimes (Spelke et al., 1992), leading to a unified, hierarchical, and behaviorally correlated perception model for perceiving events and objects in the environment (Lee and Mumford, 2003). Before taking actions, a human agent can use this model to decompose a complex visual scene into distinct parts, understand the relations between them, reason about their dynamics, and predict the consequences of its actions (Battaglia et al., 2013). Therefore, equipping MBRL with object-level inductive biases is essential to create agents capable of emulating human perceptual learning and reasoning, and thus complex decision making (Lake et al., 2015). We propose to train an agent in a similar way, gaining inductive biases by learning the structured properties of the environment. This can enable the agent to plan like a human: to think ahead, evaluate what would happen for a range of possible choices, and make rapid decisions while learning a policy with the help of the inductive bias (Lake et al., 2017). Moreover, to mimic a human's spontaneous acquisition of inductive biases, we introduce joint Perception and Control as Inference (PCI), shown in Fig. 1, a unified framework for decision making and perception modeling that facilitates understanding of the environment while providing a joint objective for both perception and action choice. As we argue that the inductive bias gained from object-based perception is beneficial for control tasks, we then propose Object-based Perception Control (OPC), an instantiation of PCI which facilitates control with the help of automatically discovered representations of objects from raw pixels.
We consider a setting inspired by real-world scenarios: a partially observable environment in which the agent's observations consist of a visual scene with compositional structure. The perception optimization of OPC is achieved by inference in a spatial mixture model through generalized expectation maximization (Dempster et al., 1977), while the policy optimization is derived from conventional temporal-difference (TD) learning (Sutton, 1988). A proof of convergence for the perception model update is provided in Appendix A. We test OPC on the Pixel Waterworld environment. Our results show that OPC achieves consistent, high-quality perceptual grouping and outperforms several strong baselines in terms of accumulated rewards.
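The two optimization loops described above can be sketched in a few lines. The snippet below is a minimal illustration only, not the paper's implementation: it shows one generalized-EM step for a single-feature spatial mixture (soft pixel-to-object assignment, then a partial move of component means) and one tabular TD(0) value update. All function names, the fixed variance `sigma`, and the partial M-step rate are hypothetical choices for this sketch.

```python
import numpy as np

def em_step(pixels, mu, sigma=0.1, rate=0.5):
    """One generalized-EM step of a spatial mixture over pixels.

    pixels: (N,) flattened observation; mu: (K, N) per-object component means.
    Returns soft assignments gamma (K, N) and partially updated means.
    """
    # E-step: responsibilities from Gaussian pixel likelihoods (log-domain for stability)
    log_p = -0.5 * ((pixels[None, :] - mu) ** 2) / sigma ** 2   # (K, N)
    log_p -= log_p.max(axis=0, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=0, keepdims=True)                   # columns sum to 1
    # Generalized (partial) M-step: move means toward responsibility-weighted pixels
    mu_new = mu + rate * gamma * (pixels[None, :] - mu)
    return gamma, mu_new

def td_update(V, s, r, s_next, alpha=0.1, discount=0.99):
    """Tabular TD(0) update of the state-value estimate V."""
    V[s] += alpha * (r + discount * V[s_next] - V[s])
    return V
```

In OPC the two updates would interleave: each environment step refines the object-based perception via EM, and the resulting representation feeds the TD-based policy update.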

2. RELATED WORK

Connecting Perception and Control Formulating RL as Bayesian inference over inputs and actions has been explored by recent works (Todorov, 2008; Kappen et al., 2009; Rawlik et al., 2010; Ortega and Braun, 2011; Levine, 2018; Tschiatschek et al., 2018; Lee et al., 2019b; a; Ortega et al., 2019; Xin et al., 2020; O'Donoghue et al., 2020). The generalized free energy principle (Parr and Friston, 2019) studies a unified objective by heuristically defining entropy terms. A unified framework for perception and control from a general principle is proposed by Hafner et al. (2020). Their framework provides a common foundation from which a wide range of objectives can be derived, such as representation learning, information gain, empowerment, and skill discovery. However, one trade-off for the generality of their framework is a loss of precision. Environments in many real-world decision-making problems are only partially observable, which signifies the importance of MBRL methods for solving POMDPs; however, such an integrated discussion is omitted in Hafner et al. (2020). In contrast, we focus on joint perception and control as inference for POMDPs and provide a specialized joint objective as well as a practical implementation. Model-based Deep Reinforcement Learning MBRL algorithms have been shown to be effective in various tasks (Gu et al., 2016), including operating in environments with high-dimensional raw pixel observations (Igl et al., 2018; Shani et al., 2005; Watter et al., 2015; Levine et al., 2016; Finn and Levine, 2017). Existing methods have considered incorporating reward structure into model learning (Farahmand et al., 2017; Oh et al., 2017), while our proposed PCI takes one step further by incorporating the perception model into the control-as-inference derivation to yield a single unified objective for multiple components in a pipeline.
One of the methods most closely related to OPC is the World Model (Ha and Schmidhuber, 2018), which consists of separately and offline-trained models for vision, memory, and control. Such methods typically produce entangled latent representations of pixel observations, whereas for real-world tasks such as reasoning and physical interaction, it is often necessary to identify and manipulate multiple entities and their relationships for optimal performance. Although Zambaldi et al. (2018) have used a relational mechanism to discover and reason about entities, their model requires additional supervision in the form of location information.



Figure 1: The graphical model of joint Perception and Control as Inference (PCI), where s and o represent the latent state and the binary optimality variable, respectively. The hierarchical perception model includes a bottom-up recognition model q(s) and a top-down generative model p(x, o, s) (decomposed into the likelihood p(x, o|s) and the prior belief p(s)). Control is performed by taking an action a to affect the environment state.
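The decomposition in the caption admits a standard variational bound; the following is a generic illustration of the form such a joint objective can take (it is not the paper's exact objective), where the recognition model q(s) bounds the evidence of observations x and optimality o:

```latex
\log p(x, o) \;\ge\; \mathbb{E}_{q(s)}\!\left[\log p(x, o \mid s)\right] \;-\; \mathrm{KL}\!\left(q(s)\,\|\,p(s)\right)
```

Maximizing the right-hand side jointly improves the perception model (through the likelihood term) and the policy (through the optimality variable o), which is the sense in which PCI unifies the two objectives.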

