JOINT PERCEPTION AND CONTROL AS INFERENCE WITH AN OBJECT-BASED IMPLEMENTATION

Abstract

Existing model-based reinforcement learning methods often study perception modeling and decision making separately. We introduce joint Perception and Control as Inference (PCI), a general framework that combines perception and control for partially observable environments through Bayesian inference. Motivated by the fact that object-level inductive biases are critical in human perceptual learning and reasoning, we propose Object-based Perception Control (OPC), an instantiation of PCI that facilitates control using automatically discovered object-based representations. We develop an unsupervised end-to-end solution and analyze the convergence of the perception model update. Experiments in a high-dimensional pixel environment demonstrate the learning effectiveness of our object-based perception control approach. Specifically, we show that OPC achieves good perceptual grouping quality and outperforms several strong baselines in accumulated reward.

1. INTRODUCTION

Human-like computing, which aims at endowing machines with human-like perceptual, reasoning, and learning abilities, has recently drawn considerable attention (Lake, 2014; Lake et al., 2015; Baker et al., 2017). In order to operate within a dynamic environment while preserving homeostasis (Kauffman, 1993), humans maintain an internal model to learn new concepts efficiently from a few examples (Friston, 2005). This idea has inspired many model-based reinforcement learning (MBRL) approaches that learn a concise perception model of the world (Kaelbling et al., 1998). MBRL agents then use the perceptual model to choose effective actions. However, most existing MBRL methods separate perception modeling from decision making, leaving the potential connection between the objectives of these processes unexplored. A notable work by Hafner et al. (2020) provides a unified framework for perception and control. Built upon a general principle, this framework covers a wide range of objectives in the fields of representation learning and reinforcement learning. However, it omits a discussion of combining perception and control for partially observable Markov decision processes (POMDPs), which formalize many real-world decision-making problems. In this paper, therefore, we focus on joint perception and control as inference for POMDPs and provide a specialized joint objective as well as a practical implementation. Many prior MBRL methods fail to facilitate common-sense physical reasoning (Battaglia et al., 2013), which is typically achieved by utilizing object-level inductive biases, e.g., priors over observed objects' properties, such as their type, number, and locations.
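To make the POMDP setting concrete: the agent never observes the latent state directly, so perception amounts to maintaining a belief over states via Bayesian filtering. The following is a minimal illustrative sketch of an exact discrete belief update; the two-state transition and observation matrices are hypothetical assumptions for demonstration, not taken from the paper's environments.

```python
import numpy as np

# Hypothetical two-state, two-action, two-observation POMDP.
# T[a, s, s'] : probability of reaching state s' from s under action a.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.5, 0.5]]])
# O[s', o] : probability of observing o in state s'.
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def belief_update(b, a, o):
    """Exact Bayes filter: b'(s') ∝ O[s', o] * sum_s T[a, s, s'] * b(s)."""
    predicted = T[a].T @ b           # predictive prior over next states
    posterior = O[:, o] * predicted  # reweight by observation likelihood
    return posterior / posterior.sum()

b0 = np.array([0.5, 0.5])            # uniform initial belief
b1 = belief_update(b0, a=0, o=0)     # belief shifts toward state 0
```

Perception-as-inference generalizes this filter to learned, high-dimensional latent representations, which is the role the perception model plays in PCI.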
In contrast, humans obtain these inductive biases by interacting with the environment and receiving feedback throughout their lifetimes (Spelke et al., 1992), leading to a unified, hierarchical, behaviorally correlated perception model that perceives events and objects in the environment (Lee and Mumford, 2003). Before taking actions, a human agent can use this model to decompose a complex visual scene into distinct parts, understand relations between them, reason about their dynamics, and predict the consequences of its actions (Battaglia et al., 2013). Therefore, equipping MBRL with object-level inductive biases is essential to create agents capable of emulating human perceptual learning and reasoning, and thus complex decision making (Lake et al., 2015). We propose to train an agent in a similar way, gaining inductive biases by learning the structured properties of the environment. This enables the agent to plan like a human: to think ahead, anticipate what would happen under a range of possible choices, and make rapid decisions while learning a policy with the help of the inductive biases (Lake et al., 2017). Moreover, in order to mimic a human's spontaneous acquisition of inductive biases

