FACTORIZING DECLARATIVE AND PROCEDURAL KNOWLEDGE IN STRUCTURED, DYNAMICAL ENVIRONMENTS

Abstract

Modeling a structured, dynamic environment such as a video game requires keeping track of the objects and their states (declarative knowledge) as well as predicting how objects behave (procedural knowledge). Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly; that is, they lack systematicity. For example, in a video game, correctly predicting one enemy's trajectory does not ensure correctly predicting another's. We address this issue with an architecture that factorizes declarative and procedural knowledge and imposes modularity within each form of knowledge. The architecture consists of active modules called object files, each of which maintains the state of a single object and invokes passive external knowledge sources called schemata that prescribe state updates. In a video game, for instance, two enemies of the same type share schemata but have separate object files that encode their distinct states (e.g., health, position). We propose using attention to determine which object files to update, to select schemata, and to propagate information between object files. The resulting architecture is a drop-in replacement that conforms to the same input-output interface as standard recurrent networks (e.g., LSTM, GRU), yet it achieves substantially better generalization in environments that contain multiple object tokens of the same type, including a challenging intuitive-physics benchmark.
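The split between object files and schemata can be illustrated with a toy sketch. This is not the architecture proposed here; it is a minimal illustration under several assumptions: each object file holds a plain state vector, each schema is a hypothetical linear update rule with a key vector, and an object file selects one schema by dot-product attention with hard (argmax) selection. The names `Schema`, `ObjectFile`, and `step`, and the hand-picked keys and weights, are all hypothetical and chosen only for illustration (a trained model would learn such parameters). Only one of the three uses of attention listed above, schema selection, is shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class Schema:
    """Hypothetical passive, shared update rule: a key plus a linear map."""

    def __init__(self, key: Vector, weights: Matrix):
        self.key = key          # object files attend to this key
        self.weights = weights  # update: new_state = weights @ state

    def apply(self, state: Vector) -> Vector:
        return matvec(self.weights, state)


class ObjectFile:
    """Hypothetical active module holding the state of exactly one object."""

    def __init__(self, state: Vector):
        self.state = state

    def step(self, schemata: List[Schema]) -> int:
        # Query = current state; score every shared schema by dot product.
        probs = softmax([dot(self.state, s.key) for s in schemata])
        chosen = max(range(len(schemata)), key=lambda i: probs[i])
        # Hard selection: only the best-matching schema updates this state.
        self.state = schemata[chosen].apply(self.state)
        return chosen


# Usage: state = [position, velocity]; both schemata are shared by all objects.
advance = Schema(key=[0.0, 1.0], weights=[[1.0, 1.0], [0.0, 1.0]])  # position += velocity
halt = Schema(key=[0.0, -1.0], weights=[[1.0, 0.0], [0.0, 0.0]])    # velocity := 0
schemata = [advance, halt]

enemy_a = ObjectFile([0.0, 1.0])
enemy_b = ObjectFile([5.0, 2.0])
for of in (enemy_a, enemy_b):
    of.step(schemata)
print(enemy_a.state, enemy_b.state)  # [1.0, 1.0] [7.0, 2.0]
```

The two "enemies" apply the same `advance` rule but keep separate states, which is the kind of reuse that a single monolithic hidden state does not guarantee. Replacing the argmax with a softmax-weighted mixture of schema outputs would be an equally plausible design choice for a differentiable variant.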

1. INTRODUCTION

An intelligent agent that interacts with its world must not only perceive objects but also remember its past experience with them. The wicker chair in one's living room is not just a chair; it is the chair that has an unsteady leg and tips easily. Your keys may not be visible, but you recall placing them on the ledge by the door. The annoying fly buzzing in your left ear is the same fly you saw earlier, the one that landed on the table. Visual cognition requires a short-term memory that keeps track of an object's location, properties, and history. In the cognitive-science literature, this particular form of state memory is often referred to as an object file (Kahneman et al., 1992), which we abbreviate as OF. An OF serves as a temporally persistent reference to an external object, permitting object constancy and permanence as the object and the viewer move in the world. Complementary to the information in an OF is abstract knowledge about the dynamics and behavior of an object. We refer to this latter type of knowledge as a schema (plural schemata), another term borrowed from the cognitive-science literature. The combination of OFs and schemata is sufficient to predict future states of object-structured environments, a capability critical for planning and goal-seeking behavior. To model a complex, structured visual environment, multiple OFs must be maintained in parallel. Consider scenes like a PacMan video-game screen in which the ghosts chase the PacMan, a public



Mila, University of Montreal; IIT BHU, Varanasi; Waverly; UC Berkeley; DeepMind; Google Research, Brain Team. Corresponding author: anirudhgoyal9119@gmail.com

