FACTORIZING DECLARATIVE AND PROCEDURAL KNOWLEDGE IN STRUCTURED, DYNAMICAL ENVIRONMENTS

Abstract

Modeling a structured, dynamic environment like a video game requires keeping track of the objects and their states (declarative knowledge) as well as predicting how objects behave (procedural knowledge). Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly, i.e., they lack systematicity. For example, in a video game, correct prediction of one enemy's trajectory does not ensure correct prediction of another's. We address this issue via an architecture that factorizes declarative and procedural knowledge and that imposes modularity within each form of knowledge. The architecture consists of active modules called object files that maintain the state of a single object and invoke passive external knowledge sources called schemata that prescribe state updates. To use a video game as an illustration, two enemies of the same type will share schemata but will have separate object files to encode their distinct state (e.g., health, position). We propose to use attention to determine which object files to update, which schemata to select, and how information propagates between object files. The resulting architecture is a drop-in replacement conforming to the same input-output interface as normal recurrent networks (e.g., LSTM, GRU) yet achieves substantially better generalization on environments that have multiple object tokens of the same type, including a challenging intuitive physics benchmark.

1. INTRODUCTION

An intelligent agent that interacts with its world must not only perceive objects but must also remember its past experience with these objects. The wicker chair in one's living room is not just a chair; it is the chair that has an unsteady leg and easily tips. Your keys may not be visible, but you recall placing them on the ledge by the door. The annoying fly buzzing in your left ear is the same fly you saw earlier that landed on the table. Visual cognition requires a short-term memory that keeps track of an object's location, properties, and history. In the cognitive science literature, this particular form of state memory is often referred to as an object file (Kahneman et al., 1992), which we'll abbreviate as OF. An OF serves as a temporally persistent reference to an external object, permitting object constancy and permanence as the object and the viewer move in the world. Complementary to information in the OF is abstract knowledge about the dynamics and behavior of an object. We refer to this latter type of knowledge as a schema (plural schemata), another term borrowed from the cognitive-science literature. The combination of OFs and schemata is sufficient to predict future states of object-structured environments, which is critical for planning and goal-seeking behavior.

To model a complex, structured visual environment, multiple OFs must be maintained in parallel. Consider scenes like a PacMan video-game screen in which the ghosts chase the PacMan, a public square or sports field in which people interact with one another, or a pool table with rolling and colliding balls. In each of these environments, multiple instances of the same object class are present; all operate according to fundamentally similar dynamics.
To ensure systematic modeling of the environment, the same dynamics must be applied to multiple object instances. Toward this goal, we propose a method of separately representing the state of an individual object (via an OF) and how its state evolves over time (via a schema).

Object-oriented programming (OOP) provides a metaphor for thinking about the relationship between OFs and schemata. In OOP, each object is an instantiation of an object class and it has a self-contained collection of variables whose values are specific to that object and methods that operate on all instances of the same class. The relation between objects and methods mirrors the relationship between our OFs and schemata. In both OOP and our view of visual cognition, a key principle is the encapsulation of knowledge: internal details of objects (OFs) are hidden from other objects (OFs), and methods (schemata) are accessible to all and only objects (OFs) to which they are applicable. The modularity of knowledge in OOP supports human programmers in writing code that is readily debugged, extended, and reused. We conjecture that the corresponding modularity of OFs and schemata will lead to neural-network models with more efficient learning and more robust generalization, thanks to appropriate disentangling and separation of concerns.

Modularity is the guiding principle of the model we propose, which we call SCOFF, an acronym for schema / object-file factorization. Like other neural net models with external memory (e.g., Mozer and Das, 1993; Graves et al., 2016; Sukhbaatar et al., 2015), SCOFF includes a set of slots which are each designed to contain an OF (Figure 2).
In contrast to most previous external memory models, the slots are not passive contents waiting to be read or written by an active process, but are dynamic, modular elements that seek information in the environment that is relevant to the object they represent. When critical information is observed, they update their states, possibly via information provided by other OFs. Event-based OOP is a good metaphor for this active process, where external events can trigger the action of objects. As Figure 2 suggests, there is a factorization of declarative knowledge (the location, properties, and history of an object, as contained in the OFs) and procedural knowledge (the rules of object behavior, as contained in the schemata). Whereas declarative knowledge can change rapidly, procedural knowledge is more stable over time. This factorization allows any schema to be applied to any OF as



Mila, University of Montreal, IIT BHU, Varanasi, Waverly, UC Berkeley, Deepmind, Google Research, Brain Team, Corresponding author: anirudhgoyal9119@gmail.com



Figure 1: Two successive frames of PacMan, illustrating the factorization of knowledge. Each ghost is represented by a persistent OF (maintaining its location and velocity), but all ghosts operate according to one of two schemata, depending on whether the ghost is in a normal or scared state.

annex

Schemata are sets of parameters that specify the dynamics of objects. Object files (OFs) are active modules that maintain the time-varying state of an object, seek information from the input, select schemata for updating, and transmit information to other object files. Through spatial attention, OFs compete for and select different regions of the input.
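The division of labor described above can be illustrated with a toy sketch. This is not the SCOFF implementation: the class names `Schema` and `ObjectFile`, the hand-set key and query vectors, the one-dimensional position/velocity state, the `velocity_scale` update rule, and the hard argmax over attention weights are all assumptions made for illustration. The sketch shows only one idea from the text: each OF holds its own state and uses dot-product attention to pick a schema from a pool shared by all OFs. Spatial attention over the input and communication between OFs are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class Schema:
    """Passive, shared procedural knowledge (hypothetical toy version).

    Holds a key used for attention and a parameter that defines how a
    (position, velocity) state is advanced by one step.
    """

    def __init__(self, name, key, velocity_scale):
        self.name = name
        self.key = key
        self.velocity_scale = velocity_scale

    def update(self, state):
        position, velocity = state
        return (position + self.velocity_scale * velocity, velocity)


class ObjectFile:
    """Active module holding the private state of one object (toy version)."""

    def __init__(self, position, velocity, query):
        self.state = (position, velocity)
        self.query = query  # hand-set here; a real model would compute it

    def select_schema(self, schemata):
        # Attention weights from query-key dot products.
        weights = softmax([dot(self.query, s.key) for s in schemata])
        # Assumption: pick the single highest-weighted schema.
        best = max(range(len(schemata)), key=lambda i: weights[i])
        return schemata[best], weights

    def step(self, schemata):
        schema, _ = self.select_schema(schemata)
        self.state = schema.update(self.state)
        return schema.name


# Two schemata shared by every ghost; three ghosts with separate state.
schemata = [
    Schema("normal", key=[1.0, 0.0], velocity_scale=1.0),
    Schema("scared", key=[0.0, 1.0], velocity_scale=-0.5),
]
ghosts = [
    ObjectFile(position=0.0, velocity=1.0, query=[2.0, 0.0]),
    ObjectFile(position=5.0, velocity=2.0, query=[2.0, 0.0]),
    ObjectFile(position=3.0, velocity=1.0, query=[0.0, 2.0]),
]
chosen = [g.step(schemata) for g in ghosts]
print(chosen, [g.state for g in ghosts])
```

In this sketch the first two ghosts select the same schema yet end up in different states, because the update rule is shared while position and velocity are private to each `ObjectFile`. That mirrors the PacMan illustration, in which ghosts of the same type share schemata but keep separate OFs.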

