LEARNING TO REASON AND ACT IN CASCADING PROCESSES DRIVEN BY SEMANTIC EVENTS

Anonymous authors
Paper under double-blind review

Abstract

Training agents to control a dynamic environment is a fundamental task in AI. In many environments, the dynamics can be summarized by a small set of events that capture the semantic behavior of the system. Typically, these events form chains or cascades. We often wish to change the system behavior using a single intervention that propagates through the cascade. For instance, one may trigger a biochemical cascade to switch the state of a cell, or reroute a truck in a logistics chain to meet an unexpected, urgent delivery. We introduce a new supervised learning setup called Cascade. An agent observes a system with known dynamics evolving from some initial state. It is given a structured semantic instruction and needs to make an intervention that triggers a cascade of events, such that the system reaches an alternative (counterfactual) behavior. We provide a test-bed for this problem, consisting of scenes of physical objects. We combine semantic tree search with an event-driven forward model and devise an algorithm that learns to search efficiently in exponentially large semantic trees over continuous spaces. We demonstrate that our approach learns to follow instructions effectively and intervene in new complex scenes. When provided with an observed cascade of events, it can also reason about alternative outcomes.

1. INTRODUCTION

Teaching agents to understand and control their dynamic environments is a fundamental problem in AI. It becomes extremely challenging when events trigger other events. We denote such processes as cascading processes. As an example, consider a set of chemical reactions in a cellular pathway. The synthesis of a new molecule is a discrete event that later enables other chemical reactions. Cascading processes are also prevalent in man-made systems: in assembly lines, when one task is completed, e.g., construction of gears, it may trigger another task, e.g., building the transmission system. Cascading processes are abundant in many environments, from natural processes like chemical reactions, through crisis management for natural disasters (Zuccaro et al., 2018; Nakano et al., 2022), to logistics chains and water treatment plants (Cong et al., 2010).

A major goal with cascading processes is to intervene and steer them towards a desired outcome. For example, in biochemical cascades, one hopes to control chemical reactions in a cell by providing chemical signals; in logistics, a cargo dispatch plan may be completely modified by assigning a cargo plane to a different location.

This paper addresses the problem of reasoning about a cascading process and controlling its qualitative behavior. We describe a new counterfactual reasoning setup called "Cascade", which is trained via supervised learning. At inference time, an agent observes a dynamical system evolving through a cascading process that was triggered from some initial state. We refer to it as the "unsatisfied" or "observed" cascade. The goal of the agent is to steer the system toward a different, counterfactual, configuration. That target configuration is given as a set of qualitative constraints on the end results and the intermediate properties of the cascade. We call these constraints the "instruction".
To satisfy that instruction, the agent intervenes in the system at a specific point in time by changing the state of one element, which we call the "pivot". To solve the Cascade learning problem, we train an agent to select an intervention given a state of a system and an instruction. Importantly, we operate in a counterfactual mode (see Pearl, 2000). During training, the agent only sees scenarios that are "satisfied", in the sense that the system dynamics
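To make the setup concrete, the main objects of the Cascade problem, a cascade of semantic events, an instruction given as qualitative constraints, and an intervention on a single pivot element, can be sketched as simple data types. This is an illustrative sketch only; the names (`Event`, `Intervention`, `satisfies`, and the toy "hit" constraint) are our own assumptions, not part of the paper's formalism.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Event:
    time: float    # when the event occurs
    kind: str      # semantic event type, e.g. "hit" (hypothetical)
    subject: str   # the element the event involves

@dataclass(frozen=True)
class Intervention:
    time: float      # when the agent intervenes
    pivot: str       # the single element whose state is changed
    new_state: tuple # e.g. a new position/velocity for the pivot

# An instruction is modeled here as a set of qualitative constraints
# over the event sequence; it is satisfied when every constraint holds.
Instruction = List[Callable[[List[Event]], bool]]

def satisfies(cascade: List[Event], instruction: Instruction) -> bool:
    """Check whether a cascade of events meets all constraints."""
    return all(constraint(cascade) for constraint in instruction)

# Toy example: the instruction requires some "hit" event on "ball_2".
instruction: Instruction = [
    lambda ev: any(e.kind == "hit" and e.subject == "ball_2" for e in ev)
]
observed = [Event(0.5, "hit", "ball_1")]        # the observed cascade
counterfactual = [Event(0.4, "hit", "ball_2")]  # after an intervention

assert not satisfies(observed, instruction)
assert satisfies(counterfactual, instruction)
```

Under this reading, solving Cascade amounts to picking an `Intervention` such that the resulting simulated cascade makes `satisfies` true, while at training time the agent only ever observes cascades that already satisfy their instruction.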

