LEARNING TO REASON AND ACT IN CASCADING PROCESSES DRIVEN BY SEMANTIC EVENTS

Anonymous authors
Paper under double-blind review

Abstract

Training agents to control a dynamic environment is a fundamental task in AI. In many environments, the dynamics can be summarized by a small set of events that capture the semantic behavior of the system. Typically, these events form chains or cascades. We often wish to change the system behavior using a single intervention that propagates through the cascade. For instance, one may trigger a biochemical cascade to switch the state of a cell, or reroute a truck in a logistics chain to meet an unexpected, urgent delivery. We introduce a new supervised learning setup called Cascade. An agent observes a system with known dynamics evolving from some initial state. It is given a structured semantic instruction and must make an intervention that triggers a cascade of events, such that the system reaches an alternative (counterfactual) behavior. We provide a test-bed for this problem, consisting of physical objects that move and collide in a confined space. We combine semantic tree search with an event-driven forward model and devise an algorithm that learns to search efficiently in exponentially large semantic trees over continuous spaces. We demonstrate that our approach learns to follow instructions effectively and intervene in new complex scenes. When provided with an observed cascade of events, it can also reason about alternative outcomes.

1. INTRODUCTION

Teaching agents to understand and control their dynamic environments is a fundamental problem in AI. It becomes extremely challenging when events trigger other events; we call such processes cascading processes. As an example, consider a set of chemical reactions in a cellular pathway: the synthesis of a new molecule is a discrete event that later enables other chemical reactions. Cascading processes are also prevalent in man-made systems. In assembly lines, completing one task, e.g., constructing gears, may trigger another, e.g., building the transmission system. Cascading processes are abundant in many environments, from natural processes like chemical reactions, through crisis management for natural disasters (Zuccaro et al., 2018; Nakano et al., 2022), to logistic chains and water treatment plants (Cong et al., 2010).

A major goal with cascading processes is to intervene and steer them toward a desired outcome. For example, in biochemistry, one hopes to control chemical cascades in a cell by providing chemical signals; in logistics, a cargo dispatch plan may be completely modified by assigning a cargo plane to a different location.

This paper addresses the problem of reasoning about a cascading process and controlling its qualitative behavior. We describe a new counterfactual reasoning setup called "Cascade", addressed via supervised learning. At inference time, an agent observes a dynamical system evolving through a cascading process that was triggered from some initial state; we refer to this as the "unsatisfied" or "observed" cascade. The goal of the agent is to steer the system toward a different, counterfactual, configuration. That target configuration is given as a set of qualitative constraints on the end result and the intermediate properties of the cascade. We call these constraints the "instruction".
To satisfy that instruction, the agent intervenes in the system at a specific point in time by changing the state of one element, which we call the "pivot". To solve the Cascade learning problem, we train an agent to select an intervention given a state of a system and an instruction. Importantly, we operate in a counterfactual mode (see Pearl, 2000). During training, the agent only sees scenarios that are "satisfied", in the sense that the system dynamics obey the constraints given in the instruction. The reason is that in the real world it is not possible to rewind time and simultaneously obtain both a satisfied and an unsatisfied sequence of events.

Steering a cascading process is hard. To see why, consider a natural but naive approach to the Cascade problem: train an end-to-end regression model that takes the system and instruction as input and predicts the necessary intervention. This is challenging because in many cases a slight change in one part of the system has a qualitative effect on the outcome, which may lead to an exponential number of potential cascades. This "butterfly effect" (Lorenz, 1993) is typical in cascading systems, like a billiard ball missing another ball by a hair or a truck reaching a warehouse right after another truck has already left. Indeed, we empirically find that the regression approach fails, presumably because the set of possible chains of events is exponentially large, and the model fails to learn how to find an appropriate chain that satisfies the instruction. We discuss other challenges in Section 4.

Technical insights. In designing our approach, we follow two key ideas. First, instead of modeling the continuous dynamics of the system, we reduce the search space by focusing on a small number of discrete, semantic events. To do this, we design a representation called an "Event Tree" (Figure 2).
In a billiard game, these events would be collisions of balls; in logistic chains, deliveries of items to their target locations or assembly of parts. To reduce the search space, we build a tree of possible future events, where the root holds the initial world-state. Each child node corresponds to a possible subsequent event of its parent. Thus, a path in the tree from the root to a descendant captures a realizable sequence of events.

Our second idea is to learn how to search the event tree efficiently. This is critical because the tree grows exponentially with its depth. We learn a function that assigns scores to tree nodes conditioned on the instruction and use these scores to prioritize the search. We also derive a Bayesian correction term to guide the search with the observed cascade: we first find the path in the event tree that corresponds to the observed cascade, and then correct the scores of nodes along that path.

Modelling system dynamics with forward models. A forward model describes the evolution of a dynamic system in small time steps. There is extensive literature on learning forward models from observations in physical systems (Fragkiadaki et al., 2016; Battaglia et al., 2016; Lerer et al., 2016; Watters et al., 2017; Janner et al., 2019). Recent work also studied learning forward models for cascades (Qi et al., 2021; Girdhar et al., 2021). However, once the forward model has been learned, the desired initial condition of the system is found by an exhaustive search. Here, we show that exhaustive search fails for complex cascades and with semantic constraints (Section 5). Therefore, our paper focuses on learning to search, not on learning the forward model. We assume that we are given a special kind of "forward" model operating at the level of semantic events.
Namely, given a state of the cascading system, our forward model allows us to query for the next event ("which objects collide next?") and predict the outcome of that event (the velocities of objects after a collision).

Test bed. We designed a well-controlled environment that shares key ingredients with real-world cascading processes. In our test bed, several spheres move freely on a table, colliding with each other.
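The event-tree search described above can be sketched in a few lines. The following is a minimal, illustrative sketch only: the toy forward model (`possible_next_events`, `apply_event`), the instruction-conditioned scorer (`score`), and the search routine (`best_first_search`) are hypothetical stand-ins for the paper's learned components, not the actual implementation.

```python
import heapq

# Toy forward model over "states": a state is the tuple of events taken
# so far. A real event-level forward model would simulate the continuous
# dynamics between semantic events.
def possible_next_events(state, max_depth=3):
    if len(state) >= max_depth:   # cap the cascade length in this toy
        return []
    # two candidate next events at every step, e.g. two possible collisions
    return [("collide", len(state), k) for k in (0, 1)]

def apply_event(state, event):
    return state + (event,)

# Stand-in for the learned, instruction-conditioned scorer: here it just
# counts events whose last index matches the "instruction".
def score(state, instruction):
    return float(sum(1 for e in state if e[2] == instruction))

def best_first_search(root, instruction, budget=50):
    """Expand at most `budget` nodes, highest score first; return the
    best-scoring (and then deepest) node seen."""
    counter = 0   # tie-breaker so heapq never compares states directly
    frontier = [(-score(root, instruction), counter, root)]
    best, expanded = root, 0
    while frontier and expanded < budget:
        _, _, state = heapq.heappop(frontier)
        expanded += 1
        if (score(state, instruction), len(state)) > (score(best, instruction), len(best)):
            best = state
        for ev in possible_next_events(state):
            counter += 1
            child = apply_event(state, ev)
            heapq.heappush(frontier, (-score(child, instruction), counter, child))
    return best

path = best_first_search(root=(), instruction=1)
print(path)   # -> (('collide', 0, 1), ('collide', 1, 1), ('collide', 2, 1))
```

In the paper's setting, a learned network would replace `score`, and the Bayesian correction described above would adjust the scores of nodes lying on the path of the observed cascade before the search begins.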



Figure 1: An experimental test bed for the Cascade setup. Input 1 (the unsatisfied cascade): A set of balls is observed moving in a confined space, colliding with each other, with walls, and with static pins (grey & black). Collisions yield a cascade of events (arrows). Input 2: A complex instruction describes a desired "counterfactual" cascade of events and its constraints. Output (the satisfied cascade): The agent intervenes and sets the (continuous, 2D) initial velocity of the purple ball (the "pivot") to achieve the goal, satisfying the constraints. All the balls are in motion in all frames; arrows highlight the event of each frame, and only keyframes are shown. See full videos here: https://youtu.be/u1Io-ZWC1Sw (Anonymous). Timeline of the observed cascade (Input 1): t=1.42, the purple ball hits the right wall; t=2.04, purple hits the gray pin; t=3.04, yellow hits cyan; t=4.44, cyan hits the gray pin.

