CAUSAL REPRESENTATION LEARNING FOR INSTANTANEOUS AND TEMPORAL EFFECTS IN INTERACTIVE SYSTEMS

Abstract

Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measurement or frame rate might be slower than many of the causal effects. This effectively creates "instantaneous" effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that allows for instantaneous effects in intervened temporal sequences when intervention targets can be observed, e.g., as actions of an agent. iCITRIS identifies the potentially multidimensional causal variables from temporal observations, while simultaneously using a differentiable causal discovery method to learn their causal graph. In experiments on three datasets of interactive systems, iCITRIS accurately identifies the causal variables and their causal graph.

1. INTRODUCTION

Recently, there has been a growing interest in causal representation learning (Schölkopf et al., 2021), which aims at learning representations of causal variables in an underlying system from high-dimensional observations like images. Several works have considered identifying causal variables from time series data, assuming that the variables are independent of each other conditioned on the previous time step (Gresele et al., 2021; Khemakhem et al., 2020a; Lachapelle et al., 2022a; b; Lippe et al., 2022b; Yao et al., 2022a; b). This assumes that within each discrete, measured time step, intervening on one causal variable does not affect any other variable instantaneously. However, in real-world systems, this assumption is often violated, as there might be causal effects that act faster than the measurement or frame rate (Faes et al., 2010; Hyvärinen et al., 2008; Moneta et al., 2006; Nuzzi et al., 2021). Consider the example of a light switch and a light bulb. Flipping the switch has an almost immediate effect on the light, turning it on or off and changing the appearance of the whole room instantaneously. In this case, an intervention on one variable (e.g., the switch) also affects other variables (e.g., the bulb) in the same time step, violating the assumption that each variable is independent of the others in the same time step, conditioned on the previous time step. In biology, some protein-protein interactions also occur nearly instantaneously (Acuner Ozbabacan et al., 2011).

To overcome this limitation, we consider the task of identifying causal variables and their causal graphs from temporal sequences, even in the case of instantaneous cause-effect relations. This task contains two main challenges: identifying the causal variables from observations, and learning the causal relations between those variables. We show that, as opposed to temporal sequences without instantaneous

