CAUSAL REPRESENTATION LEARNING FOR INSTANTANEOUS AND TEMPORAL EFFECTS IN INTERACTIVE SYSTEMS

Abstract

Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measurement or frame rate might be slower than many of the causal effects. This effectively creates "instantaneous" effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that allows for instantaneous effects in intervened temporal sequences when intervention targets can be observed, e.g., as actions of an agent. iCITRIS identifies the potentially multidimensional causal variables from temporal observations, while simultaneously using a differentiable causal discovery method to learn their causal graph. In experiments on three datasets of interactive systems, iCITRIS accurately identifies the causal variables and their causal graph.

1. INTRODUCTION

Recently, there has been a growing interest in causal representation learning (Schölkopf et al., 2021), which aims at learning representations of causal variables in an underlying system from high-dimensional observations like images. Several works have considered identifying causal variables from time series data, assuming that the variables are independent of each other conditioned on the previous time step (Gresele et al., 2021; Khemakhem et al., 2020a; Lachapelle et al., 2022a; b; Lippe et al., 2022b; Yao et al., 2022a; b). This assumes that within each discrete, measured time step, intervening on one causal variable does not affect any other variable instantaneously. However, in real-world systems, this assumption is often violated, as there might be causal effects that act faster than the measurement or frame rate (Faes et al., 2010; Hyvärinen et al., 2008; Moneta et al., 2006; Nuzzi et al., 2021). Consider the example of a light switch and a light bulb. Flipping the switch has an almost immediate effect on the light, turning it on or off and changing the appearance of the whole room instantaneously. In this case, an intervention on a variable (e.g., the switch) also affects other variables (e.g., the bulb) in the same time step, violating the assumption that each variable is independent of the others in the same time step, conditioned on the previous time step. In biology, some protein-protein interactions also occur nearly instantaneously (Acuner Ozbabacan et al., 2011). To overcome this limitation, we consider the task of identifying causal variables and their causal graphs from temporal sequences, even in the case of instantaneous cause-effect relations. This task contains two main challenges: identifying the causal variables from observations, and learning the causal relations between those variables.
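The light switch example can be made concrete with a small simulation. The following is a minimal sketch under assumed toy dynamics, not part of the proposed method: the switch toggles at random on a fine time grid, the bulb copies the switch one fine step later, and a camera records only every `stride`-th step. The names `simulate`, `instantaneous_gap`, `stride`, and `p_flip` are hypothetical and chosen for illustration only.

```python
import numpy as np


def simulate(stride, n_frames=20000, p_flip=0.1, seed=0):
    """Toy switch/bulb system on a fine grid, recorded every `stride`-th step.

    Assumed dynamics (illustrative only): the switch toggles with probability
    `p_flip` per fine step, and the bulb copies the switch one fine step later.
    """
    rng = np.random.default_rng(seed)
    n_fine = n_frames * stride
    flips = rng.random(n_fine) < p_flip
    switch = np.cumsum(flips) % 2      # switch state toggles at every flip
    bulb = np.empty(n_fine, dtype=int)
    bulb[0] = 0
    bulb[1:] = switch[:-1]             # purely temporal effect on the fine grid
    return switch[::stride], bulb[::stride]


def instantaneous_gap(switch, bulb):
    """Dependence of bulb_t on switch_t, given that the previous frame was (off, off).

    Returns P(bulb_t = 1 | switch_t = 1, prev) - P(bulb_t = 1 | switch_t = 0, prev).
    A value of zero means the previous frame fully explains the bulb.
    """
    prev_off = (switch[:-1] == 0) & (bulb[:-1] == 0)
    s_now, b_now = switch[1:][prev_off], bulb[1:][prev_off]
    return b_now[s_now == 1].mean() - b_now[s_now == 0].mean()


if __name__ == "__main__":
    for stride in (1, 10):
        s, b = simulate(stride)
        print(f"stride={stride}: gap={instantaneous_gap(s, b):.3f}")
```

With `stride=1` the gap is zero by construction, since the bulb is fully determined by the previous frame. With a larger stride the bulb in a frame depends strongly on the switch in the same frame even after conditioning on the previous frame, which is the apparent "instantaneous" effect described above.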
We show that, as opposed to temporal sequences without instantaneous effects, neither of these two tasks can be completed without the other: without knowing the variables, we cannot identify the graph; but without knowing the graph, we cannot identify the causal variables, since they are not conditionally independent. In particular, in contrast to causal relations across time steps, the orientations of instantaneous edges are not determined by the temporal ordering, so the tasks of causal representation learning and causal discovery must be solved jointly. As a starting point, we consider the setting of CITRIS (Causal Identifiability from Temporal Intervened Sequences; Lippe et al., 2022b). In CITRIS, potentially multidimensional causal variables interact over time, and interventions with known targets may have been performed. While in that work all causal relations were assumed to be temporal, i.e., from variables in one time step to variables in the next time step, we generalize this setting to include instantaneous causal effects. In particular, we show that in general, causal variables are not identifiable if we do not have access to partially-perfect interventions, i.e., interventions that remove the instantaneous parents. If such interventions are available, we prove that we can identify the minimal causal variables (Lippe et al., 2022b), i.e., the parts of the causal variables that are affected by the interventions, and their temporal and instantaneous causal graph. Our results generalize the identifiability results of Lippe et al. (2022b), since if there are no instantaneous causal relations, any intervention is partially-perfect by definition. As a practical implementation, we propose instantaneous CITRIS (iCITRIS).
iCITRIS maps high-dimensional observations, e.g., images, to a lower-dimensional latent space on which it learns an instantaneous causal graph by integrating differentiable causal discovery methods into its prior (Lippe et al., 2022a; Zheng et al., 2018). In experiments on three different video datasets, iCITRIS accurately identifies the causal variables as well as their instantaneous and temporal causal graph. Our contributions are:
• We show that causal variables in temporal sequences with instantaneous effects are not identifiable without interventions that remove instantaneous parents.
• We prove that when having access to such interventions with known targets, the minimal causal variables can be identified along with their causal graph under mild assumptions.
• We propose iCITRIS, a causal representation learning method that identifies minimal causal variables and their causal graph even in the case of instantaneous causal effects.
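As background on the differentiable causal discovery methods cited above, the sketch below shows the acyclicity function of Zheng et al. (2018), h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W describes an acyclic graph. It is meant only as an illustration of how acyclicity can be expressed as a differentiable quantity; the graph-learning component inside iCITRIS may differ. The function name `acyclicity` and the truncated power series are assumptions made here to keep the example dependent on NumPy alone.

```python
import numpy as np


def acyclicity(weights, n_terms=20):
    """Evaluate h(W) = tr(exp(W * W)) - d using a truncated power series.

    Assumption for this sketch: the matrix exponential is approximated by
    sum_{k>=1} tr(A^k) / k!, with A the elementwise square of `weights`.
    At least d terms are used, so the result is exactly zero for any DAG.
    """
    d = weights.shape[0]
    squared = weights * weights        # elementwise square, entries are >= 0
    term = np.eye(d)
    total = 0.0
    for k in range(1, max(n_terms, d) + 1):
        term = term @ squared / k      # term now equals squared^k / k!
        total += np.trace(term)
    return total


if __name__ == "__main__":
    dag = np.array([[0.0, 1.5], [0.0, 0.0]])      # single edge, no cycle
    cyclic = np.array([[0.0, 1.0], [1.0, 0.0]])   # two-node cycle
    print(acyclicity(dag), acyclicity(cyclic))
```

Because h is a smooth function of the edge weights, it can be added as a penalty or constraint to a gradient-based training objective, which is what makes this family of methods convenient to combine with a learned prior.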

Related Work

We provide an extended discussion on related work in Appendix C. Early works on causal representation learning focused on identifying independent factors of variation (Klindt et al., 2021; Kumar et al., 2018; Locatello et al., 2019; 2020b; Träuble et al., 2021), in settings similar to Independent Component Analysis (ICA) (Comon, 1994; Hyvärinen et al., 2001; 2019). In particular, Lachapelle et al. (2022a;b) and Yao et al. (2022a;b) discuss the identifiability of causal variables from temporal sequences. Yet, in all of these ICA-based setups, causal variables are required to be conditionally independent. For causally-dependent variables, Yang et al. (2021) learn causal variables from labeled images in a supervised manner. Ahuja et al. (2022) and Brehmer et al. (2022) identify causal variables with unknown causal relations from pairs of observations that only differ in a subset of causal factors influenced by an intervention, i.e., having counterfactual observations. As discussed by Pearl (2009), however, knowing counterfactuals is not realistic in most scenarios. Instead, CITRIS (Lippe et al., 2022b) focuses on temporal sequences, in which the variables that are not intervened upon can still continue evolving over time. On the other hand, in this setting the intervention targets need to be known. Moreover, within a time step, the causal variables are assumed to be independent conditioned on the variables of the previous time step, hence not allowing for instantaneous effects. To the best of our knowledge, iCITRIS is the first method to identify causal variables and their causal graph from temporal, intervened sequences even for potentially instantaneous causal effects, without requiring counterfactuals or data labeled with the true causal variables.

2. RELEVANT BACKGROUND AND DEFINITIONS

In this work, we start from the setting of Temporal Intervened Sequences (TRIS) (Lippe et al., 2022b). For clarity, we provide a brief overview of TRIS and discuss previous identifiability results, before extending and generalizing the theory to instantaneous effects.

2.1. TEMPORAL INTERVENED SEQUENCES

Temporal intervened sequences (TRIS) (Lippe et al., 2022b) are a latent temporal causal process S with K causal variables $(C_1^t, C_2^t, \dots, C_K^t)_{t=1}^T$ (e.g., the light switch and bulb), representing a dynamic

