PARTIAL TRANSPORTABILITY FOR DOMAIN GENERALIZATION

Abstract

Learning prediction models that generalize to related domains is one of the most fundamental challenges in artificial intelligence. A growing literature argues for learning invariant associations using data from multiple source domains. However, whether invariant predictors generalize to a given target domain depends crucially on the assumed structural changes between domains. Using the perspective of transportability theory, we show that invariance learning, and the settings in which invariant predictors are optimal in terms of worst-case losses, are special cases of a more general partial transportability task. Specifically, the partial transportability task seeks to identify or bound a conditional expectation E_{P*}[Y | x] in an unseen domain π* using knowledge of qualitative changes across domains in the form of causal graphs, together with data from source domains π_1, ..., π_k. We show that solutions to this problem enjoy a much wider generalization guarantee, one that subsumes those of invariance learning and of other robust optimization methods inspired by causality. For computation in practice, we develop an algorithm that provably provides tight bounds, asymptotically in the number of data samples from the source domains, for any partial transportability problem with discrete observables, and we illustrate its use on synthetic datasets.

1. INTRODUCTION

Generalization guarantees are central to the design of reliable machine learning models, as the predictions and conclusions obtained in one or several source domains π_1, ..., π_k (e.g., in controlled laboratory circumstances, from a specific study or population) are transported and applied elsewhere, in a domain π* that may differ in several aspects from the source domains. Which structure and which assumptions are imposed on the relationship between domains determine whether a model will generalize as intended. For example, if the target environment is arbitrary, or substantially different from the study environment, transporting predictions is difficult or even impossible. A structural account of causation provides suitable semantics for reasoning about structural invariances across domains, and has been studied under the umbrella of transportability theory (Pearl & Bareinboim, 2011; Bareinboim et al., 2013; Bareinboim & Pearl, 2016). Each domain π_i is associated with a structural causal model (SCM) M_i that differs in one or more of its components from the models of other domains and hence induces different distributions over the observed variables. In practice, the SCMs are usually not fully observable, which leads to the transportability challenge of using data from one (or more) SCMs to make inferences about distributions induced by another SCM. A query, e.g., E_{P*}[Y | x], is said to be point identified if it can be uniquely computed from the available data (from one or more domains) and qualitative knowledge about the causal changes between domains in the form of selection diagrams. However, in transportability problems, especially when no data from the target domain can be collected, the combination of qualitative assumptions and data often does not uniquely determine a given query, which is then said to be non-identifiable.
In such cases, partial identification methods bound the query, e.g., l < E_{P*}[Y | x] < u, and may still serve an informative purpose for decision-making if 0 < l < u < 1. Both settings have been studied in the literature. In particular, there exists an extensive set of graphical conditions and algorithms for the identifiability of observational, interventional, and counterfactual distributions across domains from a combination of
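To make the bounding idea concrete, the following is a minimal sketch (not the algorithm developed in this paper) on a toy model with binary X → Y confounded by an unobserved U. There, a query that is non-identifiable from the observational source distribution P(X, Y) alone can still be bounded by optimizing over all response-function distributions consistent with the data. The interventional query P(Y = 1 | do(X = 1)) and the numbers are hypothetical, chosen only for illustration; because X has no observed parents here, the underlying linear program decouples and solves in closed form.

```python
# Response-function variable r_Y encodes Y as a function of X:
# r=0: always 0, r=1: Y=X, r=2: Y=1-X, r=3: always 1.
RESPONSE = {0: lambda x: 0, 1: lambda x: x, 2: lambda x: 1 - x, 3: lambda x: 1}

# Hypothetical source-domain observational distribution P(X, Y).
p_obs = {(1, 1): 0.30, (1, 0): 0.20, (0, 1): 0.25, (0, 0): 0.25}

def bounds_on_query(p_obs, x_star=1):
    """Bound P(Y=1 | do(X=x_star)) over all distributions of r_Y
    consistent with p_obs.  The mass of each observed cell (x, y) may
    be placed on any r_Y compatible with it (f_r(x) = y), and these
    choices are independent across cells, so min/max decompose."""
    lo = hi = 0.0
    for (x, y), mass in p_obs.items():
        compat = [r for r, f in RESPONSE.items() if f(x) == y]
        # Contribution of r to the query: 1 iff f_r(x_star) = 1.
        contrib = [1.0 if RESPONSE[r](x_star) == 1 else 0.0 for r in compat]
        lo += mass * min(contrib)
        hi += mass * max(contrib)
    return lo, hi

lo, hi = bounds_on_query(p_obs)  # nontrivial bounds despite non-identifiability
```

For these numbers the bounds coincide with the classic natural bounds, lower = P(x=1, y=1) and upper = 1 - P(x=1, y=0), so the query is confined to [0.3, 0.8] rather than the vacuous [0, 1].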

