LEARNING TO EXTRAPOLATE: A TRANSDUCTIVE APPROACH

Abstract

Machine learning systems, especially overparameterized deep neural networks, can generalize to novel test instances drawn from the same distribution as the training data. However, they fare poorly when evaluated on out-of-support test points. In this work, we tackle the problem of developing machine learning systems that retain the power of overparameterized function approximators while enabling extrapolation to out-of-support test points when possible. This is accomplished by noting that, under certain conditions, a "transductive" reparameterization can convert an out-of-support extrapolation problem into a problem of within-support combinatorial generalization. We propose a simple strategy based on bilinear embeddings to enable this type of combinatorial generalization, thereby addressing the out-of-support extrapolation problem. We instantiate a simple, practical algorithm applicable to various supervised learning and imitation learning tasks.
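The transductive reparameterization the abstract describes can be caricatured as follows: instead of predicting $y$ directly from a query $x$, the model conditions on a training anchor $x'$ and the difference $x - x'$, combining embeddings of the two through a bilinear form. The sketch below is purely illustrative and not the paper's implementation: the embedding dimensions, the fixed random linear embeddings `Phi` and `Psi`, the bilinear weight `W`, and the uniform averaging over anchors are all assumptions made for the sake of a concrete example (a real model would learn these maps).

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4, 8  # input dimension and embedding dimension (arbitrary choices)

# Hypothetical embedding maps and bilinear interaction weights.
Phi = rng.normal(size=(k, d))  # embeds the difference x - x'
Psi = rng.normal(size=(k, d))  # embeds the anchor x'
W = rng.normal(size=(k, k))    # bilinear interaction matrix

def transductive_predict(x, anchors):
    """Predict a scalar for query x by averaging bilinear scores
    phi(x - x')^T W psi(x') over training anchors x'."""
    scores = [(Phi @ (x - a)) @ W @ (Psi @ a) for a in anchors]
    return float(np.mean(scores))

anchors = rng.normal(size=(16, d))  # stand-ins for training inputs
x_query = rng.normal(size=d)
pred = transductive_predict(x_query, anchors)
print(pred)
```

The point of the reparameterization is that even when $x$ lies outside the training support, the differences $x - x'$ and anchors $x'$ may each individually lie within the support of pairs seen during training, turning extrapolation into combinatorial generalization.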

1. INTRODUCTION

Generalization is a central problem in machine learning. Typically, one expects generalization when the test data is sampled from the same distribution as the training set, i.e., out-of-sample generalization. However, in many scenarios, test data is sampled from a different distribution than the training set, i.e., out-of-distribution (OOD). In some OOD scenarios, the test distribution is assumed to be known during training, a common assumption made by meta-learning methods (Finn et al., 2017b). Several works have tackled a more general scenario of "reweighted" distribution shift (Koh et al., 2021; Quinonero-Candela et al., 2008), where the test distribution shares support with the training distribution but has a different and unknown probability density. This setting can be tackled via distributional robustness approaches (Sinha et al., 2018; Rahimian & Mehrotra, 2019). Our paper aims to find structural conditions under which generalization to test data with support outside of the training distribution is possible.

Formally, consider the problem of learning a function $h$, $\hat{y} = h(x)$, from data $\{(x_i, y_i)\}_{i=1}^{N} \sim \mathcal{D}_{\text{train}}$, where $x_i \in \mathcal{X}_{\text{train}}$, the training domain. We are interested in making accurate predictions $h(x)$ for $x \notin \mathcal{X}_{\text{train}}$ (see examples in Fig. 1). Consider an example task of predicting actions to reach a desired goal (Fig. 1b). During training, goals are provided from the blue cuboid ($x \in \mathcal{X}_{\text{train}}$), but test-time goals are from the orange cuboid ($x \notin \mathcal{X}_{\text{train}}$). If $h$ is modeled using a deep neural network, its predictions on test goals in the blue area are likely to be accurate, but for goals in the orange area the performance can be arbitrarily poor unless further domain knowledge is incorporated.
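The failure mode described above can be reproduced in a toy setting. The following sketch is not from the paper: a degree-10 polynomial stands in for an overparameterized function approximator, the target $|x - 0.5|$ and the query point $x = 3$ are arbitrary choices. The fit matches the target closely on the training support $[0, 1]$, yet its prediction at an out-of-support point is wildly wrong.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Flexible fit on the training support [0, 1].
x_train = np.linspace(0.0, 1.0, 200)
y_train = np.abs(x_train - 0.5)  # target function with a kink
p = Polynomial.fit(x_train, y_train, deg=10)

# Accurate within the training support...
in_support_err = np.max(np.abs(p(x_train) - y_train))

# ...but arbitrarily poor at an out-of-support query point.
x_test = 3.0
out_of_support_err = abs(p(x_test) - abs(x_test - 0.5))

print(in_support_err, out_of_support_err)
```

Nothing in the training objective constrains the fitted function outside $[0, 1]$, which is exactly why extrapolation requires additional structural assumptions.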
This challenge manifests itself in a variety of real-world problems, ranging from supervised learning problems like object classification (Barbu et al., 2019) to sequential decision making with reinforcement learning (Kirk et al., 2021), transferring reinforcement learning policies from simulation to the real world (Zhao et al., 2020), and imitation learning (de Haan et al., 2019). Reliably deploying learning algorithms in unconstrained environments requires accounting for such "out-of-support" distribution shift, which we refer to as extrapolation. It is widely accepted that if one can identify some structure in the training data that constrains the behavior of optimal predictors on novel data, then extrapolation may become possible. Several methods can extrapolate if the nature of distribution shift is known a priori: convolutional neural

