GENERALIZED BELIEF TRANSPORT

Abstract

Human learners have the ability to adopt appropriate learning approaches depending on constraints such as the prior over hypotheses and the urgency of a decision. However, existing learning models are typically considered individually rather than in relation to one another. To build agents that can move between different modes of learning over time, it is important to understand how learning models are related as points in a broader space of possibilities. We introduce a mathematical framework, Generalized Belief Transport (GBT), that unifies and generalizes prior models, including Bayesian inference, cooperative communication, and classification, as parameterizations of three learning constraints within Unbalanced Optimal Transport (UOT). We visualize the space of learning models encoded by GBT as a cube whose special points include classic learning models. We derive critical properties of this parameterized space, proving continuity and differentiability, which form the basis for model interpolation, and study the limiting behavior of the parameters, which allows us to attach learning models to the boundaries of the space. Moreover, we investigate the long-run behavior of GBT, explore the convergence properties of models in GBT both mathematically and computationally, and formulate conjectures about their general behavior. We conclude with open questions and implications for more unified models of learning.

1. INTRODUCTION

Learning and inference are subject to internal and external constraints. Internal constraints include the availability of relevant prior knowledge, which may be brought to bear on inferences from data. External constraints include whether there is time to accumulate evidence or the best possible decision must be made now. Human learners appear capable of moving between these constraints as necessary. However, standard models of machine learning tend to treat different constraints as different problems, which impedes the development of a unified view of learning agents. Indeed, internal and external constraints on learning map onto classic dichotomies in machine learning. Internal constraints such as the availability of prior knowledge map onto the Frequentist-Bayesian dichotomy, in which the latter uses prior knowledge as a constraint on posterior beliefs while the former does not. Within Bayesian theory, a classic debate concerns uninformative, or minimally informative, settings of priors (Jeffreys, 1946; Robert et al., 2009). External constraints such as the availability of time to accumulate evidence versus the need to make the best possible decision now inform the choice between generative and discriminative approaches (Ng and Jordan, 2001). Despite the fundamental nature of these debates, and the usefulness of all approaches in the appropriate contexts, we are unaware of prior efforts to unify these perspectives and study the full space of possible models. We introduce Generalized Belief Transport (GBT), based on Unbalanced Optimal Transport (Sec. 2), which parameterizes and interpolates between known reasoning modes (Sec. 3.2), with four major contributions. First, we prove continuity in the parameterization and differentiability on the interior of the parameter space (Sec. 3.1). Second, we analyze the behavior under variations in the parameter space (Sec. 3.3). Third, we consider sequential learning, where learners may or may not track the empirically observed data frequencies.
Finally, we state theoretical results, simulations, and conjectures about the sequential behavior for various parameters under generic costs and priors (Sec. 4.2).

Notations. R≥0 denotes the non-negative reals, and 1 = (1, . . . , 1) denotes the all-ones vector. The i-th component of a vector v is v(i). P(A) is the set of probability distributions over A. For a matrix M, M_ij denotes its (i, j)-th entry, M(i,·) its i-th row, and M(·,j) its j-th column. P(·) denotes probability.
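GBT builds on Unbalanced Optimal Transport, which relaxes the hard marginal constraints of classical optimal transport with divergence penalties so that probability mass may be created or destroyed. As a minimal illustrative sketch, not the GBT parameterization developed in Sec. 2, the following NumPy code implements the standard entropic scaling iterations for KL-relaxed UOT; the function name and parameter values are our own choices for illustration:

```python
import numpy as np

def unbalanced_sinkhorn(C, a, b, eps=0.05, rho=1.0, n_iter=1000):
    """Entropically regularized unbalanced OT between measures a and b.

    C is the cost matrix; the two marginal constraints are relaxed by
    KL penalties of strength rho. As rho grows the plan's marginals
    approach a and b (balanced OT); small rho lets mass be created or
    destroyed. Returns the transport plan as a matrix.
    """
    K = np.exp(-C / eps)          # Gibbs kernel
    fi = rho / (rho + eps)        # exponent from the KL proximal step
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):       # alternating diagonal scalings
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]
```

For large rho the recovered plan concentrates mass on low-cost pairs while nearly matching the prescribed marginals; varying eps and rho moves the solution through a family of plans, which is the kind of interpolation between inference modes that GBT makes precise.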

