INFOOT: INFORMATION MAXIMIZING OPTIMAL TRANSPORT

Abstract

Optimal transport aligns samples across distributions by minimizing the transportation cost between them, e.g., the geometric distances. Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances. The resulting objective can still be formulated as a (generalized) optimal transport problem, and can be efficiently solved by projected gradient descent. This formulation yields a new projection method that is robust to outliers and generalizes to unseen samples. Empirically, InfoOT improves the quality of alignments across benchmarks in domain adaptation, cross-domain retrieval, and single-cell alignment.

1. INTRODUCTION

Optimal Transport (OT) provides a general framework with a strong theoretical foundation to compare probability distributions based on the geometry of their underlying spaces (Villani, 2009) . Besides its fundamental role in mathematics, OT has increasingly received attention in machine learning due to its wide range of applications in domain adaptation (Courty et al., 2017; Redko et al., 2019; Xu et al., 2020) , generative modeling (Arjovsky et al., 2017; Bousquet et al., 2017) , representation learning (Ozair et al., 2019; Chuang et al., 2022) , and generalization bounds (Chuang et al., 2021) . The development of efficient algorithms (Cuturi, 2013; Peyré et al., 2016) has significantly accelerated the adoption of optimal transport in these applications. Computationally, the discrete formulation of OT seeks a matrix, also called transportation plan, that minimizes the total geometric transportation cost between two sets of samples drawn from the source and target distributions. The transportation plan implicitly defines (soft) correspondences across these samples, but provides no mechanism to relate newly-drawn data points. Aligning these requires solving a new OT problem from scratch. This limits the applicability of OT, e.g., to streaming settings where the samples arrive in sequence, or very large datasets where we can only solve OT on a subset. In this case, the current solution cannot be used on future data. To overcome this fundamental constraint, a line of work proposes to directly estimate a mapping, the pushforward from source to target, that minimizes the transportation cost (Perrot et al., 2016; Seguy et al., 2017) . Nevertheless, the resulting mapping is highly dependent on the complexity of the mapping function (Galanti et al., 2021) . OT could also yield alignments that ignore the intrinsic coherence structure of the data. In particular, by relying exclusively on pairwise geometric distances, two nearby source samples could be mapped to disparate target samples, as in Figure 1 , which is undesirable in some settings. For instance, when applying OT for domain adaptation, source samples with the same class should ideally be mapped to similar target samples. To mitigate this, prior work has sought to impose structural priors on the OT objective, e.g., via submodular cost functions (Alvarez-Melis et al., 2018) or a Gromov-Wasserstein regularizer (Vayer et al., 2018b; a) . However, these methods still suffer from sensitivity to outliers (Mukherjee et al., 2021) and imbalanced data (Hsu et al., 2015; Tan et al., 2020) . This work presents Information Maximization Optimal Transport (InfoOT), an information-theoretic extension of the optimal transport problem that generalizes the usual formulation by infusing it with global structure in form of mutual information. In particular, InfoOT seeks alignments that maximize mutual information, an information-theoretic measure of dependence, between domains. To

