TOPOLOGY-AWARE ROBUST OPTIMIZATION FOR OUT-OF-DISTRIBUTION GENERALIZATION

Abstract

Out-of-distribution (OOD) generalization is a challenging machine learning problem, yet one that is highly desirable in many high-stakes applications. Existing methods suffer from overly pessimistic modeling with low generalization confidence. Since generalizing to arbitrary test distributions is impossible, we hypothesize that additional structure on the topology of distributions is crucial for developing strong OOD resilience. To this end, we propose topology-aware robust optimization (TRO), which seamlessly integrates distributional topology into a principled optimization framework. More specifically, TRO solves two optimization objectives: (1) Topology Learning, which explores the data manifold to uncover the distributional topology; and (2) Learning on Topology, which exploits the topology to constrain robust optimization for tightly bounded generalization risks. We theoretically demonstrate the effectiveness of our approach and empirically show that it significantly outperforms the state of the art on a wide range of tasks including classification, regression, and semantic segmentation. Moreover, we empirically find that the data-driven distributional topology is consistent with domain knowledge, enhancing the explainability of our approach.

1. INTRODUCTION

Recent years have witnessed a surge of machine learning (ML) applications in high-stakes and safety-critical settings. Such applications pose an unprecedented out-of-distribution (OOD) generalization challenge: ML models are constantly exposed to unseen distributions that lie outside their training space. Despite well-documented success at interpolation, modern ML models (e.g., deep neural networks) are notoriously weak at extrapolation; a model that is highly accurate on average can fail catastrophically when presented with rare or unseen distributions (Arjovsky et al., 2019). For example, a flood predictor trained on data from all 89 major flood events in the U.S. from 2000 to 2020 would make erroneous predictions for the 2021 event "Hurricane Ida". Without addressing this challenge, it is unclear when and where a model can be applied and how much risk is associated with its use.

A promising solution for out-of-distribution generalization is distributionally robust optimization (DRO) (Namkoong & Duchi, 2016; Staib & Jegelka, 2019; Levy et al., 2020). DRO minimizes the worst-case expected risk over an uncertainty set of potential test distributions. The uncertainty set is typically formulated as a divergence ball surrounding the training distribution, endowed with a distance metric such as an f-divergence (Namkoong & Duchi, 2016) or the Wasserstein distance (Shafieezadeh Abadeh et al., 2018). Compared to empirical risk minimization (ERM) (Vapnik, 1998), which minimizes the average loss, DRO is more robust against distributional drifts arising from spurious correlations, adversarial attacks, subpopulations, or naturally occurring variation (Robey et al., 2021). However, it is non-trivial to build a realistic uncertainty set that truly approximates unseen distributions.
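The contrast between ERM and DRO described above can be written compactly. The symbols below (loss ℓ, model f_θ, divergence D, radius ρ) are standard notation for this setting rather than taken from the paper:

```latex
% Empirical risk minimization: average loss under the training distribution P
\min_{\theta} \; \mathbb{E}_{(x,y)\sim P}\!\left[\ell\big(f_\theta(x), y\big)\right]

% Distributionally robust optimization: worst-case loss over an uncertainty
% set \mathcal{U}(P), typically a divergence ball of radius \rho around P
\min_{\theta} \; \sup_{Q \in \mathcal{U}(P)} \mathbb{E}_{(x,y)\sim Q}\!\left[\ell\big(f_\theta(x), y\big)\right],
\qquad
\mathcal{U}(P) = \left\{\, Q \;:\; D(Q, P) \le \rho \,\right\}
```

Here D may be an f-divergence or the Wasserstein distance; enlarging ρ widens the set of distributions the model must guard against.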
On the one hand, to confer robustness against extensive distributional drifts, the uncertainty set has to be sufficiently large, which increases the risk of including implausible distributions, e.g., outliers, and thus yields overly pessimistic models with low prediction confidence (Hu et al., 2018; Frogner et al., 2021). On the other hand, the worst-case distributions are not necessarily the influential ones that are truly connected to unseen distributions; optimizing over worst-case rather than influential distributions yields compromised OOD resilience.
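To make this pessimism concrete, the sketch below compares the average (ERM) risk with a CVaR-style worst-case risk, a common special case of f-divergence DRO. The function names and the toy per-sample losses are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def erm_loss(losses):
    """Empirical risk: uniform average over per-sample losses."""
    return float(np.mean(losses))

def cvar_dro_loss(losses, alpha=0.1):
    """CVaR-DRO surrogate: average of the worst alpha-fraction of losses.

    This corresponds to the worst-case expectation over distributions that
    upweight training samples by at most 1/alpha (a CVaR uncertainty set).
    """
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(losses))))            # worst k samples
    return float(np.mean(losses[:k]))

# Toy per-sample losses; the single outlier (9.0) stands in for an
# implausible distribution captured by an overly large uncertainty set.
losses = [0.2, 0.3, 0.1, 0.25, 9.0, 0.15, 0.2, 0.3, 0.1, 0.2]
print(erm_loss(losses))            # average risk
print(cvar_dro_loss(losses, 0.1))  # worst-case risk, dominated by the outlier
```

The worst-case objective is dominated entirely by the outlier, illustrating how a large uncertainty set can drive the model toward pessimistic solutions shaped by implausible distributions rather than influential ones.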



The source code and pre-trained models are available at: https://github.com/joffery/TRO.

