

Abstract

Neural networks are known to be biased towards learning mechanisms that help identify spurious attributes, yielding features that do not generalize well under distribution shifts. To understand and address this limitation, we study the geometry of neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks are connected via simple paths of low loss. Our work addresses two questions: (i) do minimizers that encode dissimilar mechanisms connect via simple paths of low loss? (ii) can fine-tuning a pretrained model help switch between such minimizers? We define a notion of mechanistic similarity and demonstrate that lack of linear connectivity between two minimizers implies the corresponding models use dissimilar mechanisms for making their predictions. This property helps us demonstrate that naïve fine-tuning can fail to eliminate a model's reliance on spurious attributes. We thus propose a method for altering a model's mechanisms, named connectivity-based fine-tuning, and validate its usefulness by inducing models invariant to spurious attributes.

1. INTRODUCTION

Deep neural networks (DNNs) suffer from various robustness problems, learning representations that fail to generalize well beyond the given training distribution (D'Amour et al., 2020; Teney et al., 2022; Geirhos et al., 2020; Recht et al., 2019; Taori et al., 2020; Jacobsen et al., 2018). This lack of robustness is generally a consequence of models learning mechanisms that rely on spurious attributes in the training data for making their predictions. Such attributes, even if not perfectly predictive, tend to be simpler to represent according to the model's inductive biases (Nakkiran et al., 2019; Valle-Perez et al., 2018; Hu et al., 2020; Shah et al., 2020) and commonly emerge due to sampling biases and hidden confounders in static datasets (Kaur et al., 2022; Lee et al., 2022). For example, in most vision datasets, backgrounds are correlated with object categories, a sampling bias (Beery et al., 2018; Xiao et al., 2020). Consequently, a model can learn to predict the correct category of an object by learning mechanisms to identify either its background or its shape; however, only models that rely on shape are likely to generalize robustly (Geirhos et al., 2018; Dittadi et al., 2020). Indeed, Scimeca et al. (2021) and Hermann & Lampinen (2020) show that, using different datasets for a task, standard training pipelines can induce models that use entirely distinct mechanisms for making their predictions, performing equally well in-distribution but vastly differently out-of-distribution. Recent works on improving neural network robustness thus advocate a need for modeling the causal mechanisms underlying the data-generating process (Arjovsky et al., 2019; Krueger et al., 2021; Lu et al., 2021), promoting representations invariant to spurious attributes.

This Work: In this paper, we introduce the idea of mechanistic similarity (Sec. 3) to assess whether two models rely on the same input attributes for making their predictions. Specifically, we call two models mechanistically similar if they exhibit invariance to the same attributes of an input, but may otherwise produce different representations for it. Our motivating question is whether a model can be fine-tuned to alter its mechanisms, i.e., to learn different invariances; we call this the problem of mechanistic fine-tuning (Sec. 5). For instance, if a model has learned to rely on a spurious attribute in its training data, can that reliance be eliminated by training it on a minimal set of "clean" samples that do not contain the spurious attribute? This problem is of practical value because curating a large, clean dataset and training from scratch on it is expensive. We consider the problem above through the lens of mode connectivity in neural networks, the phenomenon that minimizers identified via training on the same dataset for a task tend to be connected via relatively simple paths of low loss in the model's loss landscape, e.g., linear or quadratic splines (Garipov et al., 2018; Draxler et al., 2018; Frankle et al., 2020; Entezari et al., 2021; Kuditipudi et al., 2019; Nguyen et al., 2021). In particular, we analyze connectivity of mechanistically dissimilar models that are induced via training on different datasets for a task and can hence learn to rely on entirely distinct attributes of an input to make their predictions (see Fig. 1).

Figure 1: Mechanistic Lens on Mode Connectivity. Consider two sets of parameters, θ_background and θ_shape, that minimize loss using background and object shape as the input attributes for prediction, respectively. Are such mechanistically dissimilar minimizers connected via paths of low loss in the landscape? Does the dissimilarity of these mechanisms affect the simplicity of their connectivity paths? Can we exploit this connectivity to switch between minimizers that use our desired mechanisms?

We induce such mechanistically dissimilar models by embedding synthetic cues in existing datasets (see Fig. 2 and the sketch below) and training models on these datasets under different proportions of samples with spurious attributes (see Fig. 4); given easily separable cues, such manipulated data allows training of models that learn mechanisms to preferentially identify the synthetic cue over natural data attributes (Shah et al., 2020). Our analysis (Sec. 4) shows that if two models lack linear connectivity in the landscape, they must be mechanistically dissimilar; that is, an increase in loss as we linearly interpolate between two models implies they have learned different invariances. This result holds implications for naïve fine-tuning of pretrained networks, which often yields models linearly connected with the original pretraining minimizer (Neyshabur et al., 2020) and can hence be insufficient for altering a model's prediction mechanisms (Fig. 5). We thus propose Connectivity-Based Fine-Tuning (CBFT), a technique that exploits the lack of linear connectivity between mechanistically dissimilar models to induce networks that follow our desired mechanisms (Sec. 5). Extensive experiments show CBFT is more effective at reducing a model's sensitivity to spurious attributes than recent techniques (Kirichenko et al., 2022b; Kumar et al., 2022).
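To make the cue-embedding setup concrete, below is a minimal sketch assuming a PyTorch-style dataset of image tensors; the wrapper name SyntheticCueDataset, the corner-patch cue, and the p_corr knob are our illustrative choices, not the paper's exact protocol (cf. Fig. 2).

```python
import torch
from torch.utils.data import Dataset

class SyntheticCueDataset(Dataset):
    """Wrap a base image dataset and embed a class-correlated synthetic cue.

    For a fixed fraction p_corr of samples, a small corner patch whose
    intensity encodes the label is pasted into the image. Such an easily
    separable cue lets standard training latch onto it instead of natural
    attributes like shape.
    """

    def __init__(self, base, num_classes=10, p_corr=1.0, patch=4, seed=0):
        self.base, self.num_classes, self.patch = base, num_classes, patch
        g = torch.Generator().manual_seed(seed)
        # Fix which samples carry the cue, so corruption is stable across epochs.
        self.has_cue = torch.rand(len(base), generator=g) < p_corr

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]  # assumes x is a (C, H, W) float tensor in [0, 1]
        if self.has_cue[idx]:
            x = x.clone()
            # Render the label as a grayscale intensity in the top-left corner.
            x[:, : self.patch, : self.patch] = y / (self.num_classes - 1)
        return x, y
```

Training one network with p_corr close to 1 and another with p_corr = 0 on the same base data then yields, in spirit, the cue-reliant and cue-invariant minimizers contrasted in Fig. 4.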

2. PRELIMINARIES

Notations. Consider a neural network f : R^n × R^d → [K] that takes n-dimensional inputs x ∈ X ⊂ R^n, has parameters θ ∈ R^d, and produces an output f(x; θ) ∈ [K]. We say θ "induces the model" f(·; θ). The loss on a dataset D ⊂ X × [K] for parameters θ is denoted by a non-negative function L(f(D; θ)), and θ is called a minimizer for that dataset if L(f(D; θ)) < ϵ, where ϵ is some small scalar. In the following, our focus will be on minimizers retrieved by running SGD on a dataset's loss.

Assume there is a latent space Z ⊂ R^m with z sampled from some factorial distribution, P(z) = ∏_i P(z_i). These latent variables are assumed to generate samples in the dataset via G : Z → X × [K]; (x, y) := G(z), with x and y conditionally independent given z. If G_X, G_Y denote the components of G producing x and y, we assume G_X(·) has a valid left-inverse G_X^{-1} : X → Z, i.e., x contains all of the information necessary to recover the true settings of the (independent) latent variables z. These assumptions are standard in the literature on disentanglement (Locatello et al., 2019; 2020; Gresele et al., 2020; 2021; Von Kügelgen et al., 2021) and Independent Component Analysis (ICA) (Hyvarinen & Morioka, 2016; 2017; Khemakhem et al., 2020; 2021).
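As a toy numerical illustration of this generative view (ours, not from the paper), assume m = 2 binary latents, a "shape" factor that determines the label and a "background" factor that only affects the input; the generator G below is a hypothetical stand-in.

```python
import torch

def sample_latents(n, m=2):
    # z ~ P(z) = prod_i P(z_i): the m latents are sampled independently;
    # here each is uniform over {0, 1}.
    return torch.randint(0, 2, (n, m)).float()

def G(z):
    # Hypothetical generator G : Z -> X x [K]. The input x encodes every
    # latent, so G_X has a left-inverse G_X^{-1} (here, simply reading the
    # coordinates back), while the label y = G_Y(z) depends only on the
    # first ("shape") latent.
    x = z.clone()
    y = z[:, 0].long()
    return x, y

x, y = G(sample_latents(1000))  # a dataset drawn from the generative process
```

Because x retains all latent information, a model can fit y through either coordinate whenever the two latents happen to be correlated in a finite dataset; this is exactly how the spurious mechanisms of Sec. 1 arise.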

2.1. MODE CONNECTIVITY

We denote a continuous path between two sets of parameters θ_1, θ_2 as γ_{θ_1→θ_2}(t), where γ_{θ_1→θ_2}(0) = θ_1 and γ_{θ_1→θ_2}(1) = θ_2. We say a path lacks loss barriers on a dataset if moving along the path never increases the loss on that dataset. Using the above notation, we now formalize the notion of mode connectivity, in line with previous work (Garipov et al., 2018; Draxler et al., 2018).

Definition 1. (Mode Connectivity Along a Path.) Minimizers θ_1, θ_2 of the loss L(f(D; θ)) on a dataset D are called mode connected along the path γ_{θ_1→θ_2}(t) if moving along the path never increases the loss. Formally, ∀ t ∈ [0, 1], L(f(D; γ_{θ_1→θ_2}(t))) ≤ min{L(f(D; θ_1)), L(f(D; θ_2))}.

As mentioned in Sec. 1, prior work has shown that minimizers of modern neural networks exhibit mode connectivity along rather simple paths in the landscape. This property was first illustrated in
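To make Definition 1 concrete for the linear path γ_{θ_1→θ_2}(t) = (1 − t)θ_1 + tθ_2, the following is a minimal PyTorch sketch (our own illustration, not the paper's released code); the helper names interpolate_params and loss_along_linear_path are hypothetical, and criterion is assumed to return a batch-averaged loss.

```python
import copy
import torch

def interpolate_params(model_a, model_b, t):
    """Build a model with parameters gamma(t) = (1 - t) * theta_1 + t * theta_2."""
    model_t = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    sd_t = {}
    for k in sd_a:
        if torch.is_floating_point(sd_a[k]):
            sd_t[k] = torch.lerp(sd_a[k], sd_b[k], t)  # (1 - t) * a + t * b
        else:
            sd_t[k] = sd_a[k]  # keep integer buffers (e.g., BatchNorm counters)
    model_t.load_state_dict(sd_t)
    return model_t

@torch.no_grad()
def loss_along_linear_path(model_a, model_b, loader, criterion, steps=21):
    """Evaluate the dataset loss at evenly spaced points along the linear path.

    Per Definition 1, theta_1 and theta_2 are mode connected along this path
    if no intermediate value exceeds min(loss(theta_1), loss(theta_2)).
    """
    losses = []
    for t in torch.linspace(0, 1, steps).tolist():
        model_t = interpolate_params(model_a, model_b, t).eval()
        total, n = 0.0, 0
        for x, y in loader:
            total += criterion(model_t(x), y).item() * y.shape[0]
            n += y.shape[0]
        losses.append(total / n)
    return losses
```

By the result discussed in Sec. 1 and developed in Sec. 4, a barrier in the returned curve, i.e., an intermediate loss exceeding the smaller endpoint loss, implies the two minimizers are mechanistically dissimilar.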

