SELF-SUPERVISED DEBIASING USING LOW RANK REGULARIZATION

Abstract

Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability. While most existing debiasing methods require full supervision on either spurious attributes or target labels, training a debiased model from a limited amount of both annotations remains an open problem. To overcome this limitation, we first examine an interesting phenomenon through spectral analysis of latent representations: spuriously correlated, easy-to-learn attributes make neural networks inductively biased towards encoding lower-effective-rank representations. We also show that a rank regularization can amplify this bias in a way that encourages highly correlated features. Motivated by these observations, we propose a self-supervised debiasing framework that is potentially compatible with unlabeled samples. Specifically, we first pretrain a biased encoder in a self-supervised manner with the rank regularization, which serves as a semantic bottleneck that forces the encoder to learn the spuriously correlated attributes. This biased encoder is then used to discover and upweight bias-conflicting samples in a downstream task, effectively debiasing the main model. Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines and, in some cases, even outperforms state-of-the-art supervised debiasing approaches.

1. INTRODUCTION

While modern deep learning solves several challenging tasks successfully, a series of recent works (Geirhos et al., 2018; Gururangan et al., 2018; Feldman et al., 2015) have reported that the high accuracy of deep networks on in-distribution samples does not always guarantee low test error on out-of-distribution (OOD) samples, especially in the presence of spurious correlations. Arjovsky et al. (2019); Nagarajan et al. (2020); Tsipras et al. (2018) suggest that deep networks can be biased towards spuriously correlated attributes, or dataset bias: misleading statistical heuristics that are closely correlated but not causally related to the target label. Several recent works explain this phenomenon through the lens of the simplicity bias (Rahaman et al., 2019; Neyshabur et al., 2014; Shah et al., 2020) of gradient descent-based optimization of deep networks: deep networks prefer to rely on spurious features that are "simpler" to learn, e.g., more linear. The catastrophic pitfalls of dataset bias have motivated the development of debiasing methods, which can be roughly categorized into approaches that (1) leverage annotations of spurious attributes, i.e., bias labels (Kim et al., 2019; Sagawa et al., 2019; Wang et al., 2020; Tartaglione et al., 2021), (2) presume a specific type of bias, e.g., color or texture (Bahng et al., 2020; Wang et al., 2019; Ge et al., 2021), or (3) use no explicit supervision on dataset bias (Liu et al., 2021; Nam et al., 2020; Lee et al., 2021; Levy et al., 2020; Zhang et al., 2022). While substantial technical advances have been made in this regard, these approaches still fail to address an open problem: how to train a debiased classifier while fully exploiting unlabeled samples that lack both bias and target labels.
More specifically, while large-scale unlabeled datasets can be biased towards spuriously correlated sensitive attributes, e.g., ethnicity, gender, or age (Abid et al., 2021; Agarwal et al., 2021), most existing debiasing frameworks are not designed for this unsupervised setting. Moreover, recent works have reported that self-supervised learning may still suffer from poor OOD generalization (Geirhos et al.).

To address this question, we first made a series of observations about the dynamics of representation complexity by controlling the degree of spurious correlation in synthetic simulations. Interestingly, we found that spurious correlations suppress the effective rank (Roy & Vetterli, 2007) of latent representations, which severely deteriorates the semantic diversity of representations and degrades feature discriminability. Another notable aspect of our findings is that an intentional increase of feature redundancy amplifies "prejudice" in neural networks. To be specific, as we enforce correlation among latent features so as to regularize the effective rank of representations (i.e., rank regularization), the accuracy on bias-conflicting samples quickly declines while the model still performs reasonably well on bias-aligned* samples.

Inspired by these observations, we propose a self-supervised debiasing framework that can fully utilize potentially biased unlabeled samples. We pretrain (1) a biased encoder with rank regularization, which serves as a semantic bottleneck limiting the semantic diversity of feature components, and (2) the main encoder with standard self-supervised learning approaches. The biased encoder then gives us the leverage to uncover spurious correlations and identify bias-conflicting training samples in a downstream task.

Contributions.
In summary, the contributions of this paper are as follows. First, we empirically demonstrate the inductive bias of neural networks in favor of low-rank representations in the presence of spurious correlations. Based on these observations, we propose a novel rank-regularization debiasing framework that fully exploits unlabeled samples containing annotations for neither bias nor target labels. Experiments on real-world biased datasets demonstrate that retraining the last-layer linear classifier while upweighting the identified bias-conflicting samples significantly improves OOD generalization under the linear evaluation protocol (Oord et al., 2018), even without modifying the pretrained encoder. Our approach improves the accuracy on the bias-conflicting evaluation set by 36.4% → 59.5% and 48.6% → 58.4% on UTKFace (Zhang et al., 2017) and CelebA (Liu et al., 2015) with age and gender bias, respectively, compared to the best self-supervised baseline. Moreover, the proposed framework outperforms state-of-the-art supervised debiasing methods in a semi-supervised setting on CelebA.
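The last-layer retraining step described above can be illustrated with a minimal sketch. This is not the paper's exact procedure: here, samples that a biased reference model misclassifies are treated as likely bias-conflicting and upweighted by a hypothetical factor `lam` when the linear head is refit; the function name and hyperparameter are illustrative.

```python
import numpy as np

def conflict_weights(biased_preds: np.ndarray, labels: np.ndarray,
                     lam: float = 10.0) -> np.ndarray:
    """Upweight samples that the biased reference model gets wrong.

    `lam` is a hypothetical upweighting factor; misclassified samples
    are treated as likely bias-conflicting, all others keep weight 1.
    """
    conflicting = biased_preds != labels
    return np.where(conflicting, lam, 1.0)

# Toy example: the biased model errs only on the third sample.
preds = np.array([0, 1, 0, 1])
labels = np.array([0, 1, 1, 1])
print(conflict_weights(preds, labels))  # [ 1.  1. 10.  1.]
```

The resulting weights can be passed to any standard classifier fit (e.g., as per-sample weights for the linear evaluation head), leaving the pretrained encoder untouched.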

2.1. PRELIMINARIES

To evaluate the semantic diversity of a given representation matrix, we introduce the effective rank (Roy & Vetterli, 2007), a widely used metric for measuring the effective dimensionality of a matrix and analyzing the spectral properties of features in neural networks (Arora et al., 2019; Razin & Cohen, 2020; Huh et al., 2021; Baratin et al., 2021):

Definition 2.1 Given a matrix X ∈ R^{m×n} and its singular values {σ_i}_{i=1}^{min(m,n)}, the effective rank ρ of X is defined as the Shannon entropy of the normalized singular values:

ρ(X) = -∑_{i=1}^{min(m,n)} σ̄_i log σ̄_i,

where σ̄_i = σ_i / ∑_k σ_k is the i-th normalized singular value. Without loss of generality, we omit the exponentiation of ρ(X), as done in Roy & Vetterli (2007).

Effective rank is also referred to as spectral entropy; its value is maximized when the singular values are all equal and minimized when a single singular value dominates all others. Recent works (Chen et al., 2019b;a) have revealed that the discriminability of representations resides on a wide range of eigenvectors, since the rich discriminative information for classification is not confined to the top few eigenvectors.
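Definition 2.1 can be computed directly from a singular value decomposition. The sketch below (NumPy; function and variable names are our own) also checks the two boundary cases noted above: equal singular values maximize ρ, a single dominant singular value drives it to zero, and mixing feature dimensions together, as the rank regularization does, lowers ρ.

```python
import numpy as np

def effective_rank(X: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy of the normalized singular values (Def. 2.1);
    the exponentiation is omitted, as in the text."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s / (s.sum() + eps)          # normalized singular values
    p = p[p > eps]                   # drop zeros before taking logs
    return float(-(p * np.log(p)).sum())

# Equal singular values maximize the entropy: rho(I_4) = log 4 ~ 1.386.
print(effective_rank(np.eye(4)))
# A rank-1 matrix has one dominant singular value: rho ~ 0.
u = np.ones((5, 1))
print(effective_rank(u @ u.T))
# Correlating the columns of a random feature matrix lowers rho.
rng = np.random.default_rng(0)
Z = rng.standard_normal((256, 8))
Z_corr = Z + 0.5 * Z @ np.ones((8, 8))   # add a shared component
print(effective_rank(Z) > effective_rank(Z_corr))  # True
```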



* Bias-aligned samples refer to data with a strong correlation between (potentially latent) spurious features and target labels; bias-conflicting samples refer to the opposite case, where the spurious correlation does not hold.

