INTERPRETABLE DEBIASING OF VECTORIZED LANGUAGE REPRESENTATIONS WITH ITERATIVE ORTHOGONALIZATION

Abstract

We propose a new mechanism to augment a word vector embedding representation that offers improved bias removal while retaining key information, resulting in improved interpretability of the representation. Rather than removing the information associated with a concept that may induce bias, our proposed method identifies two concept subspaces and makes them orthogonal, so that the two concepts become uncorrelated in the resulting representation. Moreover, because they are orthogonal, one can simply apply a rotation to the basis of the representation so that each subspace aligns with a set of coordinates. This explicit encoding of concepts as coordinates works because the subspaces have been made fully orthogonal, which previous approaches do not achieve. Furthermore, we show that this construction extends to multiple subspaces. As a result, one can choose a subset of concepts to be represented transparently and explicitly, while the others are retained in the mixed but extremely expressive format of the representation.
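As a toy illustration of the core idea (a minimal sketch, not the paper's exact algorithm, with made-up concept directions): given two concept directions in an embedding space, a Gram-Schmidt step makes them exactly orthogonal, and completing them to an orthonormal basis yields a rotation that maps each concept onto its own coordinate axis.

```python
import numpy as np

# Hypothetical concept directions in a 4-d embedding space; in practice
# these would be estimated from word pairs (e.g., he-she for gender).
v1 = np.array([0.6, 0.8, 0.0, 0.0])
v2 = np.array([0.5, 0.5, 0.5, 0.5])
v1 = v1 / np.linalg.norm(v1)

# Gram-Schmidt step: remove v2's component along v1 and renormalize,
# making the two concept directions exactly orthogonal (uncorrelated).
v2 = v2 - (v2 @ v1) * v1
v2 = v2 / np.linalg.norm(v2)
assert abs(v1 @ v2) < 1e-10

# Complete {v1, v2} to a full orthonormal basis via QR; its transpose
# is a rotation sending v1 and v2 to the first two coordinate axes
# (up to sign), so those two concepts become explicit coordinates.
Q, _ = np.linalg.qr(np.column_stack([v1, v2, np.eye(4)]))
R = Q.T

x = np.array([0.3, -0.2, 0.9, 0.4])  # some embedded word vector
x_rot = R @ x  # coordinates 0 and 1 now explicitly encode the two concepts
```

Because R is a rotation, all pairwise similarities among embedded words are preserved; only the basis in which the concepts are read off changes.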

1. INTRODUCTION

Vectorized representations of structured data, especially text via Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), FastText (Joulin et al., 2016), etc., have become an enormously powerful and useful tool for facilitating language learning and understanding. And while for natural language data contextualized embeddings, e.g., ELMo (Peters et al., 2018), BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), etc., have become the standard for many analysis pipelines, the non-contextualized versions have retained an important role for low-resource languages, for synonym tasks, and for their interpretability. In particular, these versions have the intuitive property that each word is mapped to a vector in a high-dimensional space, and the (cosine) similarity between words in this representation captures how similar the words are in meaning, via how similar the contexts are in which they are commonly used. Such vectorized representations are common among many other types of structured data, including images (Kiela & Bottou, 2014; Lowe, 2004), nodes in a social network (Grover & Leskovec, 2016; Perozzi et al., 2014), spatial regions of interest (Jenkins et al., 2019), merchants in a financial network (Wang et al., 2021), and many more. In all of these cases, the most effective representations are large, high-dimensional, and trained on a large amount of data. This can be an expensive endeavor, and the goal is often to complete this embedding task once and then use these representations as an intermediate step in many downstream tasks. In this paper, we consider the goal of adding or adjusting structure in existing embeddings as part of a light-weight representation augmentation. The goal is to accomplish this without expensive retraining of the embedding, while improving the representation's usefulness, meaningfulness, and interpretability.
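The cosine similarity mentioned above can be sketched as follows (the three-dimensional "embeddings" here are made-up toy values, chosen only to illustrate the comparison; real embeddings have hundreds of dimensions).

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-d "embeddings" (illustrative values only, not trained vectors).
king  = np.array([0.8, 0.6, 0.1])
queen = np.array([0.7, 0.7, 0.2])
apple = np.array([0.1, 0.2, 0.9])

# Words used in similar contexts should receive higher similarity:
print(cosine_similarity(king, queen))  # high: similar usage contexts
print(cosine_similarity(king, apple))  # low: dissimilar contexts
```

With these toy values, the king-queen similarity is close to 1 while the king-apple similarity is much smaller, matching the intuition that similarity of usage contexts is reflected as angular closeness in the vector space.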
Within language models, this has most commonly been considered in the context of bias removal (Bolukbasi et al., 2016; Dev & Phillips, 2019). Here one commonly identifies a linear subspace that encodes some concept (e.g., male-to-female gender) and may modify or remove that subspace when the concept it encodes is not appropriate for a downstream task (e.g., resume ranking). One recently proposed approach of interest called Orthogonal Subspace Correction and

