FILTERED INNER PRODUCT PROJECTION FOR CROSSLINGUAL EMBEDDING ALIGNMENT

Abstract

Due to widespread interest in machine translation and transfer learning, there are numerous algorithms for mapping multiple embeddings to a shared representation space. Recently, these algorithms have been studied in the setting of bilingual lexicon induction, where one seeks to align the embeddings of a source and a target language such that translated word pairs lie close to one another in a common representation space. In this paper, we propose a method, Filtered Inner Product Projection (FIPP), for mapping embeddings to a common representation space. As semantic shifts are pervasive across languages and domains, FIPP first identifies the common geometric structure in both embeddings and then aligns the Gram matrices of these embeddings only on the common structure. FIPP aligns embeddings to isomorphic vector spaces even when the source and target embeddings are of differing dimensionalities. Additionally, FIPP is easier to implement and faster to compute than current approaches. Following the baselines in Glavaš et al. (2019), we evaluate FIPP in the context of bilingual lexicon induction and downstream language tasks. We show that FIPP outperforms existing methods on the XLING (5K) BLI dataset and, when combined with a self-learning approach, on the XLING (1K) BLI dataset, while also providing robust performance across downstream tasks.

1. INTRODUCTION

The problem of aligning sets of embeddings, or high-dimensional real-valued vectors, is of great interest in natural language processing, with applications in machine translation and transfer learning, and shares connections to graph matching and assignment problems (Grave et al., 2019; Gold & Rangarajan, 1996). Aligning embeddings trained on corpora from different languages has led to improved performance in supervised and unsupervised word and sentence translation (Zou et al., 2013), sequence labeling (Zhang et al., 2016; Mayhew et al., 2017), and information retrieval (Vulić & Moens, 2015). Additionally, linguistic patterns have been studied using embedding alignment algorithms (Schlechtweg et al., 2019; Lauscher & Glavaš, 2019). Embedding alignments have also been shown to improve the performance of multilingual contextual representation models (i.e. mBERT), when used during initialization, on certain tasks such as multilingual document classification (Artetxe et al., 2020). Recently, algorithms applying embedding alignments to the input token representations of contextual embedding models have been shown to provide efficient domain adaptation (Poerner et al., 2020). Lastly, aligned source and target input embeddings have been shown to improve the transferability of models learned on a source domain to a target domain (Artetxe et al., 2018a; Wang et al., 2018; Mogadala & Rettinger, 2016).

In the bilingual lexicon induction task, one seeks to learn a transformation on the embeddings of a source and a target language so that translated word pairs lie close to one another in the shared representation space. Specifically, one is given a small seed dictionary D containing c pairs of translated words, together with embeddings for these word pairs in the source and target languages, X_s ∈ R^{c×d_1}
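To make the seed-dictionary setup concrete, the sketch below shows a common baseline for this task, orthogonal Procrustes alignment (not FIPP itself): given paired source and target embeddings X_s and X_t for the c dictionary entries, it finds the orthogonal map W minimizing ||X_s W − X_t||_F. The function name and the synthetic data are illustrative, not from this paper.

```python
import numpy as np

def procrustes_align(X_s, X_t):
    """Solve the orthogonal Procrustes problem: return the orthogonal
    matrix W minimizing ||X_s @ W - X_t||_F (a standard BLI baseline)."""
    U, _, Vt = np.linalg.svd(X_s.T @ X_t)
    return U @ Vt

rng = np.random.default_rng(0)
c, d = 6, 4                                      # c seed pairs, dimension d
X_t = rng.standard_normal((c, d))                # "target" embeddings
R = np.linalg.qr(rng.standard_normal((d, d)))[0] # a hidden rotation
X_s = X_t @ R.T                                  # "source" = rotated target

W = procrustes_align(X_s, X_t)
print(np.allclose(X_s @ W, X_t, atol=1e-6))      # the rotation is recovered
```

Note that this baseline requires d_1 = d_2 and preserves all of the source geometry; FIPP instead aligns Gram matrices on the common structure and handles differing dimensionalities.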

