PROBING BERT IN HYPERBOLIC SPACES

Abstract

Recently, a variety of probing tasks have been proposed to discover linguistic properties learned in contextualized word embeddings. Many of these works implicitly assume that these embeddings lie in certain metric spaces, typically the Euclidean space. This work considers a family of geometrically special spaces, the hyperbolic spaces, which exhibit better inductive biases for hierarchical structures and may better reveal linguistic hierarchies encoded in contextualized representations. We introduce a Poincaré probe, a structural probe projecting these embeddings into a Poincaré subspace with explicitly defined hierarchies. We focus on two probing objectives: (a) dependency trees, where the hierarchy is defined as head-dependent structures; (b) lexical sentiments, where the hierarchy is defined as the polarity of words (positivity and negativity). We argue that a key desideratum of a probe is its sensitivity to the existence of linguistic structures. We apply our probes to BERT, a typical contextualized embedding model. In a syntactic subspace, our probe better recovers tree structures than Euclidean probes, revealing the possibility that the geometry of BERT syntax may not necessarily be Euclidean. In a sentiment subspace, we reveal two possible meta-embeddings for positive and negative sentiments and show how lexically-controlled contextualization would change the geometric localization of embeddings. We demonstrate the findings with our Poincaré probe via extensive experiments and visualization.

1. INTRODUCTION

Contextualized word representations with pretrained language models have significantly advanced NLP progress (Peters et al., 2018a; Devlin et al., 2019). Previous works point out that abundant linguistic knowledge implicitly exists in these representations (Belinkov et al., 2017; Peters et al., 2018b;a; Tenney et al., 2019). This paper is primarily inspired by Hewitt & Manning (2019), who propose a structural probe to recover dependency trees encoded under squared Euclidean distance in a syntactic subspace. Although this assumption is common, there is no strict evidence that the geometry of these syntactic subspaces should be Euclidean, especially given that the Euclidean space has intrinsic difficulties in modeling trees (Linial et al., 1995). We propose to impose and explore different inductive biases for modeling syntactic subspaces. The hyperbolic space, a special Riemannian space with constant negative curvature, is a good candidate because of its tree-likeness (Nickel & Kiela, 2017; Sarkar, 2011). We adopt a generalized Poincaré ball, a special model of hyperbolic spaces, to construct a Poincaré probe for contextualized embeddings. Figure 1 (A, B) gives an example of a tree embedded in the Poincaré ball and compares it with its Euclidean counterpart. Intuitively, the volume of a Poincaré ball grows exponentially with its radius, mirroring the fact that the number of nodes of a full tree grows exponentially with its depth. This gives "enough space" to embed the tree. By contrast, the volume of a Euclidean ball grows only polynomially and thus has less capacity to embed tree nodes.
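The exponential volume growth can be seen directly from the Poincaré distance on the open unit ball, d(u, v) = arcosh(1 + 2‖u−v‖² / ((1−‖u‖²)(1−‖v‖²))): pairs of points with the same Euclidean separation become far apart in hyperbolic distance as they approach the boundary, so the boundary region offers room for exponentially many well-separated tree nodes. A minimal sketch (the function name `poincare_distance` and the sample points are our own illustration, not from the paper):

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points in the open unit ball
    (standard Poincaré ball model, curvature -1)."""
    sq_norm = lambda x: sum(xi * xi for xi in x)
    diff = sq_norm([ui - vi for ui, vi in zip(u, v)])
    denom = (1 - sq_norm(u)) * (1 - sq_norm(v))
    return math.acosh(1 + 2 * diff / denom)

# Same Euclidean separation (0.1), very different hyperbolic distances:
d_center = poincare_distance([0.0, 0.0], [0.1, 0.0])  # near the origin, ~0.20
d_border = poincare_distance([0.8, 0.0], [0.9, 0.0])  # near the boundary, ~0.75
```

Distances inflate toward the boundary, which is why a tree's exponentially growing levels can be placed at increasing radii with low distortion (Sarkar, 2011).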

