BOX-TO-BOX TRANSFORMATIONS FOR MODELING JOINT HIERARCHIES

Abstract

Learning representations of hierarchical data requires an appropriate geometry to capture tree-like structures. Box embeddings (Vilnis et al., 2018; Li et al., 2019; Dasgupta et al., 2020), which represent concepts as n-dimensional hyperrectangles, are capable of embedding trees when trained on a subset of the transitive closure. Patel et al. (2020) demonstrate that only the transitive reduction is required, and further extend box embeddings to capture joint hierarchies by augmenting the graph with new nodes. While it is possible to represent joint hierarchies with this method, the parameters for each hierarchy are decoupled, making generalization between hierarchies infeasible. In this work, we introduce a learned box-to-box transformation which respects the geometric structure of box embeddings. We demonstrate that this not only improves the capability of modeling cross-hierarchy compositional edges, but is also capable of generalizing from a subset of the transitive reduction.

1. INTRODUCTION

Representation learning for hierarchical relations is crucial in natural language processing because of the hierarchical nature of common knowledge, for example <Bird ISA Animal> (Athiwaratkun & Wilson, 2018; Vendrov et al., 2016; Vilnis et al., 2018; Nickel & Kiela, 2017). The ISA relation represents meaningful hierarchical relationships between concepts and plays an essential role in generalization for other relations, such as the generalization of <organ PARTOF person> from <eye PARTOF person> and <eye ISA organ>. The fundamental nature of the ISA relation means that it is inherently involved in a large amount of compositional human reasoning involving other relations.

Modeling hierarchies is essentially the problem of modeling a poset, or partially ordered set. The task of partial order completion, a general term for tasks which require learning a transitive relation, was introduced in Vendrov et al. (2016). The authors also introduce a model based on the reverse product order on R^n, which essentially models concepts as infinite cones. Region-based representations have been effective in representing hierarchical data, as containment between regions is naturally transitive. Vilnis et al. (2018) introduced axis-aligned hyperrectangles (or boxes) that are provably more flexible than cones, and demonstrated state-of-the-art performance in multiple tasks.

Thus far, not as much effort has been put into modeling joint hierarchies. Patel et al. (2020) proposed to simultaneously model the ISA and HASPART hierarchies from WordNet (Miller, 1995). To do so, however, they effectively augmented the graph by duplicating the nodes to create a single massive hierarchy. Their model assigns two unrelated boxes, B_ISA and B_HASPART, to each node n, and therefore misses out on a large amount of semantic relatedness between ISA and HASPART.
In this paper we propose a box-to-box transformation which translates and dilates box representations between hierarchies. Our proposed model shares information between the ISA and HASPART hierarchies via this transformation as well as cross-hierarchy containment training objectives. We compare BOX-TRANSFORM MODEL with multiple strong baselines under different settings. We substantially outperform the prior TWO-BOX MODEL while training with only the transitive reduction of both hierarchies and predicting inferred composition edges. As mentioned above, our model's shared learned features should allow for more generalization, and we test this by training on a subset of the transitive reduction, where we find we are able to outperform strong baselines. Finally, we perform a detailed analysis of the model's capacity to predict compositional edges and transitive closure edges, both from an overfitting and generalization standpoint, identifying subsets where further improvement is needed.
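To make the idea concrete, a box-to-box transformation can be as simple as a learned per-dimension translation of the box center together with a dilation of its side lengths. The sketch below (our own illustrative NumPy code with hypothetical parameter names `translation` and `log_scale`; the exact parameterization used by the model may differ) shows the kind of map we have in mind:

```python
import numpy as np

class Box:
    """Axis-aligned hyperrectangle, parameterized by min/max corners."""
    def __init__(self, mn, mx):
        self.mn = np.asarray(mn, dtype=float)
        self.mx = np.asarray(mx, dtype=float)
        assert np.all(self.mn <= self.mx), "min corner must not exceed max corner"

def transform(box, translation, log_scale):
    """Translate the box center and dilate its side lengths.

    `translation` and `log_scale` stand in for learned per-dimension
    parameters; exponentiating `log_scale` keeps the dilation positive,
    so the output is always a valid box.
    """
    center = (box.mn + box.mx) / 2 + translation
    half = (box.mx - box.mn) / 2 * np.exp(log_scale)
    return Box(center - half, center + half)
```

Because the output is again a box, containment-based training objectives (e.g. requiring the transformed ISA box to contain a HASPART box) apply directly to the transformed representation.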

2. RELATED WORK

Recent advances in representing a single hierarchy mainly fall into two categories: 1) representing hierarchies in non-Euclidean space (e.g., hyperbolic space, due to the curvature's inductive bias toward tree-like structures), and 2) using region-based representations instead of vectors for each node in the hierarchy (Erk, 2009). Hyperbolic space has been shown to be efficient in representing hierarchical relations, but also encounters difficulties in training (Nickel & Kiela, 2017; Ganea et al., 2018b; Chamberlain et al., 2017). Categorization models in psychology often represent a concept as a region (Nosofsky, 1986; Smith et al., 1988; Hampton, 1991). Vilnis & McCallum (2015) and Athiwaratkun & Wilson (2018) use Gaussian distributions to embed each word in the corpus, the latter of which uses thresholded divergences which amount to region representations. Vendrov et al. (2016) and Lai & Hockenmaier (2017) make use of the reverse product order on R^n_+, which effectively results in cone representations. Vilnis et al. (2018) further extend this cone representation to axis-aligned hyperrectangles (or boxes), and demonstrate state-of-the-art performance on modeling hierarchies. Various training improvements for box embeddings have been proposed (Li et al., 2019; Dasgupta et al., 2020), the most recent of which is termed GumbelBox after its use of a latent noise model in which box parameters are represented via Gumbel distributions.

Region representations are also used for tasks which do not require modeling hierarchy. In Vilnis et al. (2018), the authors also model conditional probability distributions using box embeddings. Abboud et al. (2020) and Ren et al. (2020) take a different approach, using boxes for their capacity to contain many vectors to provide slack in the loss function when modeling knowledge base triples or representing logical queries, respectively. Ren et al. (2020) also made use of an action on boxes similar to ours, involving translation and dilation; however, our work differs in both task (representing logical queries vs. joint hierarchies) and approach, as their model represents entities using vectors and a loss function based on a box-to-vector distance. The inductive bias of hyperbolic space has also been exploited to model multiple relations: Ganea et al. (2018a) learn hyperbolic transformations for multiple relations using Poincaré embeddings, and show model improvements in low-resource settings. Patel et al. (2020), to which our work is most similar, represent joint hierarchies using box embeddings; however, they represent each concept with two boxes, ignoring the internal semantics of the concepts.

Modeling joint hierarchies shares some similarities with knowledge base completion, but the goals of the two settings are different. When modeling joint hierarchies one attempts to learn simultaneous transitive relations, and potentially to learn relevant compositional edges involving these relations. Knowledge base completion, on the other hand, may involve many different relations, and primarily seeks to recover edges which were removed rather than inferring new compositional edges. Still, models which perform knowledge base completion can be applied to this task, as the data can be viewed as knowledge base triples with only two relations. There have been multiple works that aim to build better knowledge representations (Bordes et al., 2013; Trouillon et al., 2016; Sun et al., 2019; Balazevic et al., 2019a).
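The GumbelBox idea mentioned above replaces hard box corners with Gumbel-distributed ones, so that box volumes become smooth functions of the parameters. The following sketch conveys the smoothing effect with a softplus relaxation; the constant and correction terms are our simplification, not the exact expected-volume formula of Dasgupta et al. (2020):

```python
import numpy as np

GAMMA = 0.5772156649  # Euler–Mascheroni constant

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def smoothed_volume(mn, mx, beta=1.0):
    """Smoothed volume of a box whose corner locations carry Gumbel noise
    of scale beta. As beta -> 0 this approaches the hard volume
    prod(max(mx - mn, 0)); the -2*GAMMA shift is an assumed correction."""
    side = beta * softplus((mx - mn) / beta - 2 * GAMMA)
    return float(np.prod(side))

def hard_volume(mn, mx):
    """Ordinary box volume; disjoint or inverted boxes get volume 0."""
    return float(np.prod(np.maximum(mx - mn, 0.0)))
```

The practical payoff is that the smoothed volume stays strictly positive even when an interval is empty, so disjoint boxes still receive gradient signal, which the hard volume does not provide.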

3.1. BOX LATTICE MODEL

Introduced in Vilnis et al. (2018), a box lattice model (or box model) is a geometric embedding which captures partial orders and lattice structure using n-dimensional hyperrectangles. Formally, we define the set of boxes B in R^n as

B(R^n) = {[x_1, x̄_1] × ⋯ × [x_n, x̄_n] | x_i ≤ x̄_i},

where x_i and x̄_i denote the minimum and maximum coordinates of a box in dimension i.
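Under this definition, the lattice operations and containment checks reduce to coordinate-wise comparisons. A minimal sketch (our own illustrative code, not the authors' implementation):

```python
import numpy as np

def volume(mn, mx):
    """Volume of a box given min/max corners; empty boxes get volume 0."""
    return float(np.prod(np.maximum(mx - mn, 0.0)))

def intersect(mn_a, mx_a, mn_b, mx_b):
    """The lattice meet of two boxes: per-dimension max of the min
    corners and min of the max corners."""
    return np.maximum(mn_a, mn_b), np.minimum(mx_a, mx_b)

def contains(mn_a, mx_a, mn_b, mx_b):
    """True if box A contains box B, the geometric analogue of B ≤ A
    in the partial order."""
    return bool(np.all(mn_a <= mn_b) and np.all(mx_b <= mx_a))
```

Containment of regions is transitive by construction (if A contains B and B contains C, then A contains C), which is what makes boxes a natural fit for modeling hierarchical relations.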



Most relevant to our setting, Chami et al. (2020) and Balazevic et al. (2019b) recently proposed KG embedding methods which embed entities in the Poincaré ball model of hyperbolic space. These models are intended to capture relational patterns present in multi-relational graphs, with a particular emphasis on hierarchical relations.

