BOX-TO-BOX TRANSFORMATIONS FOR MODELING JOINT HIERARCHIES

Abstract

appropriate geometry to capture tree-like structures. Box embeddings (Vilnis et al., 2018; Li et al., 2019; Dasgupta et al., 2020), which represent concepts as n-dimensional hyperrectangles, are capable of embedding trees when trained on a subset of the transitive closure. Patel et al. (2020) demonstrate that only the transitive reduction is required, and further extend box embeddings to capture joint hierarchies by augmenting the graph with new nodes. While this method can represent joint hierarchies, the parameters for each hierarchy are decoupled, making generalization between hierarchies infeasible. In this work, we introduce a learned box-to-box transformation which respects the geometric structure of box embeddings. We demonstrate that this not only improves the capability of modeling cross-hierarchy compositional edges, but also makes it possible to generalize from a subset of the transitive reduction.

1. INTRODUCTION

Representation learning for hierarchical relations is crucial in natural language processing because common knowledge is itself hierarchical, for example <Bird ISA Animal> (Athiwaratkun & Wilson, 2018; Vendrov et al., 2016; Vilnis et al., 2018; Nickel & Kiela, 2017). The ISA relation represents meaningful hierarchical relationships between concepts and plays an essential role in generalization for other relations, such as generalizing <organ PARTOF person> from <eye PARTOF person> and <eye ISA organ>. The fundamental nature of the ISA relation means that it is inherently involved in a large amount of compositional human reasoning over other relations.

Modeling hierarchies is essentially the problem of modeling a poset, or partially ordered set. The task of partial order completion, a general term for tasks which require learning a transitive relation, was introduced in Vendrov et al. (2016). The authors also introduce a model based on the reverse product order on R^n, which essentially models concepts as infinite cones. Region-based representations have been effective in representing hierarchical data, as containment between regions is naturally transitive. Vilnis et al. (2018) introduced axis-aligned hyperrectangles (or boxes), which are provably more flexible than cones, and demonstrated state-of-the-art performance on multiple tasks.

Thus far, less effort has been devoted to modeling joint hierarchies. Patel et al. (2020) proposed to simultaneously model the ISA and HASPART hierarchies from WordNet (Miller, 1995). To do so, however, they effectively augmented the graph by duplicating the nodes, creating a single massive hierarchy. Their model assigns each node n two boxes, B_ISA and B_HASPART, which are unrelated, and therefore misses a large amount of semantic relatedness between ISA and HASPART.
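To make the box-embedding setup concrete, the following is a minimal sketch of how axis-aligned boxes model hierarchy via containment. All function names here are illustrative, and the hard (non-smoothed) volume and containment used below are a simplification: the cited works train boxes with smoothed or probabilistic volumes rather than these exact operations.

```python
import numpy as np

# A box is an axis-aligned hyperrectangle, stored as a (min corner, max corner) pair.

def box_volume(lo, hi):
    """Volume of a box; zero if the box is empty in any dimension."""
    return float(np.prod(np.clip(hi - lo, 0.0, None)))

def box_intersection(lo1, hi1, lo2, hi2):
    """The intersection of two boxes is again a box (possibly empty)."""
    return np.maximum(lo1, lo2), np.minimum(hi1, hi2)

def contains(lo_outer, hi_outer, lo_inner, hi_inner):
    """<inner ISA outer> is modeled as containment of the inner box in the outer box."""
    return bool(np.all(lo_outer <= lo_inner) and np.all(hi_inner <= hi_outer))

# Example: <Bird ISA Animal> corresponds to the Bird box sitting inside the Animal box.
animal = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))
bird = (np.array([1.0, 1.0]), np.array([2.0, 3.0]))
print(contains(*animal, *bird))  # True
```

Because containment of regions is transitive, a chain such as Sparrow ⊆ Bird ⊆ Animal automatically entails <Sparrow ISA Animal>, which is what lets boxes trained on the transitive reduction recover the transitive closure.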
In this paper we propose a box-to-box transformation which translates and dilates box representations between hierarchies. Our proposed model shares information between the ISA and HASPART hierarchies via this transformation, as well as through cross-hierarchy containment training objectives. We compare our BOX-TRANSFORM MODEL with multiple strong baselines under different settings. We substantially outperform the prior TWO-BOX MODEL while training on only the transitive reduction of both hierarchies and predicting inferred composition edges. As mentioned above, our model's shared learned features should allow for more generalization, and we test this by training on a subset of the transitive reduction, where we find we are able to outperform strong baselines. Finally, we
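The translate-and-dilate operation can be sketched as follows. This is a hypothetical minimal version: the translation `tau` and dilation `delta` would be learned parameters in the model, but are fixed constants here, and the center/offset parameterization is one common convention for boxes, not necessarily the exact one used in the paper.

```python
import numpy as np

def to_center_offset(lo, hi):
    """Convert (min, max) corners to a center point and per-dimension half-widths."""
    return (lo + hi) / 2.0, (hi - lo) / 2.0

def from_center_offset(center, offset):
    """Convert a center and half-widths back to (min, max) corners."""
    return center - offset, center + offset

def box_transform(lo, hi, tau, delta):
    """Translate the box center by tau and dilate its side lengths by delta > 0.

    Illustrative stand-in for the learned box-to-box transformation mapping a
    node's ISA box into the HASPART space.
    """
    center, offset = to_center_offset(lo, hi)
    return from_center_offset(center + tau, offset * delta)

lo, hi = np.array([1.0, 1.0]), np.array([3.0, 5.0])
new_lo, new_hi = box_transform(lo, hi, tau=np.array([0.5, -0.5]), delta=2.0)
# center moves from (2, 3) to (2.5, 2.5); half-widths (1, 2) double to (2, 4)
```

Because every node's box is mapped by the same parametric family of transformations, regularities learned in one hierarchy constrain the boxes in the other, which is the intended source of cross-hierarchy generalization.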

