NECESSARY AND SUFFICIENT CONDITIONS FOR COMPOSITIONAL REPRESENTATIONS

Abstract

Humans leverage compositionality for flexible and efficient learning, but current machine learning algorithms lack such an ability. Despite many efforts in specific cases, there is still an absence of theories and tools for studying it systematically. In this paper, we leverage group theory to mathematically prove necessary and sufficient conditions for two fundamental questions about compositional representations. (1) What properties must a set of components have to be expressed compositionally? (2) What properties must mappings between compositional and entangled representations have? We provide examples to better understand the conditions and how to apply them; e.g., we use the theory to give a new explanation of why the attention mechanism helps compositionality. We hope this work will help advance the understanding of compositionality and the improvement of artificial intelligence towards the human level.

1. INTRODUCTION

Humans recognize the world and form imaginations in a supple way by leveraging systematic compositionality to achieve compositional generalization: the algebraic capacity to understand and produce a large number of novel combinations from known components (Chomsky, 1957; Montague, 1970). This is a key element of human intelligence (Minsky, 1986; Lake et al., 2017), and we hope to equip machines with such an ability.

Conventional machine learning has mainly been developed under the assumption that the training and test distributions are identical. Compositional generalization, however, is a type of out-of-distribution generalization (Bengio, 2017) in which the training and test distributions differ. In compositional generalization, a sample is a combination of several components. For example, an image object may have the two factor components of color and rotation; in language, a sentence is composed of lexical meanings and a grammatical structure. Generalization is enabled by recombining seen components into an unseen combination during inference.

One approach to compositional generalization is to learn compositional representations¹, or disentangled representations (Bengio, 2013), which contain several component representations. Each of them depends only on the corresponding underlying factor and does not change when other factors change; please see Section 3 for details. Multiple methods have been proposed to learn compositional representations. However, little discussion has been devoted to some fundamental questions. What kinds of factor combinations can be expressed in a compositional representation? Though there are some common factor components, such as color and size, what property enables them? When a set of components satisfies the conditions, what kinds of mappings are available between the entangled and compositional representations? Can we use the conditions to explain compositionality in conventional models such as attention?
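As a concrete toy illustration of the compositional representations discussed above (our own hypothetical encoding, not one from any specific model), consider an object with two factors, color and rotation. In a compositional representation each component subvector depends only on its own factor, so changing one factor leaves the other components untouched:

```python
import numpy as np

# Hypothetical factor encodings: each component representation depends
# only on its own underlying factor.
COLORS = {"red": np.array([1.0, 0.0]), "blue": np.array([0.0, 1.0])}

def encode_rotation(angle_deg):
    """Encode a rotation angle as (cos, sin) -- independent of color."""
    theta = np.deg2rad(angle_deg)
    return np.array([np.cos(theta), np.sin(theta)])

def compositional_repr(color, angle_deg):
    """Concatenate component representations: [color | rotation]."""
    return np.concatenate([COLORS[color], encode_rotation(angle_deg)])

# Changing the rotation factor leaves the color component unchanged.
z1 = compositional_repr("red", 0)
z2 = compositional_repr("red", 90)
assert np.allclose(z1[:2], z2[:2])      # color component unchanged
assert not np.allclose(z1[2:], z2[2:])  # rotation component changed
```

An entangled representation, by contrast, would mix both factors into every coordinate, so no subvector would be invariant to a single factor change.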
In this paper, we mathematically prove two propositions (Proposition 1.1 and Proposition 1.2) giving necessary and sufficient conditions regarding compositional representations. We construct groups for changes on representations, relate compositional representation to the group direct product, and relate compositional mapping to group action equivalence (Higgins et al., 2018). Then, we use theorems and propositions from group theory to prove the conditions.

Proposition 1.1 (Compositional representation). A set of components can be expressed compositionally if and only if the product of the component subgroups equals the original group, each component subgroup is a normal subgroup of the original group, and the component subgroups intersect only at the identity element.

Proposition 1.2 (Compositional mapping). Given a compositional representation, a mapping is compositional if and only if each component has an equivalent action in the compositional and entangled representations, and, for each element of the entangled representation, the orbits intersect only at that element.

Please see Proposition 4.2 and Proposition 4.10 for symbolic statements. We also provide examples to better understand the conditions and how to use them (Section 5). For representations, we see that whether a set of components can be expressed with a compositional representation depends not only on each component itself, but also on their combination and the possible values they take. We use the condition for compositional mapping to explain some existing neural network models and tasks, e.g., the attention mechanism, the spatial transformer and grammar tree nodes. We hope that, with these examples, the conditions will be used for validating different compositional representations and mappings, and for guiding the design of tasks and algorithms with compositionality. Our contributions can be summarized as follows.

• We propose and prove necessary and sufficient conditions for compositional representation and compositional mapping.
• We provide examples to understand and use the conditions, such as a new explanation of attention models.
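The three conditions in Proposition 1.1 are exactly the standard conditions under which a group is the internal direct product of its subgroups. A small sanity check on the cyclic group Z6 (our own toy example, not taken from the paper) illustrates all three:

```python
# Toy check of the Proposition 1.1 conditions on G = Z6 = {0,...,5}
# under addition mod 6, with component subgroups H = <2> and K = <3>.
G = set(range(6))
H = {0, 2, 4}   # subgroup generated by 2 (isomorphic to Z3)
K = {0, 3}      # subgroup generated by 3 (isomorphic to Z2)

op = lambda a, b: (a + b) % 6  # the group operation

# 1) The product of the component subgroups equals the original group.
assert {op(h, k) for h in H for k in K} == G

# 2) Each component subgroup is normal in G. This is automatic here
#    because G is abelian: g + h - g = h for all g, h.
assert all(op(op(g, h), (-g) % 6) in H for g in G for h in H)
assert all(op(op(g, k), (-g) % 6) in K for g in G for k in K)

# 3) The component subgroups intersect only at the identity element 0.
assert H & K == {0}

# All three hold, so Z6 is the internal direct product H x K (= Z2 x Z3),
# and a representation with these two components can be compositional.
```

Dropping any one condition breaks the construction; e.g., with K' = {0, 2, 4} = H, the product H·K' no longer covers G and the intersection is larger than the identity.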

2. RELATED WORK

Human-level compositional learning (Marcus, 2003; Lake & Baroni, 2018) has been an important open challenge (Yang et al., 2019; Keysers et al., 2020). There has been recent progress on measuring compositionality (Andreas, 2019; Lake & Baroni, 2018; Keysers et al., 2020) and on learning language compositionality for compositional generalization (Lake, 2019; Russin et al., 2019; Li et al., 2019; Gordon et al., 2020; Liu et al., 2020) and continual learning (Jin et al., 2020; Li et al., 2020). Another line of related but different work is statistically and marginally independent disentangled representation learning (Burgess et al., 2018; Locatello et al., 2019). That setting assumes marginal independence between the underlying factors and hence does not face the compositional generalization problem; on the other hand, compositional factors may not be marginally independent.

Understanding of compositionality has been discussed over time. Some discussions following Montague (1970) use homomorphism to define the composition operation between representations. Recently, Higgins et al. (2018) proposed a definition of disentangled representation based on group theory. That definition is the basis of this paper, and we focus on proving the conditions. Li et al. (2019) define compositionality probabilistically without discussing conditions to achieve it. Gordon et al. (2020) find that compositionality in the SCAN task can be expressed as permutation group action equivalence. This equivalent action is on a component subgroup, but they do not discuss equivalent actions on the whole group or the relations between them. There are also other works relating group theory to machine learning (Kondor, 2008; Cohen & Welling, 2016; Ravanbakhsh et al., 2017; Kondor & Trivedi, 2018). However, these previous works do not prove conditions for compositional representation or mapping.
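The "permutation group action equivalence" mentioned above is an instance of equivariance: a mapping f has an equivalent action when acting with a group element g on the input and then applying f gives the same result as applying f and then acting with g on the output, i.e. f(g · x) = g · f(x). A minimal sketch with a toy elementwise mapping (our own illustration, not the SCAN model):

```python
import numpy as np

def f(x):
    """A permutation-equivariant mapping: acts elementwise on x."""
    return x ** 2 + 1.0

def act(perm, x):
    """Group action of a permutation: reorder the coordinates of x."""
    return x[perm]

x = np.array([3.0, 1.0, 2.0])
perm = np.array([2, 0, 1])  # a permutation group element

# Equivariance (equivalent action): f(g . x) == g . f(x)
assert np.allclose(f(act(perm, x)), act(perm, f(x)))
```

Any elementwise mapping commutes with permutations of the coordinates in this way, which is why permutation actions on component representations can be carried through such mappings unchanged.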
In this paper, we provide and theoretically prove necessary and sufficient conditions for compositional representations and compositional mappings. We use definitions, propositions and theorems from group theory; please refer to Appendix A. Some of them are summarized in books such as Dummit & Foote (2004) and Gallian (2012), and we refer to them in the later sections.

¹ The word "representation" in this paper refers to variables, not group representations.

3. REPRESENTATIONS

In this section, we introduce the definitions of representation and compositional representation used in this paper.

