NECESSARY AND SUFFICIENT CONDITIONS FOR COMPOSITIONAL REPRESENTATIONS

Abstract

Humans leverage compositionality for flexible and efficient learning, but current machine learning algorithms lack this ability. Despite many efforts in specific cases, theories and tools to study compositionality systematically are still absent. In this paper, we leverage group theory to mathematically prove necessary and sufficient conditions for two fundamental questions about compositional representations: (1) What properties must a set of components have to be expressed compositionally? (2) What properties must mappings between compositional and entangled representations have? We provide examples to illustrate the conditions and how to apply them; for instance, we use the theory to give a new explanation of why the attention mechanism helps compositionality. We hope this work will advance the understanding of compositionality and the improvement of artificial intelligence towards human-level ability.

1. INTRODUCTION

Humans recognize the world and create imaginations in a supple way by leveraging systematic compositionality to achieve compositional generalization: the algebraic capacity to understand and produce a large number of novel combinations from known components (Chomsky, 1957; Montague, 1970). This is a key element of human intelligence (Minsky, 1986; Lake et al., 2017), and we hope to equip machines with this ability. Conventional machine learning has mainly been developed under the assumption that training and test distributions are identical. Compositional generalization, however, is a type of out-of-distribution generalization (Bengio, 2017) in which the training and test distributions differ. In compositional generalization, a sample is a combination of several components. For example, an image object may have two factor components, color and rotation; in language, a sentence is composed of lexical meanings and a grammatical structure. Generalization is enabled by recombining seen components into an unseen combination during inference.

One approach to compositional generalization is to learn compositional representations¹, or disentangled representations (Bengio, 2013), which contain several component representations. Each component representation depends only on the corresponding underlying factor and does not change when other factors change; see Section 3 for details. Multiple methods have been proposed to learn compositional representations, but little discussion has addressed some fundamental questions. What kinds of factor combinations can be expressed in a compositional representation? There are common factor components such as color and size, but what properties enable them? When a set of components satisfies the conditions, what kinds of mappings are available between the entangled and compositional representations? Can we use the conditions to explain compositionality in conventional models such as attention?
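The defining property above — each component representation depends only on its own factor — can be sketched concretely. The following is a minimal toy illustration, not any method from this paper: the encoder `encode`, the factor names, and the numeric codes are all hypothetical, chosen only to show that changing one factor leaves the other component untouched.

```python
# Toy sketch of a compositional (disentangled) representation:
# each slot of the code depends on exactly one underlying factor.
# (Function name, factors, and values are illustrative assumptions.)

def encode(color, rotation):
    """Hypothetical encoder: slot 0 depends only on color,
    slot 1 depends only on rotation (in degrees)."""
    color_code = {"red": 0.0, "blue": 1.0}[color]
    rotation_code = (rotation % 360) / 360.0
    return (color_code, rotation_code)

a = encode("red", 90)
b = encode("blue", 90)    # change color only
c = encode("red", 180)    # change rotation only

# Changing one factor leaves the other component unchanged.
assert a[1] == b[1]  # rotation component unaffected by color change
assert a[0] == c[0]  # color component unaffected by rotation change
```

An entangled representation, by contrast, would mix both factors into each coordinate, so no slot would stay fixed under a single-factor change.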
In this paper, we mathematically prove two propositions (Proposition 1.1 and Proposition 1.2) giving necessary and sufficient conditions for compositional representations. We construct groups of changes on representations, relate compositional representations to the group direct product, and relate compositional mappings to group action equivalence (Higgins et al., 2018). We then use theorems and propositions from group theory to prove the conditions.
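The connection between compositional representations and the group direct product can be sketched with a toy example. The group sizes below (Z_2 x Z_3) and the `act` function are illustrative assumptions, not the paper's construction; the point is only that a direct-product element acts on each factor component independently.

```python
# Toy sketch: factor changes modeled as cyclic groups, with a combined
# change as an element of the direct product Z_2 x Z_3 acting
# componentwise on a state. (Sizes and names are illustrative.)

def act(g, state, mods=(2, 3)):
    """Apply a direct-product element g = (g1, g2) to a state;
    each component acts on its own factor independently."""
    return tuple((s + gi) % m for s, gi, m in zip(state, g, mods))

s = (0, 0)
s = act((1, 0), s)   # change only the first factor
s = act((0, 2), s)   # change only the second factor
assert s == (1, 2)

# Componentwise composition: applying two changes in sequence equals
# applying their componentwise product, mirroring the direct-product
# structure g * h = (g1*h1, g2*h2).
lhs = act((1, 1), act((1, 2), (0, 0)))
rhs = act(((1 + 1) % 2, (1 + 2) % 3), (0, 0))
assert lhs == rhs
```

In this picture, a representation is compositional when the group of changes factors as a direct product and each factor acts only on its own component; the paper's conditions characterize when such a factorization exists.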



¹ The word "representation" in this paper refers to variables, not to group representations.

