

Abstract

Information Lattice Learning (ILL) is a general framework for learning decomposed representations, called rules, of a signal such as an image or a probability distribution. Each rule is a coarsened signal that offers human-interpretable insight into what might govern the nature of the original signal. To summarize the signal, we need several disentangled rules arranged in a hierarchy, formalized by a lattice structure. ILL focuses on explainability and generalizability from "small data", and aims for rules akin to those humans distill from experience (rather than a representation optimized for a specific task like classification). This paper gives a mathematical and algorithmic presentation of ILL, then demonstrates how ILL addresses the core question "what makes X an X" or "what makes X different from Y" to create effective, rule-based explanations designed to help human learners understand. The emphasis here is on the what, rather than on tasks like generating X or predicting the labels X, Y. Typical applications of ILL are presented for artistic and scientific knowledge discovery: using ILL to learn music theory from scores and chemical laws from molecule data, revealing relationships between the two domains. We include initial benchmarks and assessments of ILL to demonstrate its efficacy.

1. INTRODUCTION

With rapid progress in AI, there is an increasing desire for general AI (Goertzel & Pennachin, 2007; Chollet, 2019) and explainable AI (Adadi & Berrada, 2018; Molnar, 2019), which exhibit broad, human-like cognitive capacities. One common pursuit is to move away from "black boxes" designed for specific tasks and to achieve broad generalization through strong abstractions made from only a few examples, with neither unlimited priors nor unlimited data ("primitive priors" and "small data" instead). In this pursuit, we present a new, task-nonspecific framework, Information Lattice Learning (ILL), to learn representations akin to human-distilled rules, e.g., producing much of a standard music theory curriculum as well as new rules in a form directly interpretable by students (shown at the end).

The term information lattice was first defined by Shannon (1953), but remains largely conceptual and unexplored. In the context of abstraction and representation learning, we independently develop representation lattices that coincide with Shannon's information lattice when restricted to his context. Instead of inventing a new name, we adopt Shannon's. However, we not only generalize the original definition (an information lattice here is a hierarchical distribution of representations), but also bring learning into the lattice, yielding the name ILL.

ILL explains a signal (e.g., a probability distribution) by disentangled representations, called rules. A rule explains some but not all aspects of the signal; together, the collection of rules aims to capture a large part of it. ILL is specially designed to address the core question "what makes X an X" or "what makes X different from Y", emphasizing the what rather than generating X or predicting labels X, Y, in order to facilitate effective, rule-based explanations designed to help human learners understand. A music AI classifying concertos, or generating one that mimics the masters, does not necessarily produce human insight about what makes a concerto a concerto or the best rules a novice composer might employ to write one.

Our focus represents a shift from much representation-learning work (Bengio et al., 2013), which aims to find the best representation for solving a specific task (e.g., classification) with less concern for explainability. Instead of optimizing a task-specific objective function (e.g., classification error), ILL balances more general objectives that favor fewer, simpler rules for interpretability, and more essential rules for effectiveness, all formalized later. One intuition behind ILL is to break the whole into simple pieces, similar to breaking a signal into a Fourier series. Yet, rather than decomposition via projection onto an orthonormal basis and synthesis
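To make the notion of a rule concrete, the following is a minimal, illustrative sketch (not the authors' implementation): the signal is a toy probability distribution over the twelve pitch classes, and a rule is the coarsened signal obtained by pushing that distribution forward through a partition of the domain. The particular partition used here (in-scale versus out-of-scale pitch classes for C major) is a hypothetical choice for illustration only.

```python
import numpy as np

# Toy signal: a probability distribution over the 12 pitch classes 0..11.
rng = np.random.default_rng(0)
signal = rng.dirichlet(np.ones(12))

# Hypothetical partition of the domain: group pitch classes by membership
# in the C-major scale. Different partitions yield different rules, and
# their refinement ordering is what forms the lattice.
c_major = {0, 2, 4, 5, 7, 9, 11}
partition = {x: ("in_scale" if x in c_major else "out_of_scale") for x in range(12)}

def coarsen(signal, partition):
    """Push the distribution forward through the partition:
    each cell of the partition receives the total mass of its members."""
    rule = {}
    for x, prob in enumerate(signal):
        cell = partition[x]
        rule[cell] = rule.get(cell, 0.0) + prob
    return rule

print(coarsen(signal, partition))
# e.g. {'in_scale': 0.7..., 'out_of_scale': 0.2...}
# One coarsened view ("rule") of the signal; a collection of such rules,
# drawn from different partitions, is what ILL assembles into a lattice.
```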

