MINIMUM DESCRIPTION LENGTH RECURRENT NEURAL NETWORKS

Abstract

Recurrent neural networks (RNNs) face two well-known challenges: (a) the difficulty such networks have in generalizing appropriately, as opposed to memorizing, especially from very short input sequences (generalization); and (b) the difficulty of understanding the knowledge that the network has attained (transparency). We explore the implications for these challenges of employing a general search through neural architectures, using a genetic algorithm with Minimum Description Length (MDL) as the objective function. We find that MDL leads the networks to reach adequate levels of generalization from very small corpora, improving over backpropagation-based alternatives. We demonstrate this approach by evolving networks that perform tasks of increasing complexity with absolute correctness. The resulting networks are small and easily interpretable, and, unlike classical RNNs, are provably appropriate for sequences of arbitrary length even when trained on very limited corpora. One case study is addition, for which our system grows a network with just four cells, reaching 100% accuracy (and at least .999 certainty) on arbitrarily large numbers.

1. INTRODUCTION

The modeling of sequential knowledge and learning requires making appropriate generalizations from input sequences that are often quite short. This holds both for language capabilities and for other sequential tasks such as counting. Moreover, it is often helpful for the modeler to inspect the acquired knowledge and reason about its properties. Neural networks, despite their impressive results and popularity in a wide range of domains, still face challenges in these respects: they tend to overfit the training data and require regularization or other special measures, as well as very large training corpora, to avoid this problem. In terms of knowledge, networks are often very large, and it is generally very hard to inspect a given network and determine what it actually knows (see Papernot & McDaniel, 2018, among others, for a recent attempt to probe this knowledge).

Some of the challenges above arise from the reliance of common connectionist approaches on backpropagation as a training method. In this paper we explore the implications for sequential modeling of well-known alternative perspectives on neural network design. Specifically, we consider replacing backpropagation with a general search, using a genetic algorithm, through a large space of possible networks, with Minimum Description Length (MDL; Rissanen, 1978) as the objective function. In essence, this amounts to minimizing error as usual, while at the same time trying to minimize the size of the network.

We find that MDL helps the networks reach adequate levels of generalization from very small corpora, avoiding overfitting and performing significantly better than backpropagation-based alternatives. The MDL search converges on networks that are often small, transparent, and provably correct. We illustrate this across a range of sequential tasks.
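The exact encoding scheme used to measure a network's description length is not specified at this point; as a rough, purely illustrative sketch, the classic two-part MDL score sums (i) the bits needed to encode the hypothesis (here, crudely, the network's units and connections, with hypothetical per-unit and per-connection costs) and (ii) the bits needed to encode the data given the hypothesis (the summed surprisal the network assigns to the observed targets):

```python
import math

def description_length(num_units, num_connections, probs_for_targets,
                       bits_per_unit=8, bits_per_connection=16):
    """Toy two-part MDL score for a candidate network (illustrative only).

    Part 1: encoding length of the network itself, here a crude count of
    bits for its units and weighted connections (costs are assumptions,
    not the paper's encoding).
    Part 2: encoding length of the data given the network, i.e. the
    summed surprisal, in bits, of the probabilities the network assigns
    to the observed target symbols.
    """
    network_cost = (num_units * bits_per_unit
                    + num_connections * bits_per_connection)
    data_cost = -sum(math.log2(p) for p in probs_for_targets)
    return network_cost + data_cost

# A small network that fits the data slightly worse can still win over a
# large network that fits marginally better, which is the pressure that
# pushes the search away from memorization:
small = description_length(4, 6, [0.9] * 20)
large = description_length(50, 200, [0.99] * 20)
assert small < large
```

Under such an objective, a genetic algorithm can compare architectures of different sizes on a single scale, trading fit against complexity.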

2. PREVIOUS WORK

Our work follows several lines of research in the literature. Evolutionary programming has been used to evolve neural networks in a range of studies. Early work that uses genetic algorithms for various aspects of neural network optimization includes Miller et al. (1989), Montana & Davis (1989), Whitley et al. (1990), and Zhang & Mühlenbein (1993; 1995). These works focus on feed-forward architectures, but Angeline et al. (1994) present an evolutionary algorithm that discovers recurrent

