ON THE MAPPING BETWEEN HOPFIELD NETWORKS AND RESTRICTED BOLTZMANN MACHINES

Abstract

Hopfield networks (HNs) and Restricted Boltzmann Machines (RBMs) are two important models at the interface of statistical physics, machine learning, and neuroscience. Recently, there has been interest in the relationship between HNs and RBMs, due to their similarity under the statistical mechanics formalism. An exact mapping between HNs and RBMs has been previously noted for the special case of orthogonal ("uncorrelated") encoded patterns. We present here an exact mapping in the case of correlated-pattern HNs, which are more broadly applicable to existing datasets. Specifically, we show that any HN with N binary variables and p < N arbitrary binary patterns can be transformed into an RBM with N binary visible variables and p Gaussian hidden variables. We outline the conditions under which the reverse mapping exists, and conduct experiments on the MNIST dataset which suggest that the mapping provides a useful initialization for the RBM weights. We discuss extensions, the potential importance of this correspondence for the training of RBMs, and for understanding the performance of deep architectures which utilize RBMs.

1. INTRODUCTION

Hopfield networks (HNs) (Hopfield, 1982; Amit, 1989) are a classical neural network architecture that can store prescribed patterns as fixed-point attractors of a dynamical system. In their standard formulation with binary-valued units, HNs can be regarded as spin glasses with pairwise interactions J_ij that are fully determined by the patterns to be encoded. HNs have been extensively studied in the statistical mechanics literature (e.g., Kanter & Sompolinsky, 1987; Amit et al., 1985), where they can be seen as an interpolation between the ferromagnetic Ising model (p = 1 pattern) and the Sherrington-Kirkpatrick spin glass model (many random patterns) (Kirkpatrick & Sherrington, 1978; Barra & Guerra, 2008). By encoding patterns as dynamical attractors that are robust to perturbations, HNs provide an elegant solution to pattern recognition and classification tasks. They are considered the prototypical attractor neural network and are the historical precursor to modern recurrent neural networks.

Concurrently, spin glasses have been used extensively in the machine learning literature, where they comprise a sub-class of "Boltzmann machines" (BMs) (Ackley et al., 1985). Given a collection of data samples drawn from a data distribution, one is generally interested in "training" a BM by tuning its weights J_ij such that its equilibrium distribution reproduces the data distribution as closely as possible (Hinton, 2012). The resulting optimization problem is dramatically simplified when the network has a two-layer structure in which each layer has no self-interactions, so that there are only inter-layer connections (Hinton, 2012) (see Fig. 1). This architecture is known as a Restricted Boltzmann Machine (RBM), and the two layers are sometimes called the visible layer and the hidden layer. The characteristics of the visible layer (dimension, type of units) are determined by the training data, whereas the hidden layer can have binary or continuous units and its dimension is chosen somewhat arbitrarily. In addition to generative modelling, RBMs and their multi-layer extensions have been used for a variety of learning tasks, such as classification, feature extraction, and dimension reduction (e.g. Salakhutdinov et al. (2007); Hinton & Salakhutdinov (2006)).
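To fix notation with a small, concrete example, the sketch below builds the Hebbian couplings J_ij = (1/N) sum_mu xi_i^mu xi_j^mu for a set of random binary patterns, together with the RBM weight matrix W = xi / sqrt(N) corresponding to the previously noted mapping for uncorrelated patterns mentioned in the abstract. This is a minimal illustration only, not the correlated-pattern construction developed in this paper; the variable names, the pattern sizes, and the convention of zeroing the diagonal of J are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p = 100, 5                           # number of spins, number of stored patterns
xi = rng.choice([-1, 1], size=(N, p))   # binary (+/-1) patterns, one per column

# Hopfield couplings from the Hebbian prescription: J = (1/N) xi xi^T,
# with self-interactions removed by convention.
J = xi @ xi.T / N
np.fill_diagonal(J, 0.0)

# RBM weights for the uncorrelated-pattern mapping:
# N binary visible units, p Gaussian hidden units, W[i, mu] = xi[i, mu] / sqrt(N).
W = xi / np.sqrt(N)

# Sanity check: W W^T reproduces J up to the removed diagonal.
G = W @ W.T
assert np.allclose(G - np.diag(np.diag(G)), J)

# One step of zero-temperature Hopfield dynamics applied to a stored pattern:
# for p << N the pattern is (with high probability) a fixed point.
s = xi[:, 0]
print("fraction of spins unchanged:", (np.sign(J @ s) == s).mean())
```

The identity W W^T = J (up to the diagonal) is the structural fact such mappings exploit: the Hebbian J has rank at most p, so the quadratic Hopfield energy can be rewritten as a sum of p squares, each of which can be linearized by introducing one Gaussian hidden variable.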

