ON THE MAPPING BETWEEN HOPFIELD NETWORKS AND RESTRICTED BOLTZMANN MACHINES

Abstract

Hopfield networks (HNs) and Restricted Boltzmann Machines (RBMs) are two important models at the interface of statistical physics, machine learning, and neuroscience. Recently, there has been interest in the relationship between HNs and RBMs, due to their similarity under the statistical mechanics formalism. An exact mapping between HNs and RBMs has been previously noted for the special case of orthogonal ("uncorrelated") encoded patterns. We present here an exact mapping in the case of correlated pattern HNs, which are more broadly applicable to existing datasets. Specifically, we show that any HN with $N$ binary variables and $p < N$ arbitrary binary patterns can be transformed into an RBM with $N$ binary visible variables and $p$ Gaussian hidden variables. We outline the conditions under which the reverse mapping exists, and conduct experiments on the MNIST dataset which suggest that the mapping provides a useful initialization for the RBM weights. We discuss extensions, the potential importance of this correspondence for the training of RBMs, and for understanding the performance of deep architectures which utilize RBMs.

1. INTRODUCTION

Hopfield networks (HNs) (Hopfield, 1982; Amit, 1989) are a classical neural network architecture that can store prescribed patterns as fixed-point attractors of a dynamical system. In their standard formulation with binary-valued units, HNs can be regarded as spin glasses with pairwise interactions $J_{ij}$ that are fully determined by the patterns to be encoded. HNs have been extensively studied in the statistical mechanics literature (e.g. Kanter & Sompolinsky, 1987; Amit et al., 1985), where they can be seen as an interpolation between the ferromagnetic Ising model ($p = 1$ pattern) and the Sherrington-Kirkpatrick spin glass model (many random patterns) (Kirkpatrick & Sherrington, 1978; Barra & Guerra, 2008). By encoding patterns as dynamical attractors which are robust to perturbations, HNs provide an elegant solution to pattern recognition and classification tasks. They are considered the prototypical attractor neural network, and are the historical precursor to modern recurrent neural networks.

Concurrently, spin glasses have been used extensively in the historical machine learning literature, where they comprise a sub-class of "Boltzmann machines" (BMs) (Ackley et al., 1985). Given a collection of data samples drawn from a data distribution, one is generally interested in "training" a BM by tuning its weights $J_{ij}$ such that its equilibrium distribution can reproduce the data distribution as closely as possible (Hinton, 2012). The resulting optimization problem is dramatically simplified when the network has a two-layer structure where each layer has no self-interactions, so that there are only inter-layer connections (Hinton, 2012) (see Fig. 1). This architecture is known as a Restricted Boltzmann Machine (RBM), and the two layers are sometimes called the visible layer and the hidden layer.
The visible layer characteristics (dimension, type of units) are determined by the training data, whereas the hidden layer can have binary or continuous units and its dimension is chosen somewhat arbitrarily. In addition to generative modelling, RBMs and their multi-layer extensions have been used for a variety of learning tasks, such as classification, feature extraction, and dimension reduction (e.g. Salakhutdinov et al. (2007); Hinton & Salakhutdinov (2006)). There has been extensive interest in the relationship between HNs and RBMs, as both are built on the Ising model formalism and fulfill similar roles, with the aim of better understanding RBM behaviour and potentially improving performance. Various results in this area have been recently reviewed (Marullo & Agliari, 2021). In particular, an exact mapping between HNs and RBMs has been previously noted for the special case of uncorrelated (orthogonal) patterns (Barra et al., 2012). Several related models have since been studied (Agliari et al., 2013; Mézard, 2017), which partially relax the uncorrelated-pattern constraint. However, the patterns observed in most real datasets exhibit significant correlations, precluding the use of these approaches.

In this paper, we demonstrate an exact correspondence between HNs and RBMs in the case of correlated pattern HNs. Specifically, we show that any HN with $N$ binary units and $p < N$ arbitrary (i.e. non-orthogonal) binary patterns encoded via the projection rule (Kanter & Sompolinsky, 1987; Personnaz et al., 1986) can be transformed into an RBM with $N$ binary visible variables and $p$ Gaussian hidden variables. We then characterize when the reverse map from RBMs to HNs can be made. We consider a practical example using the mapping, and discuss the potential importance of this correspondence for the training and interpretability of RBMs.

2. RESULTS

We first introduce the classical solution to the problem of encoding $N$-dimensional binary $\{-1, +1\}$ vectors $\{\xi^\mu\}_{\mu=1}^{p}$, termed "patterns", as global minima of a pairwise spin glass $H(s) = -\frac{1}{2} s^T J s$. This is often framed as a pattern retrieval problem, where the goal is to specify or learn $J_{ij}$ such that an energy-decreasing update rule for $H(s)$ converges to the patterns (i.e. they are stable fixed points). Consider the $N \times p$ matrix $\xi$ with the $p$ patterns as its columns. Then the classical prescription known as the projection rule (or pseudo-inverse rule) (Kanter & Sompolinsky, 1987; Personnaz et al., 1986),

$$J = \xi (\xi^T \xi)^{-1} \xi^T,$$

guarantees that the $p$ patterns will be global minima of $H(s)$. The resulting spin model is commonly called a (projection) Hopfield network, and has the Hamiltonian

$$H(s) = -\frac{1}{2} s^T \xi (\xi^T \xi)^{-1} \xi^T s. \qquad (1)$$

Note that the invertibility of $\xi^T \xi$ is guaranteed as long as the patterns are linearly independent (we therefore require $p \leq N$). Also note that in the special (rare) case of orthogonal patterns, $\xi^\mu \cdot \xi^\nu = N \delta_{\mu\nu}$ (also called "uncorrelated"), studied in the previous work (Barra et al., 2012), one has $\xi^T \xi = N I$, and so the pseudo-inverse interactions reduce to the well-known Hebbian form $J = \frac{1}{N} \xi \xi^T$ (the properties of which are studied extensively in Amit et al. (1985)). Additional details on the projection HN Eq. (1) are provided in Appendix A. To make progress in analyzing Eq. (1), we first consider a transformation of $\xi$ which eliminates the inverse factor.
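As a concrete check of the projection rule, the following minimal NumPy sketch (with illustrative sizes $N = 20$, $p = 5$, chosen for the example and not taken from the text) builds $J = \xi (\xi^T \xi)^{-1} \xi^T$ from random binary patterns and verifies that each pattern is a fixed point of the sign update rule:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 5  # illustrative sizes (assumed, not from the text)

# p random binary {-1, +1} patterns as the columns of xi (N x p)
xi = rng.choice([-1.0, 1.0], size=(N, p))

# Projection (pseudo-inverse) rule: J = xi (xi^T xi)^{-1} xi^T
J = xi @ np.linalg.inv(xi.T @ xi) @ xi.T

# J is the orthogonal projector onto span(xi), so J xi^mu = xi^mu exactly,
# and each pattern is a fixed point of the update s -> sign(J s).
for mu in range(p):
    s = xi[:, mu]
    assert np.array_equal(np.sign(J @ s), s)
    # each encoded pattern sits at energy H(s) = -N/2
    assert np.isclose(-0.5 * s @ J @ s, -N / 2)
```

Since $J$ is a projector, the patterns attain the minimal energy $-N/2$ among binary states, consistent with their being global minima of $H(s)$.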

2.1. MAPPING A HOPFIELD NETWORK TO A RESTRICTED BOLTZMANN MACHINE

In order to obtain a more useful representation of the quadratic form Eq. (1) (for our purposes), we utilize the QR decomposition (Schott & Stewart, 1999) of $\xi$ to "orthogonalize" the patterns,

$$\xi = QR, \qquad Q \in \mathbb{R}^{N \times p}, \quad R \in \mathbb{R}^{p \times p}.$$

The columns of $Q$ are the orthogonalized patterns, and form an orthonormal basis (of non-binary vectors) for the $p$-dimensional subspace spanned by the binary patterns. $R$ is upper triangular, and if its diagonal entries are held positive then $Q$ and $R$ are both unique (Schott & Stewart, 1999). Note that both the order and sign of the columns of $\xi$ are irrelevant for HN pattern recall, so there are $n = 2^p \cdot p!$ possible $Q, R$ pairs. Fixing a pattern ordering, we can use the orthogonality of $Q$ to rewrite the interaction matrix as

$$J = \xi (\xi^T \xi)^{-1} \xi^T = QR (R^T R)^{-1} R^T Q^T = Q Q^T \qquad (3)$$

(the last equality follows from $(R^T R)^{-1} = R^{-1} (R^T)^{-1}$). Eq. (3) resembles the simple Hebbian rule but with non-binary orthogonal patterns. Defining $q \equiv Q^T s$ in analogy to the classical pattern overlap parameter $m \equiv \frac{1}{N} \xi^T s$ (Amit et al., 1985), we have

$$H(s) = -\frac{1}{2} s^T Q Q^T s = -\frac{1}{2}\, q(s) \cdot q(s).$$
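The reduction to Eq. (3) can be sketched numerically. Assuming the same kind of illustrative setup (random binary patterns, sizes chosen only for the example), `numpy.linalg.qr` returns the reduced factorization, and $J = QQ^T$ coincides with the projection-rule couplings while the energy reduces to the squared norm of the overlaps $q = Q^T s$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 20, 5  # illustrative sizes (assumed)
xi = rng.choice([-1.0, 1.0], size=(N, p))  # binary patterns as columns

# Reduced QR decomposition: Q is N x p with orthonormal columns, R is p x p
# upper triangular. (NumPy does not enforce a positive diagonal for R, but
# J = Q Q^T is unchanged by column sign flips.)
Q, R = np.linalg.qr(xi)

# The projection-rule couplings collapse to the simple outer product Q Q^T
J_proj = xi @ np.linalg.inv(xi.T @ xi) @ xi.T
J_qr = Q @ Q.T
assert np.allclose(J_proj, J_qr)

# For any state s: H(s) = -1/2 s^T Q Q^T s = -1/2 |q|^2, with q = Q^T s
s = rng.choice([-1.0, 1.0], size=N)
q = Q.T @ s
assert np.isclose(-0.5 * s @ J_proj @ s, -0.5 * q @ q)
```

This makes explicit that the Hamiltonian depends on the state only through the $p$ orthogonalized overlaps, which is what allows the Gaussian hidden variables to be introduced in the mapping to an RBM.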

