LARGE ASSOCIATIVE MEMORY PROBLEM IN NEUROBIOLOGY AND MACHINE LEARNING

Abstract

Dense Associative Memories or modern Hopfield networks permit storage and reliable retrieval of an exponentially large (in the dimension of feature space) number of memories. At the same time, their naive implementation is non-biological, since it seemingly requires the existence of many-body synaptic junctions between the neurons. We show that these models are effective descriptions of a more microscopic theory (written in terms of biological degrees of freedom) that has additional (hidden) neurons and only requires two-body interactions between them. For this reason our proposed microscopic theory is a valid model of large associative memory with a degree of biological plausibility. The dynamics of our network and its reduced-dimensional equivalent both minimize energy (Lyapunov) functions. When certain dynamical variables (hidden neurons) are integrated out from our microscopic theory, one can recover many of the models that were previously discussed in the literature, e.g. the model presented in the paper "Hopfield Networks is All You Need". We also provide an alternative derivation of the energy function and the update rule proposed in the aforementioned paper and clarify the relationships between various models of this class.

1. INTRODUCTION

Associative memory is defined in psychology as the ability to remember (link) many sets, called memories, of unrelated items. Prompted by a large enough subset of items taken from one memory, an animal or computer with an associative memory can retrieve the rest of the items belonging to that memory. The diverse human cognitive abilities that involve making appropriate responses to stimulus patterns can often be understood as the operation of an associative memory, with the "memories" often being distillations and consolidations of multiple experiences rather than merely corresponding to a single event.
The intuitive idea of associative memory can be described using a "feature space". In a mathematical model abstracted from neurobiology, the presence (or absence) of each particular feature $i$ is denoted by the activity (or lack of activity) of a model neuron $v_i$ that is directly driven by a feature signal. If there are $N_f$ possible features, there can be at most $N_f^2$ distinct connections (synapses) in a neural circuit involving only these neurons. Typical cortical synapses are not highly reliable and can store only a few bits of information¹. The description of a particular memory requires roughly $N_f$ bits of information. Such a system can therefore store at most $\sim N_f$ unrelated memories. Artificial neural network models of associative memory (based on attractor dynamics of feature neurons and understood through an energy function) exhibit this limitation even with precise synapses, with memory storage limited to fewer than $\sim 0.14 N_f$ memories (Hopfield, 1982).
¹ For instance, a recent study (Bromer et al., 2018) reports the information content of individual synapses ranging between 2.7 and 4.7 bits, based on electron microscopy imaging; see also (Bartol Jr et al., 2015). These numbers refer to the structural accuracy of synapses. There is also electrical and chemical noise in synaptic currents induced by the biophysical details of vesicle release and neurotransmitter binding. The unreliability of the fusion of pre-synaptic vesicles (containing neurotransmitter) with the pre-synaptic neuron membrane is the dominant source of trial-to-trial synaptic current variation (Allen & Stevens, 1994). This noise decreases the electrical information capacity of individual synapses from the maximal value that the synaptic structure would otherwise provide.
Situations arise in which the number $N_f$ is small and the desired number of memories far exceeds $\sim N_f$; see some examples from biological and AI systems in Section 4. In these situations the associative memory model of (Hopfield, 1982) would be insufficient, since it would not be able to memorize the required number of patterns. At the same time, the models of associative memory with large storage capacity considered in our paper can easily solve these problems.
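The counting argument above can be written out as a short back-of-the-envelope estimate. This is only a sketch: the symbols $b$ (bits stored per synapse) and $K$ (number of stored memories) are introduced here for illustration and are not part of the original notation.

```latex
% Information available in the synapses must cover the information in K memories.
\underbrace{N_f^2}_{\text{synapses}} \times \underbrace{b}_{\text{bits per synapse}}
\;\gtrsim\;
\underbrace{K}_{\text{memories}} \times \underbrace{N_f}_{\text{bits per memory}}
\quad\Longrightarrow\quad
K \;\lesssim\; b\, N_f \;\sim\; N_f
\qquad (b = \text{a few bits}).
```

With $b$ of order a few bits, the bound is linear in $N_f$, which is the same scaling as the $\sim 0.14 N_f$ limit quoted for (Hopfield, 1982).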
The starting point of this paper is a machine learning approach to associative memory based on an energy function and attractor dynamics in the space of $N_f$ variables, called Dense Associative Memory (Krotov & Hopfield, 2016). This idea has been shown to dramatically increase the memory storage capacity of the corresponding neural network (Krotov & Hopfield, 2016; Demircigil et al., 2017) and was proposed to be useful for increasing robustness of neural networks to adversarial attacks (Krotov & Hopfield, 2018).
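As a minimal sketch of what "an energy function and attractor dynamics" can look like for binary neurons, the following pure-Python example assumes the energy $E = -\sum_\mu F(\sum_i \xi_{\mu i} \sigma_i)$ with $F(x) = x^3$ and an asynchronous update that sets each neuron to whichever sign gives the lower energy. The function names (`F`, `energy`, `update_neuron`, `retrieve`) and the eight-neuron patterns are hypothetical and chosen only for illustration; this is not necessarily the exact procedure used in the paper.

```python
# Sketch of a Dense Associative Memory with binary neurons (pure Python).
# Assumptions for illustration: F(x) = x**n with n = 3, and an asynchronous
# update that picks, for each neuron, the sign with the lower energy.

def F(x, n=3):
    """Polynomial interaction function F(x) = x**n."""
    return x ** n

def energy(sigma, memories, n=3):
    """E = -sum_mu F(sum_i xi_mu_i * sigma_i)."""
    return -sum(F(sum(x * s for x, s in zip(xi, sigma)), n) for xi in memories)

def update_neuron(sigma, memories, i, n=3):
    """Return the value (+1 or -1) of neuron i that gives the lower energy."""
    total = 0
    for xi in memories:
        # Overlap of this memory with the current state, excluding neuron i.
        h = sum(x * s for j, (x, s) in enumerate(zip(xi, sigma)) if j != i)
        # Compare the contribution with sigma_i = +1 against sigma_i = -1.
        total += F(h + xi[i], n) - F(h - xi[i], n)
    return 1 if total >= 0 else -1

def retrieve(sigma, memories, n=3, sweeps=5):
    """Sweep over all neurons asynchronously until the state stops changing."""
    sigma = list(sigma)
    for _ in range(sweeps):
        changed = False
        for i in range(len(sigma)):
            new = update_neuron(sigma, memories, i, n)
            if new != sigma[i]:
                sigma[i] = new
                changed = True
        if not changed:
            break
    return sigma

# Three hypothetical memories over eight binary neurons.
memories = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
query = [-1, 1, 1, 1, -1, -1, -1, -1]  # memories[1] with neuron 0 flipped
result = retrieve(query, memories)
```

In this toy setting the corrupted query relaxes to the stored pattern `memories[1]`, and `energy(result, memories)` is lower than `energy(query, memories)`, as expected for dynamics that descend the energy.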
Recently, an extension of this idea to continuous variables, called the modern Hopfield network, demonstrated remarkably successful results on immune repertoire classification (Widrich et al., 2020) and provided valuable insights into the properties of attention heads in Transformer architectures (Ramsauer et al., 2020).
Dense Associative Memories or modern Hopfield networks, however, cannot describe biological neural networks in terms of true microscopic degrees of freedom, since they contain many-body interaction terms in the equations describing their dynamics and in the corresponding energy functions. To illustrate this point, consider two networks: a conventional Hopfield network (Hopfield, 1982) and a Dense Associative Memory with a cubic interaction term in the energy function (see Fig. 1). In the conventional network the dynamics is encoded in the matrix $T_{ij}$, which represents the strengths of the synaptic connections between feature neurons $i$ and $j$. Thus, this network is manifestly describable in terms of only two-body synapses, which is approximately true for many biological synapses. In contrast, a Dense Associative Memory network with a cubic energy function naively requires the synaptic connections to be tensors $T_{ijk}$ with three indices, which are harder, although not impossible, to implement biologically. Many-body synapses become even more problematic when the interaction term is described by a more complicated function than a simple power (in this case the Taylor expansion of that function would generate a series of terms with increasing powers). Many-body synapses typically appear when one starts with a microscopic theory described by only two-body synapses and integrates out some of the degrees of freedom (hidden neurons).
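The remark about interaction functions that are more complicated than a simple power can be illustrated with a worked expansion. As an assumed example, take an exponential interaction function $F(x) = e^x$ in an energy of the form $E = -\sum_\mu F(\sum_i \xi_{\mu i} \sigma_i)$; the choice of the exponential and the multi-index notation $T_{i_1 \dots i_n}$ are made here only for illustration.

```latex
E = -\sum_\mu \exp\Big(\sum_i \xi_{\mu i}\,\sigma_i\Big)
  = -\sum_\mu \sum_{n=0}^{\infty} \frac{1}{n!}\Big(\sum_i \xi_{\mu i}\,\sigma_i\Big)^{n}
  = -\sum_{n=0}^{\infty} \frac{1}{n!} \sum_{i_1,\dots,i_n}
      T_{i_1 \dots i_n}\,\sigma_{i_1}\cdots\sigma_{i_n},
\qquad
T_{i_1 \dots i_n} = \sum_\mu \xi_{\mu i_1}\cdots\xi_{\mu i_n}.
```

Every order $n$ of the expansion contributes a coupling tensor with $n$ indices, so a literal synaptic implementation of such an energy would need many-body junctions of all orders at once.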
The argument described above, based on counting the information stored in synapses, in conjunction with the fact that modern Hopfield nets and Dense Associative Memories can have a huge storage capacity, hints at the same solution. The reason these networks have a storage capacity much greater than $N_f$ is that they do not describe the dynamics of only $N_f$ neurons, but rather involve additional neurons and synapses. Thus, there remains a theoretical question: what does this hidden circuitry look like? Is it possible to introduce a set of hidden neurons with appropriately chosen interaction terms and activation



[Figure 1 graphic: two networks of three neurons labeled 1, 2, 3. Left panel: pairwise connections $T_{12}$, $T_{13}$, $T_{23}$, with $E = -\sum_{i,j} T_{ij} \sigma_i \sigma_j$. Right panel: $E = -\sum_\mu F\big(\sum_i \xi_{\mu i} \sigma_i\big) = -\sum_{i,j,k} T_{ijk} \sigma_i \sigma_j \sigma_k$.]

Figure 1: Two binary networks consisting of three neurons $\sigma_1, \sigma_2, \sigma_3 = \pm 1$. On the left is the classical Hopfield network (Hopfield, 1982) with the matrix $T_{ij} = \sum_\mu \xi_{\mu i} \xi_{\mu j}$ being the outer product of memory vectors (see Section 2 for the definitions of notations). In this case the matrix $T_{ij}$ is interpreted as a matrix of synaptic connections between cells $i$ and $j$. On the right is a Dense Associative Memory network of (Krotov & Hopfield, 2016) with cubic interaction term $F(x) = x^3$. In this case the corresponding tensor $T_{ijk} = \sum_\mu \xi_{\mu i} \xi_{\mu j} \xi_{\mu k}$ has three indices and thus cannot be interpreted as a biological synapse, which can only connect two cells.
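The relation between the overlap form of the energy and the explicit coupling tensors described in the caption can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names (`overlap_energy`, `coupling_tensor`, `tensor_energy`) and the two three-neuron memories are hypothetical, and the sums run over all index tuples, including repeated indices.

```python
from itertools import product

def overlap_energy(sigma, memories, n):
    """E = -sum_mu (sum_i xi_mu_i * sigma_i) ** n, written via memory overlaps."""
    return -sum(sum(x * s for x, s in zip(xi, sigma)) ** n for xi in memories)

def coupling_tensor(memories, n):
    """T[i1, ..., in] = sum_mu xi_mu_i1 * ... * xi_mu_in, stored as a dict."""
    size = len(memories[0])
    tensor = {}
    for idx in product(range(size), repeat=n):
        total = 0
        for xi in memories:
            term = 1
            for i in idx:
                term *= xi[i]
            total += term
        tensor[idx] = total
    return tensor

def tensor_energy(sigma, tensor):
    """E = -sum over index tuples of T[idx] times the product of sigma over idx."""
    energy = 0
    for idx, t in tensor.items():
        term = t
        for i in idx:
            term *= sigma[i]
        energy -= term
    return energy

memories = [[1, -1, 1], [1, 1, -1]]  # two hypothetical memories, three neurons
sigma = [1, -1, 1]                   # current state of the three neurons

T2 = coupling_tensor(memories, 2)    # 3 x 3 matrix T_ij (two-body couplings)
T3 = coupling_tensor(memories, 3)    # 3 x 3 x 3 tensor T_ijk (three-body couplings)
```

For three neurons the two-index object has 9 entries and the three-index object has 27, and in both cases the tensor form of the energy agrees with the overlap form. The number of entries grows as a power of the number of neurons with the order of the interaction, which is one way to see why higher-order couplings are harder to map onto pairwise synapses.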

