SYMMETRICAL SYNCMAP FOR IMBALANCED GENERAL CHUNKING PROBLEMS

Abstract

Recently, SyncMap (2021) pioneered an approach that learns complex structures from sequences and adapts to changes in the underlying structure. This approach, inspired by the behavior of neuron groups, is achieved with self-organizing dynamical equations and without any loss function. Here we propose Symmetrical SyncMap, which goes beyond the original work by showing how to create dynamical equations and attractor-repeller points that are stable over the long run, even when dealing with imbalanced continual general chunking problems (CGCPs). The main idea is to apply equal numbers of updates from the positive and negative feedback loops via symmetrical activation. We then introduce the concept of a memory window to allow for more positive updates. Our algorithm surpasses or ties other unsupervised state-of-the-art baselines on all 12 imbalanced CGCPs of varying difficulty, including dynamically changing ones. To verify its performance in real-world scenarios, we conduct experiments on several well-studied structure learning problems. The proposed method substantially surpasses the other methods in all scenarios, suggesting that symmetrical activation plays a critical role in uncovering topological structures, and even hierarchies, encoded in temporal data.

1. INTRODUCTION

Human brains have been shown to possess unsupervised abilities to detect repetitive patterns in sequences of text, sound and images (Orbán et al., 2008; Bulf et al., 2011; Strauss et al., 2015). In neuroscience, part of this behavior is known as chunking. Chunking has been verified in many experiments to play an important role in a diverse range of cognitive functions (Schapiro et al., 2013; Yokoi & Diedrichsen, 2019; Asabuki & Fukai, 2020). Related to chunking problems, many sequence processing algorithms in machine learning have been proposed for time-series clustering (Aghabozorgi et al., 2015) based on similarity measurements (Figure 1(a)). Chunking sequences of state variables, however, is still underexplored (see Figure 1(b)(c)). Recently, Vargas & Asabuki (2021) proposed SyncMap, the first chunking algorithm based solely on self-organization. The authors also extended chunking problems into the Continual General Chunking Problem (CGCP), which includes problems with diverse structures that can change dynamically throughout an experiment. For the first time, SyncMap was shown to be able not only to uncover complex structures from sequential data, but also to adapt to continuously changing structures. It achieves this with self-organizing dynamics that map temporal input correlations to spatial correlations, where the dynamics are continually updated by negative/positive feedback loops. In this work, however, we identify problems in the original dynamics that lead to long-term instability, and we further show that performance on imbalanced CGCPs is poor given the asymmetric number of updates, i.e., the number of negative updates is much larger than that of the positive ones. Beyond identifying these problems, we propose Symmetrical SyncMap, which solves both of them using a symmetric selection of nodes and a generalized memory window.
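The attractor-repeller dynamics described above can be sketched roughly as follows. This is a simplified, illustrative Python version, not the authors' exact equations; the function name, the exact update directions, and the normalization choice are our own assumptions:

```python
import numpy as np

def syncmap_like_step(w, activated, lr=0.01, eps=1e-8):
    """One simplified attractor-repeller update over node coordinates.

    w         : (n, d) array, one point in map space per state variable
    activated : (n,) boolean mask of currently co-activated variables
    """
    pos = np.where(activated)[0]   # positive set: co-activated nodes
    neg = np.where(~activated)[0]  # negative set: all remaining nodes
    if len(pos) < 2 or len(neg) < 2:
        return w                   # too few nodes on one side: skip update
    cp = w[pos].mean(axis=0)       # centroid of the positive set
    # Positive feedback: co-activated nodes attract each other (move toward cp).
    d_pos = cp - w[pos]
    w[pos] += lr * d_pos / (np.linalg.norm(d_pos, axis=1, keepdims=True) + eps)
    # Negative feedback: non-activated nodes are repelled from cp.
    d_neg = w[neg] - cp
    w[neg] += lr * d_neg / (np.linalg.norm(d_neg, axis=1, keepdims=True) + eps)
    # Keep the map bounded so the repeller term cannot diverge.
    w /= max(np.linalg.norm(w, axis=1).max(), 1.0)
    return w
```

After many such steps, nodes belonging to the same chunk end up close together in map space, so a standard clustering step over the final coordinates can read the chunks off the map.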
Symmetrical SyncMap solves the instability of the dynamics efficiently and goes beyond it to address imbalanced general chunking problems. As opposed to the original SyncMap, which suffers from uneven updates from the positive/negative feedback loops, we propose symmetrical activation and further introduce the concept of a memory window, so that the system receives more updates from the positive feedback loop while concurrently reducing the number of negative updates. In fact, the symmetrical number of updates not only compensates when imbalanced chunks are presented, but also makes the algorithm stable over the long run and lets it reach equilibrium quickly in changing environments. By showing that equilibrium and self-organization can emerge from dynamical equations alone, without optimization or loss functions, the main message of this paper is that these substantial improvements, beyond the self-organization inspiration, make the new learning paradigm highly adaptive and precise. Moreover, the simplicity of the modifications, supported by their effectiveness in real-world structure learning scenarios, solves the problem at its foundation while keeping the final method concise and improving it in both accuracy and stability.
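The symmetrical activation and memory window ideas can be illustrated as follows. This is a sketch under our own naming (`symmetric_sets`, `activation_history`); the exact selection rule in the method may differ:

```python
import numpy as np

def symmetric_sets(activation_history, n_nodes, memory_window, rng):
    """Pick equal-sized positive/negative node sets (illustrative sketch).

    activation_history : list of index lists, newest last; each entry holds
                         the nodes activated at that time step
    memory_window      : how many recent steps still count as "activated"
    """
    # Memory window: any node seen within the last `memory_window` steps
    # joins the positive set, yielding more positive updates than
    # instantaneous activation alone would.
    positive = sorted({i for step in activation_history[-memory_window:]
                       for i in step})
    remaining = [i for i in range(n_nodes) if i not in set(positive)]
    # Symmetrical activation: subsample the remaining nodes so the negative
    # set has (at most) the same size as the positive set, balancing the
    # number of updates from the two feedback loops.
    k = min(len(positive), len(remaining))
    negative = sorted(rng.choice(remaining, size=k, replace=False).tolist())
    return positive, negative
```

With equally sized sets, each dynamics step applies as many attractive updates as repulsive ones, which is the balance that imbalanced chunk sizes would otherwise destroy.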

2. RELATED WORKS

Chunking. Natural neural systems are well known for their unsupervised adaptivity, since they can self-organize through many mechanisms, for several purposes, and on many timescales (Lukoševicius, 2012). One of these mechanisms is chunking, which can be described as a biological process by which the brain attains compact representations of sequences (Estes et al., 2007; Ramkumar et al., 2016). Specifically, long and complex sequences are first segmented into short and simple ones, while frequently repeated segments are concatenated into single units (Asabuki & Fukai, 2020). This can be seen as a complexity reduction for temporal information processing and its associated cost (Ramkumar et al., 2016). Although our focus is on the neuroscience and machine learning perspectives, earlier algorithms proposed for solving chunking problems come from linguistics and include PARSER (Perruchet & Vinter, 1998). It performs well in detecting simple chunks, but fails when the state transition probabilities are uniform (Schapiro et al., 2013). A neuro-inspired sequence learning model, Minimization of Regularized Information Loss (MRIL), was proposed based on a competitive network of two-compartment neuron models in which each neuron learns to predict its own output in a self-supervised manner (Asabuki & Fukai, 2020). Despite its interesting paradigm, MRIL has been shown to be unstable even on problems in which it performs reasonably well. Very recently, a self-organizing learning paradigm called SyncMap was proposed, which surpassed MRIL in all scenarios (Vargas & Asabuki, 2021).

Time-series Clustering. Time series data is defined as a sequence of continuous, real-valued elements, usually with high dimensionality and large data size (Aghabozorgi et al., 2015). As a subroutine in unsupervised sequence processing, time-series clustering aims to uncover patterns, usually in very large sequential datasets that cannot be handled manually.
Examples can be found in articles applying competition-based self-organizing maps (SOMs) (Kohonen, 1990) and their variations (Vannucci & Colla, 2018; Fortuin et al., 2018), which are well suited for clustering time series but not capable of chunking them. In other words, these SOMs were not designed to find the underlying structures of sequences and the correlations between variables; their objectives are therefore different. A comparison of time-series clustering and sequence chunking is shown in Figure 1.

Word Embeddings. In the field of natural language processing, word embedding algorithms generally transform texts and paragraphs into vector representations (Khattak et al., 2019; Bojanowski et al., 2017; Peters et al., 2018). FastText enriches word vectors with subword information (Bojanowski et al., 2017), whereas ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) aim to represent words by contextualized embeddings. Chunking problems presented here are related to some



Figure 1: Explanation of the difference between time-series clustering and sequence chunking. (a) Process of time-series clustering. Homogeneous time series (S1-S3, S4-S5) are grouped together based on a certain similarity measure. (b) Example of a chunking problem. A fixed chunk (state variables A-B-C [blue]) and a probabilistic chunk (D-E-F [orange]) are repeated in the input sequence with equal probability. (c) Input-output map of the problem structure in (b) over time. State transitions follow a first-order Markov chain. (d) Examples of the structures of a fixed chunk and a probabilistic chunk.

