SEQUENCE METRIC LEARNING AS SYNCHRONIZATION OF RECURRENT NEURAL NETWORKS

Abstract

Sequence metric learning is becoming a widely adopted approach for applications dealing with sequential multivariate data, such as activity recognition or natural language processing, and is usually tackled with sequence alignment approaches or representation learning. In this paper, we study this subject from the point of view of dynamical system theory by drawing an analogy between synchronized trajectories produced by dynamical systems and the distance between similar sequences processed by a siamese recurrent neural network. Indeed, a siamese recurrent network comprises two identical sub-networks: two identical dynamical systems, which can theoretically achieve complete synchronization if a coupling is introduced between them. We therefore propose a new neural network model that implements this coupling with a new gate integrated into the classical Gated Recurrent Unit (GRU) architecture. This model simultaneously learns a similarity metric and the synchronization of unaligned multivariate sequences in a weakly supervised way. Our experiments show that introducing such a coupling improves the performance of the siamese GRU architecture on an activity recognition dataset.

1. INTRODUCTION

Metric learning aims at learning an essential component of numerous machine learning algorithms used for classification or clustering: a similarity. It has the benefit of being usable in weakly supervised settings where only equivalence constraints between samples are known (Xing et al., 2003), which allows for a large number of applications on various data types: from person re-identification (Yang et al., 2018), object tracking (Bertinetto et al., 2016) and gesture recognition (Berlemont et al., 2018) to sentence similarity computation (Mueller & Thyagarajan, 2016). Among these applications, less attention has been given to designing metric learning algorithms specific to sequences, in particular with neural networks, despite the simplicity of the siamese architecture (Bromley et al., 1994). One easy way to adapt existing approaches to sequential data is to learn representations through Sequence-to-Sequence models (Sutskever et al., 2014) or Transformers (Vaswani et al., 2017). However, these models are difficult to train in a weakly supervised way to provide a similarity metric, and they lose temporal dependency information within a sequence and alignment information between sequences. In contrast, Dynamic Time Warping (DTW) (Sakoe & Chiba, 1978) is a classical approach to measuring the distance between sequences that relies on aligning them. Its integration into learning algorithms has been hindered by its non-differentiability and its quadratic time complexity, which suit poorly the equivalence-constraint framework and some associated more complex losses (Oh Song et al., 2016; Sohn, 2016; Yang et al., 2018). Recent works mitigate these drawbacks, notably with virtual metric learning (Perrot & Habrard, 2015; Su & Wu, 2019) and soft versions of DTW (Cai et al., 2019; Abid & Zou, 2018). We therefore aim at designing a neural network architecture specifically adapted to sequence metric learning.
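As context for the complexity argument above, a minimal NumPy sketch of the classical DTW recursion; the nested loops make the quadratic time complexity explicit. The function name and the restriction to 1-D sequences are illustrative choices, not taken from the paper:

```python
import numpy as np

def dtw(x, y):
    """Classical DTW distance between two 1-D sequences.

    The double loop over the (n x m) cost matrix makes the
    quadratic time complexity of the algorithm explicit.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # best of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Unlike the Euclidean distance, the warping absorbs local time shifts: repeating a sample in one sequence leaves the DTW distance unchanged.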
Recurrent neural networks (RNNs) have a temporal dynamic behavior which allows them to be studied as dynamical systems. In this paper, we propose a new framework for sequence metric learning based on dynamical system synchronization theory. We propose to replace the concept of a metric in a vector space with the concept of synchronization of trajectories in a state space. Instead of computing distances on input representations, we measure how two dynamical systems, and more precisely two RNNs, respond to input pairs in terms of synchronization. The notion of coupling is crucial when trying to synchronize dynamical systems. We introduce a coupled version of the Gated Recurrent Unit (GRU) (Cho et al., 2014) to implement coupling inside a siamese architecture. Our experimental evaluation shows that this modification provides an improvement over a classical siamese GRU implementation. The paper is organized as follows: Section 2 outlines the state-of-the-art approaches in sequence metric learning, Section 3 describes our framework and our new siamese architecture, Section 4 presents our experimental results assessing the performance of our approach compared to the state of the art, and Section 5 presents our conclusions and perspectives.
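The idea that a coupling can drive two identical dynamical systems toward complete synchronization can be illustrated on a toy system. The sketch below uses two copies of the chaotic logistic map with symmetric diffusive coupling; the map and the coupling strength 0.4 are illustrative choices, not the coupled GRU gate described later in the paper:

```python
def logistic(x, r=4.0):
    # chaotic for r = 4: two uncoupled copies started apart stay apart
    return r * x * (1.0 - x)

def coupled_step(x, y, c):
    """One step of two identical maps with symmetric diffusive coupling
    of strength c; for c = 0.4 the gap |x - y| contracts at every step."""
    fx, fy = logistic(x), logistic(y)
    return (1 - c) * fx + c * fy, (1 - c) * fy + c * fx

x, y = 0.3, 0.7                      # different initial conditions
for _ in range(200):
    x, y = coupled_step(x, y, 0.4)
# x and y have reached complete synchronization: |x - y| is numerically zero
```

For this map the difference obeys |x' - y'| = |1 - 2c| · |f(x) - f(y)| ≤ 0.8 |x - y| when c = 0.4, so the two trajectories converge geometrically, which is exactly the behavior the coupling gate aims to induce between the two siamese sub-networks on similar input pairs.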

2. RELATED WORK

Recurrent neural networks and dynamical system theory. A key property of RNNs is their dynamic behavior, which enables them to learn temporal correlations in sequences. This behavior can therefore be studied using dynamical system theory, an important result being that an RNN can approximate any finite-time trajectory of a dynamical system (Funahashi & Nakamura, 1993). Other early works analyzed RNN convergence stability (Hirsch, 1989) and helped to understand the problem of long-term dependencies (Bengio et al., 1994). Laurent & von Brecht (2016) studied the dynamics of Long Short-Term Memory (LSTM) networks and GRUs and observed that they are chaotic in the absence of input data; they consequently designed a chaos-free RNN architecture with a more predictable behavior. More recently, Chang et al. (2019) studied RNN trainability and established a connection with the stability of discretized ordinary differential equations. They identified a criterion guaranteeing that the system preserves long-term dependencies and proposed a new RNN variant based on these observations. Both papers demonstrate that dynamical system theory is fertile ground for studying and conceiving new RNN models. Finally, we mention works on defining metrics to compare non-linear dynamical systems (Martin, 2000; Ishikawa et al., 2018), although our objective is not exactly the same: we propose to use dynamical system synchronization theory to improve metric learning on any type of sequential data, whereas these methods were conceived to work more specifically with structural data.

Sequence metric learning. DTW is a classical approach for measuring distances between sequences (Sakoe & Chiba, 1978). Numerous improvements have been brought to the original formulation, notably to improve k-nearest-neighbor performance (Xi et al., 2006).
Abid & Zou (2018) proposed to learn DTW parameters that reproduce the Euclidean distances between sequence representations learned with a Sequence-to-Sequence model (Sutskever et al., 2014). In contrast, Su & Hua (2017) proposed an alternative to DTW, the Order-Preserving Wasserstein (OPW) distance, by viewing metric learning between sequences as an optimal transport problem regularized to preserve the temporal relationships between samples, and solved it with the matrix scaling algorithm. Later, Su & Wu (2019) reformulated the DTW and OPW distances as parameterized meta-metrics over a single ground metric and proposed an optimization process to learn the metric and the latent alignment with virtual metric learning (Perrot & Habrard, 2015), which reduces the number of constraints. Not only does this approach speed up training, but it also outperforms several other metric learning approaches, notably point-based approaches generalized to sequences. In comparison, we propose a pure RNN approach similar to that of Mueller & Thyagarajan (2016), who presented a siamese neural network that learns sentence similarities as an l1-norm. In their method, an LSTM network combines the embeddings of the words of a sentence to learn a distance between sentence representations. Finally, Varior et al. (2016) proposed a siamese convolutional architecture for person re-identification from video data, with gates linking parallel layers to accentuate common patterns between the two representations; this leads to representations better suited to distinguishing certain pairs of similar or dissimilar images. In this paper, we introduce an alternative to DTW: a pure neural network approach to sequence metric learning based on the siamese RNN architecture. We propose to enhance the classical siamese RNN by studying this model from a dynamical-system point of view, as has already been done for standard RNNs.
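The siamese similarity used by Mueller & Thyagarajan (2016), an exp(-l1) score between the final hidden states of two weight-sharing encoders, can be sketched as follows. A minimal vanilla-RNN encoder stands in for their LSTM, and all names and shapes are illustrative:

```python
import numpy as np

def encode(seq, W, U, b):
    """Minimal vanilla-RNN encoder; in a siamese architecture both
    branches share these same parameters (W, U, b)."""
    h = np.zeros(U.shape[0])
    for x in seq:
        h = np.tanh(W @ x + U @ h + b)
    return h

def similarity(seq_a, seq_b, params):
    # exp(-l1) similarity in (0, 1]: equals 1 for identical representations
    h_a, h_b = encode(seq_a, *params), encode(seq_b, *params)
    return float(np.exp(-np.abs(h_a - h_b).sum()))

rng = np.random.default_rng(0)
# hidden size 4, input size 2: W (4x2), U (4x4), b (4,)
params = (rng.normal(size=(4, 2)), rng.normal(size=(4, 4)), rng.normal(size=4))
seq = rng.normal(size=(5, 2))   # a length-5 sequence of 2-D observations
```

Because the two branches share parameters, identical inputs always map to identical hidden states and hence to similarity 1; training only has to push dissimilar pairs apart.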
The disadvantage of DTW-based approaches compared to ours is that it can be

