LEARNING INTERPRETABLE NEURAL DISCRETE REPRESENTATION FOR TIME SERIES CLASSIFICATION

Abstract

Time series classification is a challenging research field with many real-life applications. Recent advances in deep learning have significantly improved the state of the art: recurrent or convolutional architectures allow the automatic extraction of complex discriminative patterns that improve performance. These approaches, however, suffer from a lack of interpretability: the patterns are mapped into a high-dimensional latent vector space, are not representable in the time domain, and are often not even localizable. In this paper, we present a novel neural convolutional architecture that aims to provide a trade-off between interpretability and effectiveness, based on learning a dictionary of discrete representations. The proposed model guarantees (1) that a small number of patterns are learned, and that they are visualizable and interpretable; (2) a shift-equivariance property of the model, associated with time-consistency of the representation; (3) a linear classifier over a limited number of patterns, leading to an explainable decision. To ensure their robustness, the discrete representations are learned in an unsupervised process, independently of the classification task. This also enables strong performance in transfer learning. We present extensive experiments on the UCR benchmark with respect to usual baselines. The interpretability of the model is illustrated empirically. The chosen trade-off naturally results in a decrease in performance compared to the state of the art; the drop is, however, limited and highly dependent on the application domain. The experiments highlight the efficiency of the model for the transfer learning task, showing the robustness of the representations.

1. INTRODUCTION

Over recent years, Deep Neural Networks (DNNs) have become efficient for Time Series Classification (TSC) tasks (Fawaz et al., 2019; Tang et al., 2021). These models are highly capable of extracting complex discriminative features at different frequencies. In the meantime, representation learning approaches have demonstrated good performance for clustering (Ma et al., 2019), few-shot classification (Franceschi et al., 2019; Malhotra et al., 2017), missing-value imputation, and forecasting (Zerveas et al., 2022). However, they lack interpretability. Indeed, mapping signals into an abstract continuous space of high dimension, in which weights have no meaning, prevents any justification of the decision produced by these architectures. In their review of representation-based learning, Bengio et al. (2013) emphasize the ability to extract Explanatory Factors and retain Temporal Coherence in order to build good representations. Current approaches do not meet these criteria. Moreover, most of the time, these opaque models are highly specialized in a single task and cannot be transferred to other problems. In this paper, we propose to discuss the fundamental properties required to build a transparent, explainable, and potentially transferable time series classification process across tasks. First, concerning temporal aspects, we assume that important information has limited support. We therefore try to preserve both the shift equivariance of the extracted patterns and the temporal consistency of the latent representation (Cohen & Welling, 2016; Bengio et al., 2013). Second, to achieve a transparent model interpretable by an expert, we impose that the extracted patterns belong to a dictionary of limited size and that their pre-images in the time domain can be computed (Araujo et al., 2019).
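The shift-equivariance property invoked above can be checked on a toy example. The sketch below is illustrative only (it is not the paper's code): a 1-D convolution shifted input yields a shifted feature map, so a detected pattern remains localizable in the time domain.

```python
import numpy as np

def conv1d(x, w):
    """Valid cross-correlation of signal x with kernel w."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

rng = np.random.default_rng(0)
x = rng.normal(size=32)   # toy univariate time series
w = rng.normal(size=5)    # toy pattern (kernel)

shift = 3
x_shifted = np.roll(x, shift)

y = conv1d(x, w)
y_shifted = conv1d(x_shifted, w)

# Away from the boundary, the response to the shifted signal equals the
# shifted response: equivariance, not mere invariance.
assert np.allclose(y_shifted[shift:], y[:len(y) - shift])
```

Equivariance (rather than invariance) is what preserves the temporal localization of the extracted patterns.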
Third, we restrict ourselves to a strongly regularized linear classifier to explain the decision at the model level for each task (in addition to an instance-wise interpretation) (James et al., 2013). Finally, we want a model that is ready for multi-tasking and transfer, i.e. a model whose representation is learned in an unsupervised way, independently from the decision module. We present a new convolutional architecture relying on neural discrete representations, inspired by Van Den Oord et al. (2017), that meets these criteria. Unsupervised representation learning is based on a discrete hierarchical autoencoder. The discretization is a simple vector quantization, which efficiently removes noise in the manner of matrix-factorization approaches (Gray, 1984). The hierarchical aspect represents the best compromise for keeping good expressiveness, with atoms modeling phenomena at different frequencies, while limiting the size of the pattern dictionary (Razavi et al., 2019). For the classification part, a logistic regression is applied to the n-grams formed by the successive detections of the dictionary patterns. The model is penalized by an elastic-net regularization (Zou & Hastie, 2005). We demonstrate the competitiveness of this processing chain on the UCR data and compare the results to the state of the art, while providing qualitative interpretations of the decisions (Dau et al., 2019). We propose an analysis of the origin of the performance losses in some classes of applications, to better identify the strengths and weaknesses of the different properties related to interpretability. Our main contributions can be summarized as follows:

• Formalization of the properties required for a more transparent signal classification architecture.

• A model derived from Van Den Oord et al. (2017), taking into account the previous constraints, for more interpretability and allowing transfer between tasks.

• A series of experiments demonstrating both the efficiency of the implemented architecture and the qualitative, interpretative interest of the previous constraints.
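The two decision-side ingredients described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation (codebook size, dimensions, and function names are arbitrary): (1) vector quantization maps each latent frame to its nearest codeword, and (2) the resulting symbol sequence is summarized by n-gram counts that a linear classifier (e.g. an elastic-net penalized logistic regression) can consume.

```python
import numpy as np

def quantize(latents, codebook):
    """Return the index of the nearest codeword for each latent frame."""
    # latents: (T, d), codebook: (K, d) -> pairwise squared distances (T, K)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)                      # (T,) discrete codes

def bigram_counts(codes, K):
    """Count each (code_t, code_{t+1}) pair -> a K*K feature vector."""
    feats = np.zeros(K * K)
    for a, b in zip(codes[:-1], codes[1:]):
        feats[a * K + b] += 1
    return feats

rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 8))                # K=4 dictionary atoms, dim 8
latents = rng.normal(size=(20, 8))                # T=20 encoded frames

codes = quantize(latents, codebook)               # discrete representation
features = bigram_counts(codes, K=4)              # input of the linear classifier
assert features.sum() == len(codes) - 1           # T-1 bigrams in total
```

Because each feature counts an explicit pair of dictionary atoms, the weights of the linear classifier directly designate which pattern successions drive the decision.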

GENERALIZATION

In the Time Series Classification (TSC) field, interpretability can be instance-wise and/or class-wise. Instance-wise interpretability allows weighting the responsible areas of the instance for the



Figure 1: Overview of the whole process: unsupervised representation learning (1+2+3) and logistic regression on the extracted features (4). Note that the training of the architecture (1+2+3) is independent of the classification task. For more details on the encoder's and decoder's structures, see Appendix A.1.

