RECURRENT REAL-VALUED NEURAL AUTOREGRESSIVE DENSITY ESTIMATOR FOR ONLINE DENSITY ESTIMATION AND CLASSIFICATION OF STREAMING DATA

Abstract

In contrast with traditional offline learning, where complete data accessibility is assumed, many modern applications involve processing data in a streaming fashion. This online learning setting raises various challenges, including concept drift and hardware memory constraints. In this paper, we propose the Recurrent Real-valued Neural Autoregressive Density Estimator (RRNADE), a flexible density-based model for online classification and density estimation. RRNADE combines a neural Gaussian mixture density module with a recurrent module. This combination allows RRNADE to exploit possible sequential correlations in the streaming task, which are often ignored in the classical streaming setting where each input is assumed to be independent of the previous ones. We showcase the ability of RRNADE to adapt to concept drifts on synthetic density estimation tasks. We also apply RRNADE to online classification tasks on both real-world and synthetic datasets and compare it with multiple density-based as well as non-density-based online classification methods. In almost all of these tasks, RRNADE outperforms the other methods. Lastly, we conduct an ablation study demonstrating the complementary benefits of the density and the recurrent modules.

1. INTRODUCTION

Many tasks in classic supervised machine learning, such as regression and classification, involve processing batched data in an offline fashion: the data, often coming as input-output pairs, is stored first and then used to learn a predictive model for future unseen data. However, many modern applications favor a form in which the model updates and predicts while receiving new data entries. This form is often referred to as learning from data streams. The problem of learning from data streams is closely related to the problem of continual or incremental learning (Losing et al., 2018; Zenke et al., 2017; Lopez-Paz & Ranzato, 2017), which has recently received increasing interest in the machine learning community.
There are three major issues when learning from data streams: memory constraints, concept drifts, and temporal correlations. The sheer amount of data that many modern applications process daily makes it infeasible to store all the data and update the model offline (Naeem et al., 2022). In addition, certain data sources do not allow data to be held indefinitely because of privacy regulations (Forti, 2021). Therefore, when learning from data streams, it is often assumed that the model only has access to the recent history. Concept drifts and temporal correlations both stem from violations of the i.i.d. assumption that is common in the offline setting, i.e., that each data entry is drawn independently from the same distribution. Under the streaming setting, the independence assumption can be violated, which gives rise to temporal correlations in the data, while a violation of the identical-distribution assumption leads to concept drift. These issues often invalidate the model learned from historical data, resulting in further deterioration of its performance.
Density estimation is one of the core tasks in unsupervised learning, with applications such as classification and clustering. Under the offline setting, the Real-valued Neural Autoregressive Density Estimator (RNADE) leverages a Gaussian mixture model parameterized by a neural network to estimate the density function of real-valued vectors. A natural question is whether RNADE can be extended to an online form, in which the model is updated as new data arrives and only a limited amount of history is stored in memory. In this paper, we show that the answer is positive. Concretely, our contributions are as follows:
1. We propose the Recurrent Real-valued Neural Autoregressive Density Estimator (RRNADE), a versatile density estimator for online learning of data streams.
2. We propose an RRNADE-based Bayes classifier for online classification of streaming data.
Our model uses a recurrent module to maintain a set of sufficient statistics for the future and to capture the potential temporal properties of the data. It also uses a Gaussian mixture model parameterized by neural networks as the density module to compute the conditional density function of the current input given the previous data. We theoretically show that RRNADE is strictly more expressive than Gaussian hidden Markov models (Bilmes et al., 1998). We present empirical results demonstrating the ability of RRNADE to adapt to concept drifts and to approximate density functions with sequential relations. Moreover, we conduct extensive experiments on various online classification benchmarks and show that RRNADE outperforms all the compared methods on almost every dataset. Finally, an ablation study demonstrates the importance of both the recurrent module and the density module.
Related Works. For online density estimation on streaming data, many existing works focus on adapting the kernel density estimation (KDE) method (Procopiuc & Procopiuc, 2005; Heinz & Seeger, 2008; Kristan et al., 2011; Boedihardjo et al., 2008). These estimators typically maintain and update (through merging) a specified number of kernels as new instances arrive, each in a different fashion. In addition to these methods, KDE-Track (Qahtan et al., 2016) leverages an adaptive resampling strategy to deal with concept drifts and improve the estimation accuracy of KDE-based methods. Another recent method, the adaptive local online kernel density estimator (ALoKDE) (Chen et al., 2021), leverages a statistical test for concept drift detection in order to adapt quickly to concept drift. All of these methods can be turned into classification methods via a Bayes classifier.
For online classification on streaming data, a number of methods are direct adaptations of an offline method to the online case. Examples include the online SVM (OSVM) (Li & Yu, 2015) and the adaptive random forest (ARF) (Gomes et al., 2017a); the methods of Liang et al. (2006), Cauwenberghs & Poggio (2000), and Lu et al. (2014) also belong to this class. Other methods leverage an adaptive window over the past (Bifet & Gavalda, 2007; Bifet et al., 2013) or take advantage of short-term and long-term memories (Losing et al., 2016), while Gomes et al. (2017b) and Polikar et al. (2001) use ensemble methods to further improve the results. Another large class of online classification methods is the prototype-based classifiers, such as incremental learning vector quantization (ILVQ) (Losing et al., 2015), generalized LVQ (Sato & Yamada, 1995), robust soft LVQ (Heusinger et al., 2019), and the sparse prototype online kernel density estimator (SPOK) (Coelho & Barreto, 2022).

2. BACKGROUND

In this section, we present background knowledge on the real-valued neural autoregressive density estimator (RNADE) and recurrent models. We also introduce the formulation of the online density estimation and classification tasks.

Real-valued neural autoregressive density estimator (RNADE). The real-valued neural autoregressive density estimator (RNADE) (Uria et al., 2013) is a generalization of the original neural autoregressive density estimator (NADE) (Uria et al., 2016) to continuous variables. The core idea of RNADE is to estimate the joint density using the chain rule and to approximate each conditional density via neural networks, i.e., $p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_{<i})$ with $p(x_i \mid x_{<i}) = p_M(x_i \mid \theta_i)$, where $x_{<i}$ denotes all attributes preceding $x_i \in \mathbb{R}$ in a fixed ordering* and $p_M$ is a mixture of $m$ Gaussians with parameters $\theta_i = \{\beta_i \in \mathbb{R}^m, \mu_i \in \mathbb{R}^m, \sigma_i \in \mathbb{R}^m\}$. Moreover, we have:

* Later we will also use the notation $x_{[a,b]}$, where $a < b \in \mathbb{N}$, to denote $x_{a+1}, \dots, x_b \in \mathbb{R}^d$.
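
As a rough illustration of the chain-rule factorization used by RNADE, the sketch below evaluates a joint log-density as a sum of one-dimensional Gaussian-mixture log-densities, each conditioned on the preceding attributes. It assumes that beta holds the mixing weights, mu the component means, and sigma the component standard deviations. The function `toy_conditioner` is a hypothetical placeholder for the neural network that maps the preceding attributes to the mixture parameters; it is not the RNADE parameterization.

```python
import math


def log_mixture_density(x_i, beta, mu, sigma):
    """log p_M(x_i | theta_i) for a 1-D mixture of m Gaussians.

    Assumes beta are mixing weights (positive, summing to 1),
    mu are means, and sigma are standard deviations.
    """
    log_terms = [
        math.log(b) - math.log(s) - 0.5 * math.log(2 * math.pi)
        - 0.5 * ((x_i - m) / s) ** 2
        for b, m, s in zip(beta, mu, sigma)
    ]
    top = max(log_terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def toy_conditioner(prefix, m=2):
    """Hypothetical stand-in for the network producing theta_i from x_<i."""
    s = sum(prefix)
    logits = [0.1 * k * s for k in range(m)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    beta = [e / sum(exps) for e in exps]            # softmax: valid mixing weights
    mu = [math.tanh(s) + k for k in range(m)]       # means depend on the prefix
    sigma = [math.exp(-0.1 * k) for k in range(m)]  # exp keeps std devs positive
    return beta, mu, sigma


def log_joint_density(x, conditioner=toy_conditioner):
    """Chain rule: log p(x_1, ..., x_n) = sum_i log p_M(x_i | theta_i(x_<i))."""
    return sum(
        log_mixture_density(x[i], *conditioner(x[:i]))
        for i in range(len(x))
    )
```

Working in log space with a log-sum-exp keeps the mixture evaluation stable when a component assigns a very small probability to the input.
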

