NEURAL-BASED CLASSIFICATION RULE LEARNING FOR SEQUENTIAL DATA

Abstract

Discovering interpretable patterns for the classification of sequential data is of key importance for a variety of fields, ranging from genomics to fraud detection, and more generally for interpretable decision-making. In this paper, we propose a novel, fully differentiable and interpretable method to discover both local and global patterns (i.e., capturing an absolute or a relative temporal dependency) for rule-based binary classification. It consists of a convolutional binary neural network with an interpretable neural filter and a training strategy based on dynamically enforced sparsity. We demonstrate the validity and usefulness of the approach on synthetic datasets and on an open-source peptides dataset. Key to this end-to-end differentiable method is that the expressive patterns used in the rules are learned alongside the rules themselves.

1. INTRODUCTION

Over the last decades, machine learning and in particular neural networks have made tremendous progress on classification tasks in a variety of fields such as healthcare, fraud detection or entertainment. They are able to learn from various data types, ranging from images to time series, and achieve impressive classification accuracy. However, they are difficult or impossible for a human to understand. Recently, explaining these black-box models has attracted considerable research interest in the field of Explainable AI (XAI). However, as stated by Rudin (2019), such a posteriori approaches are not the solution for high-stakes decision-making, and more effort should be placed on learning models that are interpretable in the first place. Rule-based methods are interpretable and human-readable, and have been widely adopted in different industrial fields through Business Rule Management Systems (BRMS). In practice, however, those rules are manually written by experts. One reason manually written rule models cannot easily be replaced with learned rule models is that rule-based learning models cannot learn rules that are as expressive, with higher-level concepts and a complex grammar (Kramer, 2020). Moreover, due to the lack of latent representations, rule-based learning methods underperform w.r.t. state-of-the-art neural networks (Beck & Fürnkranz, 2021). Classical classification rule learning algorithms (Cohen, 1995; Breiman et al., 1984; Dash et al., 2018; Lakkaraju et al., 2016; Su et al., 2016), as well as neural-based approaches to learn rules (Qiao et al., 2021; Kusters et al., 2022) or logical expressions (Riegel et al., 2020), do not provide the grammar required to learn classification rules on sequential data. Numerous approaches for learning classification rules on sequential data have been studied in the field of sequential pattern mining, such as Egho et al. (2015); Zhou et al. (2013); Holat et al. (2014), but with a different goal in mind: improving the performance of extracted patterns for a fixed rule grammar, as opposed to extending the rule grammar. Another line of research focuses on training binary neural networks to obtain more efficient model storage and evaluation (Geiger & Team, 2020; Helwegen et al., 2019).
This comes with fundamental optimization challenges around weight updates and gradient computation. In this paper, we bridge these three domains and introduce a binary neural network that learns classification rules on sequential data. We propose a differentiable rule-based classification model for sequential data whose conditions are composed of sequence-dependent patterns that are discovered alongside the classification task itself. More precisely, we aim at learning a rule of the following structure: if pattern then class = 1 else class = 0. In particular, we consider two types of patterns, local and global patterns, as introduced in Aggarwal (2002), which are in practice studied independently with a local and a global model. A local pattern describes a subsequence at a specific position in the sequence, while a global pattern is invariant to its location in the sequence (Fig. 2). The network, which we refer to as the Convolutional Rule Neural Network (CR2N), builds on top of a base rule model that is comparable to the rule models for tabular data presented in Qiao et al. (2021); Kusters et al. (2022). The contributions of this paper are the following: i) We propose a convolutional binary neural network that learns classification rules together with the sequence-dependent patterns in use. ii) We present a training strategy to train a binarized neural network while dynamically enforcing sparsity. iii) We show on synthetic and real-world datasets the usefulness of our architecture (importance of the rule grammar) and the validity of our training process (importance of sparsity). The code is publicly available at https://github.com/IBM/cr2n.
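To make the local/global distinction concrete, the following sketch (hypothetical helper names, not from the paper's code) checks a local pattern, which must occur at a fixed absolute position, against a global pattern, which may occur anywhere in the sequence:

```python
def matches_local(seq, pattern, position):
    # Local pattern: the subsequence must start at a fixed position.
    return seq[position:position + len(pattern)] == pattern

def matches_global(seq, pattern):
    # Global pattern: the subsequence may occur at any position.
    return any(seq[i:i + len(pattern)] == pattern
               for i in range(len(seq) - len(pattern) + 1))

seq = list("ABCDB")
print(matches_local(seq, list("BC"), 1))   # True: "BC" starts at index 1
print(matches_global(seq, list("DB")))     # True: "DB" occurs at index 3
print(matches_local(seq, list("DB"), 0))   # False: "DB" does not start at index 0
```

A global pattern can thus be seen as a disjunction of the same local pattern over all positions, which is what motivates the convolutional formulation of CR2N.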

2. BASE RULE MODEL

The base rule model we build on is composed of three consecutive layers (Fig. 1). The last two layers respectively mimic logical AND and OR operators (Qiao et al., 2021; Kusters et al., 2022). On top of these layers, we add an additional layer that is specific to categorical input data and corresponds, for each categorical variable, to an OR operator over every possible value it can take (the StackedOR layer in Fig. 1).
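A minimal numpy sketch of the two logical layers, with binary weights acting as literal-selection masks (the function names and encoding are illustrative assumptions, not the paper's implementation), evaluated on the rule of Figure 1:

```python
import numpy as np

def and_layer(x, w):
    # A neuron is 1 iff every input selected by its weight row is 1
    # (w @ (1 - x) counts selected inputs that are 0).
    return (w @ (1 - x) == 0).astype(int)

def or_layer(h, w):
    # A neuron is 1 iff at least one selected input is 1.
    return (w @ h > 0).astype(int)

# Rule from Figure 1: if (B1 and D2) or (D2 and C3) then 1 else 0.
# Inputs are the truth values of the literals [B1, D2, C3].
x = np.array([1, 1, 0])        # B1 true, D2 true, C3 false
w_and = np.array([[1, 1, 0],   # conjunction (B1 and D2)
                  [0, 1, 1]])  # conjunction (D2 and C3)
w_or = np.array([[1, 1]])      # disjunction over both conjunctions

h = and_layer(x, w_and)        # -> [1, 0]: only (B1 and D2) fires
y = or_layer(h, w_or)          # -> [1]: the rule predicts class 1
print(y[0])                    # 1
```

During training, the binary weights are what is learned; a weight of 0 masks a literal out of the corresponding conjunction or disjunction, which is why sparsity directly translates into shorter, more readable rules.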




Figure 1: Example of a trained base rule model architecture for the rule if (B1 and D2) or (D2 and C3) then 1 else 0 on 3 categorical variables x_c1, x_c2 and x_c3 (x_ck ∈ {A_k, B_k, C_k, D_k}). For simplicity, the truth value of x_c1 = B1 is written B1, for example. Plain (dotted) lines represent activated (masked) weights. An example evaluation of the model is represented with the filled neurons (neuron = 1) for the binary input x_c1 = B1, x_c2 = D2 and x_c3 = A3.
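The StackedOR layer of the caption's example can be sketched as follows; the weight masks are chosen by hand to match the rule's literals (the learned masks in Figure 1), not produced by training:

```python
import numpy as np

# One-hot encoding of 3 categorical variables, each over {A, B, C, D}.
values = "ABCD"
def one_hot(v):
    return [int(c == v) for c in values]

# Input from the caption: x_c1 = B1, x_c2 = D2, x_c3 = A3.
x = np.array(one_hot("B") + one_hot("D") + one_hot("A"))  # 12 binary inputs

# StackedOR: for each variable, an OR restricted to that variable's
# one-hot block. Each mask here keeps exactly the literal used by the
# rule (B for x_c1, D for x_c2, C for x_c3).
masks = np.zeros((3, 12), dtype=int)
masks[0, values.index("B")] = 1       # literal B1
masks[1, 4 + values.index("D")] = 1   # literal D2
masks[2, 8 + values.index("C")] = 1   # literal C3

literals = (masks @ x > 0).astype(int)
print(literals.tolist())              # [1, 1, 0]: B1 true, D2 true, C3 false
```

These literal truth values are exactly the inputs consumed by the AND layer, so (B1 and D2) fires while (D2 and C3) does not, and the OR layer outputs class 1.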

