REINFORCEMENT LOGIC RULE LEARNING FOR TEMPORAL POINT PROCESSES

Abstract

We aim to learn a set of temporal logic rules to explain the occurrence of temporal events. Leveraging the temporal point process modeling and learning framework, the rule content and rule weights are jointly learned by maximizing the likelihood of the observed noisy event sequences. The proposed algorithm alternates between a master problem, where the rule weights are updated, and a subproblem, where a new rule is searched for and included. The master problem is convex and relatively easy to solve, whereas the subproblem requires searching a huge combinatorial space of rule predicates and temporal relations. To tackle this challenge, we propose a neural search policy that learns to generate new rule content as a sequence of actions. The policy parameters are trained end-to-end in the reinforcement learning framework, where reward signals can be efficiently queried by evaluating the subproblem objective. The trained policy can be used to generate new rules; moreover, well-trained policies can be directly transferred to other tasks to speed up the rule search procedure in the new task. We evaluate our method on both synthetic and real-world datasets, obtaining promising results.

1. INTRODUCTION

Understanding the generative process of events with irregular timestamps has long been an interesting problem. The temporal point process (TPP) is an elegant probabilistic model for such irregular events in continuous time. Instead of discretizing the time horizon and converting the event data into time series of event counts, TPP models directly treat the inter-event times as random variables and can be used to predict the time-to-event as well as future event types. Recent advances in neural temporal point process models have exhibited superior ability in event prediction (Du et al., 2016; Mei & Eisner, 2017). However, the lack of interpretability of these black-box models hinders their application in high-stakes systems like healthcare. In healthcare, it is desirable to summarize medical knowledge and clinical experience about disease phenotypes and therapies into a collection of logic rules. The discovered rules can contribute to the sharing of clinical experience and aid the improvement of treatment strategies. They can also provide explanations for the occurrence of events. For example, the following clinical report "A 50-year-old patient, with a chronic lung disease since 5 years ago, took the booster vaccine shot on March 1st. The patient got exposed to the COVID-19 virus around May 12th, and afterward within a week began to have a mild cough and nasal congestion. The patient received treatment as soon as the symptoms appeared. After intravenous infusions at a healthcare facility for around 3 consecutive days, the patient recovered..." contains many clinical events with recorded timestamps. It is appealing to distill compact and human-readable temporal logic rules from such noisy event data. In this paper, we propose an efficient reinforcement temporal logic rule learning algorithm to automatically learn these rules from event sequences. See Fig.
1 for an illustration of the types of temporal logic rules we aim to discover, where the rules are in disjunctive normal form (i.e., OR-of-ANDs) with temporal ordering constraints. Our proposed reinforcement rule learning algorithm builds upon the temporal logic point process (TLPP) models (Li et al., 2020), where the intensity functions (i.e., occurrence rates) of events are informed by temporal logic rules. TLPP is intrinsically a probabilistic model that treats temporal logic rules as soft constraints. The learned model can tolerate uncertainty and noise in the events and can be directly used for future event prediction and explanation. Given this TLPP modeling framework, our reinforcement rule learning algorithm jointly learns the rule content (i.e., model structure) and rule weights (i.e., model parameters) by maximizing the likelihood of the observed events. The learning algorithm alternates between solving a convex master problem, where the continuous rule weights are easily optimized, and solving a more challenging subproblem, where a new candidate rule with the greatest potential to improve the current likelihood is discovered via reinforcement learning. New rules are progressively discovered and included until adding further rules no longer improves the objective.

Figure 1: Example of temporal logic rules: f1 : Y ← x1 ∧ x2 ∧ x3 ∧ x4 ∧ (x1 Before x2) ∧ (x2 None x3) ∧ (x3 Before x4); f2 : Y ← x5 ∧ x2 ∧ x3 ∧ x6 ∧ (x5 None x2) ∧ (x2 Equal x3) ∧ (x3 Before x6). None means no temporal order constraint.

Specifically, we formulate the rule discovery subproblem as a reinforcement learning problem, where a neural policy is learned to efficiently navigate the combinatorial search space for a good explanatory temporal logic rule to add.
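To make the rule semantics concrete, the sketch below shows one way a grounded rule body like f1 in Figure 1 could be checked against an event history. This is a minimal illustration, not the paper's implementation: the event representation (one timestamp per predicate), the helper names, and the tolerance are all our own assumptions.

```python
def relation_holds(t1, t2, relation, tol=1e-6):
    """Check a pairwise temporal relation between two event times."""
    if relation == "Before":
        return t1 < t2 - tol
    if relation == "Equal":
        return abs(t1 - t2) <= tol
    if relation == "None":          # no temporal order constraint
        return True
    raise ValueError(f"unknown relation: {relation}")

def rule_fires(history, body, relations):
    """body: list of predicates; relations: list of (pred_a, rel, pred_b).

    history maps each predicate to its occurrence time (a simplification:
    real event sequences may contain repeated occurrences)."""
    if not all(p in history for p in body):  # every body predicate occurred
        return False
    return all(relation_holds(history[a], history[b], rel)
               for a, rel, b in relations)

# Rule f1: Y <- x1 ^ x2 ^ x3 ^ x4 with (x1 Before x2), (x2 None x3), (x3 Before x4)
f1_body = ["x1", "x2", "x3", "x4"]
f1_rel = [("x1", "Before", "x2"), ("x2", "None", "x3"), ("x3", "Before", "x4")]

history = {"x1": 0.5, "x2": 1.2, "x3": 0.9, "x4": 3.0}
print(rule_fires(history, f1_body, f1_rel))  # True
```

In TLPP such a grounded rule would not deterministically trigger Y; instead, each firing rule contributes to the intensity of Y according to its learned weight, which is what makes the rules soft constraints.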
The constructed neural policy emits a distribution over prespecified libraries of logic predicates and temporal relations, and generates the logic variables as actions, one at a time, to form the rule content. The generated rules can be of various lengths. Once a temporal logic rule is generated, a terminal reward can be queried by evaluating the current subproblem objective on the generated rule; this evaluation is computationally cheap, so insufficient reward samples are not a concern. The neural policy is gradually improved by a risk-seeking policy gradient, learning to generate rules that optimize the subproblem objective, which is rigorously formulated from the dual variables of the master problem so as to search for the rule with the greatest potential to improve the current likelihood. The proposed reinforcement logic rule learning algorithm has the following advantages: 1) We use a differentiable policy-gradient method to solve the temporal logic rule search subproblem; all policy parameters can be learned end-to-end via policy gradient, with the subproblem objective as the reward. 2) Domain knowledge or grammar constraints on the temporal logic rules can be easily incorporated by applying dynamic masks to the rule generative process at each time step. 3) The memory of how to search the rule space is encoded in the policy parameters. The well-trained neural policies for each subproblem can be directly transferred to similar rule learning tasks to speed up computation in new tasks, so we need not learn rules from scratch. Contributions. Our main contributions are as follows: i) We propose an efficient and differentiable reinforcement temporal logic rule learning algorithm, which automatically discovers temporal logic rules that predict and explain events.
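The risk-seeking idea can be sketched numerically. In a schematic version (ours, not the paper's exact formulation), only rollouts whose reward exceeds the empirical (1 − ε) quantile contribute to the gradient, with the quantile itself acting as the baseline, so the policy is pushed toward its best-case generated rules rather than its average case.

```python
import numpy as np

def risk_seeking_weights(rewards, eps=0.1):
    """Per-rollout advantage weights for a risk-seeking policy gradient.

    Rollouts at or below the (1 - eps) empirical reward quantile get
    weight zero; the rest are weighted by their excess over the quantile."""
    rewards = np.asarray(rewards, dtype=float)
    q = np.quantile(rewards, 1.0 - eps)            # empirical (1 - eps) quantile
    return np.where(rewards > q, rewards - q, 0.0)  # zero out the rest

# The surrogate loss would then be  -(1/N) * sum_i w_i * log pi(rule_i),
# so gradient steps concentrate probability mass on top-performing rules.
rewards = [0.1, 0.3, 0.2, 0.9, 0.4, 0.8, 0.05, 0.7, 0.6, 0.95]
print(risk_seeking_weights(rewards, eps=0.2))
```

With ε = 0.2, only the two highest-reward rollouts (0.9 and 0.95) receive nonzero weight here; shrinking ε makes the search greedier about the tail of the reward distribution.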
Our method adds flexibility and explainability to temporal point process models and broadens their applications in scenarios where interpretability is important. ii) All the neural policies trained while solving each subproblem can be readily transferred to new tasks. This fits the continual learning setting well: the quality of the rule search policies can be continually improved across tasks, and for a new task we can utilize the memories of preceding tasks even when we no longer have access to the old training data. We empirically evaluated the transferability of our neural policies and achieved promising results. iii) Our discovered temporal logic rules are human-readable, so their scientific validity can be judged by human experts, and they may also inspire expert thinking. In this paper, we consider a real healthcare dataset and mine temporal logic rules from clinical event data. We invited doctors to verify these rules and incorporated their feedback and modifications into our experiments.
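The transfer claim in ii) amounts to reusing trained policy parameters as the initialization for a new task's rule search. The toy sketch below illustrates the mechanism with a deliberately simplified tabular policy over a shared predicate library; the class and its methods are hypothetical stand-ins for the paper's neural policy.

```python
import numpy as np

class TabularRulePolicy:
    """Toy stand-in for a neural search policy: logits over a shared
    predicate library define the distribution over next actions."""

    def __init__(self, n_predicates):
        self.logits = np.zeros(n_predicates)

    def action_probs(self):
        # Numerically stable softmax over the predicate library.
        z = np.exp(self.logits - self.logits.max())
        return z / z.sum()

    def transfer_from(self, other):
        """Warm-start from a policy trained on a previous task,
        instead of learning the rule search from scratch."""
        self.logits = other.logits.copy()

old = TabularRulePolicy(n_predicates=6)
old.logits[2] = 3.0                      # pretend training favored predicate 2
new = TabularRulePolicy(n_predicates=6)
new.transfer_from(old)
print(new.action_probs().argmax())       # predicate 2 is preferred immediately
```

For a real neural policy the same pattern would copy network weights rather than a logit table; the point is only that the search memory lives in parameters, not in the old training data.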

2. RELATED WORK

Temporal point process (TPP) models. TPP models are characterized by their intensity functions, and the modeling framework largely boils down to designing intensity functions that add flexibility and interpretability (Mohler et al., 2011). Recent development in deep learning has

