QUICKEST CHANGE DETECTION FOR MULTI-TASK PROBLEMS UNDER UNKNOWN PARAMETERS

Abstract

We consider the quickest change detection problem where the parameters of both the pre- and post-change distributions are unknown, which prevents the use of classical simple hypothesis testing. Without additional assumptions, optimal solutions are intractable as they rely on minimax and robust variants of the objective. As a consequence, change points might be detected too late for practical applications (in economics, health care or maintenance, for instance). Other approaches solve a relaxed version of the problem by restricting to particular probability distributions or by exploiting domain knowledge. We tackle this problem in the more complex Markovian case and provide a new scalable approximate algorithm with near-optimal performance that runs in O(1) time per observation.

1. INTRODUCTION

Quickest Change Detection (QCD) problems arise naturally in settings where a latent state controls observable signals (Basseville et al., 1993). In biology, QCD is applied to genomic sequencing (Caron et al., 2012) and to reliable healthcare monitoring (Salem et al., 2014). In industry, it finds applications in faulty machinery detection (Lu et al., 2017; Martí et al., 2015) and in leak surveillance (Wang et al., 2014). It also has environmental applications such as the detection of traffic-related pollutants (Carslaw et al., 2006). Any autonomous agent designed to interact with the world and achieve multiple goals must be able to detect relevant changes in the signals it senses in order to adapt its behaviour accordingly. This is particularly true for reinforcement-learning-based agents in multi-task settings, as their policy is conditioned on some task parameter (Gupta et al., 2018; Teh et al., 2017). In order to be truly autonomous, the agent needs to identify the task at hand from the requirements of its environment. For example, a robot built to assist cooks in a kitchen should be able to recognise the task being executed (chopping vegetables, cutting meat, etc.) without external help, in order to assist them efficiently. Otherwise, the agent requires a higher intelligence (one of the cooks, for instance) to control it by stating the task to be executed. In the general case, the current task is unknown and has to be identified sequentially from external sensory signals. The agent must track changes as quickly as possible to adapt to its environment. However, current solutions to the QCD problem with unknown task parameters either do not scale or impose restrictive conditions on the setting (i.i.d. observations, exponential-family distributions, partial knowledge of the parameters, etc.). In this paper, we construct a scalable algorithm whose performance is comparable to that of optimal solutions.
For this purpose, we use the change detection delay under known parameters as a lower bound on the delay in the unknown case. This improves our estimates of the parameters and thus our change point detection. We consider the case where the data is generated by Markovian processes, as in reinforcement learning. We assess our algorithm's performance on synthetic data generated by distributions parameterised with neural networks, in order to match the complexity of real-life applications. We also evaluate our algorithm on standard reinforcement learning environments.

2. QUICKEST CHANGE DETECTION PROBLEMS

Formally, consider a sequence of random observations (X_t) where each X_t belongs to some observation space X (say, a Euclidean space for simplicity) and is drawn from f_{θ_t}(· | X_{t−1}), where the parameter θ_t belongs to some task parameter space Θ and {f_θ, θ ∈ Θ} is a parametric probability distribution family (non-trivial, in the sense that all the f_θ are distinct). The main idea is that, at almost all stages, θ_{t+1} = θ_t, but there are some "change points" where these two parameters differ. Let us denote by t_k the different change points and, with a slight abuse of notation, by θ_k the different values of the parameters. Formally, the data generating process is therefore:

X_t ∼ Σ_{k=0}^{K} f_{θ_k}(· | X_{t−1}) 1_{t_k ≤ t < t_{k+1}}.

The overarching objective is to identify as quickly as possible the change points t_k and the associated parameters θ_k, based on the observations (X_t). Typical procedures tackle the simple change point detection problem iteratively, and for this reason we will focus mainly on the simpler setting of a single change point, where K = 2, t_0 = 0, t_1 = λ and t_2 = ∞, and λ is unknown and must be estimated. For a formal description of the model and the different metrics, we will also assume that the parameters (θ_0, θ_1) are drawn from some distribution F over Θ. As a consequence, the data generating process we consider boils down to the following system:

θ_0, θ_1 ∼ F  and  X_{t+1} ∼ f_{θ_0}(· | X_t) if t ≤ λ,  X_{t+1} ∼ f_{θ_1}(· | X_t) if t > λ.  (1)
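As an illustration, system (1) can be simulated with a simple Markovian family. The sketch below uses a Gaussian AR(1) family as a stand-in for {f_θ} and a uniform prior as a stand-in for F; both choices are illustrative assumptions, not the distributions used in our experiments.

```python
import numpy as np

def simulate_single_change(T=200, lam=100, seed=0):
    """Simulate system (1): Gaussian AR(1) observations whose parameter
    switches from theta0 to theta1 at the change point lam.
    The AR(1) family and the uniform prior F are illustrative choices."""
    rng = np.random.default_rng(seed)
    # F: prior over task parameters theta = (ar_coefficient, mean_shift)
    theta0 = rng.uniform([-0.5, -1.0], [0.5, 1.0])
    theta1 = rng.uniform([-0.5, -1.0], [0.5, 1.0])
    x = np.zeros(T + 1)
    for t in range(T):
        a, b = theta0 if t <= lam else theta1
        # X_{t+1} ~ f_theta(. | X_t) = N(a * X_t + b, 1)
        x[t + 1] = a * x[t] + b + rng.standard_normal()
    return x, theta0, theta1

xs, th0, th1 = simulate_single_change()
print(xs.shape)  # (201,)
```

A detection algorithm then only sees the trajectory `xs` and must recover both λ and the two parameter vectors.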

2.1. CRITERIA DEFINITIONS

As mentioned, the objective is to detect change points as quickly as possible while controlling errors. There exist different metrics to evaluate algorithm performance; they essentially all minimise some delay measurement while keeping the rate of type I errors (false positive change detections) below a certain level. We describe below the main existing definitions of these criteria (type I error and delay). In order to evaluate a probability of error and an expected delay, we obviously first need to define the relevant probability measures. Traditionally, there are two antagonistic ways to construct them: the MIN-MAX and the BAYESIAN settings (Veeravalli & Banerjee, 2014). First, we denote by P_n (resp. E_n) the data probability distribution (resp. the expectation) conditioned on the change point happening at λ = n. This event happens with some probability µ(λ = n) (with the convention that bold characters designate random variables), and we denote by P_µ (resp. E_µ) the data probability distribution (resp. the expectation) integrated over µ, i.e., for any event Ω, it holds that P_µ(Ω) = Σ_n µ(λ = n) P_n(Ω). In the following, we describe the major existing formulations of the QCD problem, where the goal is to identify an optimal stopping time τ.

BAYESIAN formulation: In this formulation, the error is the Probability of False Alarm (PFA) with respect to P_µ, and the delay is the Average Detection Delay (ADD) with respect to E_µ:

PFA(τ) = P_µ(τ < λ)  and  ADD(τ) = E_µ[(τ − λ) | τ > λ].

In this setting, the goal is to minimise the ADD while keeping the PFA below a certain level α, as in Shiryaev's formulation (Shiryaev, 1963). Formally, this rewrites as:

(SHIRYAEV)  ν_α = arg min_{τ ∈ ∆_α} ADD(τ),  ∆_α = {τ : PFA(τ) < α}.

MIN-MAX formulation: The MIN-MAX formulation disregards any prior distribution over the change point. As a consequence, the error is measured as the False Alarm Rate (FAR) with respect to the worst-case scenario where no change occurs (P_∞).
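For intuition on the Bayesian formulation, when both parameters are known and observations are i.i.d., Shiryaev's problem is solved by a posterior recursion: under a geometric prior on λ with rate ρ, track the posterior probability that the change has already occurred and stop once it exceeds 1 − α. The sketch below is a minimal version for a Gaussian family; the means, ρ, and α values are illustrative assumptions.

```python
import math

def shiryaev_stop(xs, rho=0.01, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.05):
    """Classical Shiryaev rule for KNOWN parameters, i.i.d. Gaussian case:
    update the posterior probability p that the change has occurred and
    stop when p >= 1 - alpha (which controls the PFA at level alpha)."""
    def pdf(x, mu):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

    p = 0.0
    for n, x in enumerate(xs, start=1):
        prior = p + (1.0 - p) * rho            # prob. change happened by step n
        num = prior * pdf(x, mu1)              # weight of the post-change density
        den = num + (1.0 - prior) * pdf(x, mu0)
        p = num / den                          # Bayes update of the posterior
        if p >= 1.0 - alpha:
            return n                           # declare the change at time n
    return None                                # no change declared
```

For example, on a sequence whose mean jumps from 0.1 to 1.2 at step 50, the rule fires a few observations after the change, while on a pure pre-change sequence the posterior stays near zero and no alarm is raised.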
As for the delay, two possibilities are studied: the Worst-case Average Detection Delay (WADD) and the Conditional Average Detection Delay (CADD). The WADD evaluates the delay with respect to the worst scenario in terms of both the change point and the observations; the CADD is a less pessimistic evaluation as it only considers the worst scenario in terms of the change point. Mathematically, they are defined as:

FAR(τ) = 1 / E_∞[τ],
WADD(τ) = sup_n esssup_{X^n} E_n[(τ − n)^+ | X^n],
CADD(τ) = sup_n E_n[(τ − n) | τ ≥ n],

where X^n designates all observations up to the n-th one. In this setting, the goal is to minimise either the WADD (Lorden's formulation (Lorden, 1971)) or the CADD (Pollak's formulation (Pollak, 1985)) while keeping the FAR below a certain level α. Formally, these problems are written as follows:



(LORDEN)  ν_α = arg min_{τ ∈ ∆_α} WADD(τ),  ∆_α = {τ : FAR(τ) < α}
and
(POLLAK)  ν_α = arg min_{τ ∈ ∆_α} CADD(τ),  ∆_α = {τ : FAR(τ) < α}.  (3)
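In the i.i.d. case with known parameters, Lorden's formulation is solved exactly by Page's CUSUM rule, which accumulates the log-likelihood ratio between the post- and pre-change densities and resets at zero. The sketch below instantiates it for a Gaussian family; the means and the threshold value are illustrative assumptions (the threshold is what trades FAR against WADD).

```python
import math

def cusum_stop(xs, mu0=0.0, mu1=1.0, sigma=1.0, threshold=5.0):
    """Page's CUSUM rule for KNOWN parameters, i.i.d. Gaussian case:
    W_n = max(0, W_{n-1} + log f1(x_n)/f0(x_n)); stop when W_n >= threshold."""
    w = 0.0
    for n, x in enumerate(xs, start=1):
        # Gaussian log-likelihood ratio log f1(x)/f0(x) in closed form
        llr = (mu1 - mu0) / sigma**2 * (x - 0.5 * (mu0 + mu1))
        w = max(0.0, w + llr)   # reset at zero: worst-case restart
        if w >= threshold:
            return n            # declare the change at time n
    return None                 # no change declared
```

On a sequence whose mean jumps from 0 to 1.5 at step 50, each post-change observation contributes a log-likelihood ratio of about 1, so with a threshold of 5 the statistic crosses roughly five steps after the change; before the change the negative increments keep the statistic pinned at zero.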

