CORESET FOR RATIONAL FUNCTIONS

Abstract

We consider the problem of fitting a rational function g : R → R to a time-series f : {1, ..., n} → R. This is done by minimizing the sum of distances (loss function) ℓ(g) := Σ_{i=1}^n |f(i) − g(i)|, possibly with additional constraints and regularization terms that may depend on g. Our main motivation is to approximate such a time-series by a recursive sequence model G_n = Σ_{i=1}^k θ_i G_{n−i}, e.g. a Fibonacci sequence, where θ ∈ R^k is the vector of model parameters and k ≥ 1 is a constant. For ε ∈ (0, 1), an ε-coreset for this problem is a data structure that approximates ℓ(g) up to a multiplicative factor of 1 ± ε, for every rational function g of constant degree. We suggest a coreset construction that runs in O(n^{1+o(1)}) time and returns such a coreset that uses O(n^{o(1)}/ε²) memory words. We provide open-source code as well as extensive experimental results, on both real and synthetic datasets, which compare our method to existing solvers from NumPy and SciPy.

1. BACKGROUND

The original motivation for this work was to suggest provable and efficient approximation algorithms for fitting input data by a stochastic model or one of its variants, such as Hidden Markov Models (HMM) Basu et al. (2001); McCallum et al. (2000); Murphy (2002).

1.1. AUTO-REGRESSION

Unfortunately, most existing results seem to be based on heuristics with few provable approximation guarantees. We thus investigate a simplified but fundamental version, called auto-regression, which has a provable but not very efficient solution using polynomial system solvers, after applying the technique of generating functions. This technique is strongly related to the Fourier, Laplace, and z-transforms, as explained below. We define an auto-regression, inspired by Ghosh et al. (2013); Eshragh et al. (2019); Yuan (2009), as follows.

Definition 1. A time-series F : [n] → R is an auto-regression (AR for short) of degree k if there exists a vector of coefficients θ = (θ_1, ..., θ_k) ∈ R^k such that F(t) = θ_1 F(t−1) + ... + θ_k F(t−k). The polynomial P(x) = x^k − θ_k x^{k−1} − ... − θ_1 is called the characteristic polynomial of F.

Substituting k = 2, θ = (1, 1), and F(1) = F(2) = 1 in Definition 1 yields the Fibonacci sequence, i.e., F(t) = F(t−1) + F(t−2), where F(1) = F(2) = 1.

From auto-regression to rational functions. In the corresponding "data science version" of Fibonacci's sequence, the input is a time-series G(1), G(2), ..., which is based on F with additional noise. A straightforward way to recover the original model is to directly minimize the squared error between the given noisy time-series and the fitted values, as done e.g. in Eshragh et al. (2019) using simple linear regression. However, this has a major drawback: an AR time-series usually grows exponentially, like a geometric sequence, and thus the loss is dominated by the last few terms of the time-series. Moreover, small changes in the time domain have an exponentially growing effect over time. To solve these issues, the fitting is done on the corresponding generative functions, as follows.

Proposition 1 (generative function; Yuan (2009)). Consider an AR time-series F and its characteristic polynomial P(x) of degree k. Let Q(x) = x^k P(1/x) be the polynomial whose coefficients are the coefficients of P in reverse order. Then, there is a polynomial R(x) of degree less than k such that the generative function of F is f(x) := Σ_{i=1}^∞ F(i) x^{i−1} = R(x)/Q(x), for every x ∈ R.

Inspired by the motivation above, we define the following loss function for the AR recovery problem.

Problem 1 (RFF). Given a time-series g : [n] → R and an integer k ≥ 1, find a rational function f whose numerator and denominator are polynomials of degree at most k that minimizes Σ_{x=1}^n |f(x) − g(x)|.

Note that the loss above is for fitting samples from the generative function of a noisy AR, as done in Section 3.1. While we focus on the sum of errors (distances), we expect easy generalizations to squared distances, robust M-estimators, and any other loss function that satisfies the triangle inequality up to a constant factor, as in other coreset constructions Feldman (2020).
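Proposition 1 is easy to check numerically for the Fibonacci sequence, where P(x) = x² − x − 1, Q(x) = 1 − x − x², and R(x) = 1. The following sketch (the length n = 40 and evaluation point x = 0.3 are our own illustrative choices, not values from the paper) generates the sequence via Definition 1 and compares a partial sum of the generative function against the closed form R(x)/Q(x):

```python
import numpy as np

# Fibonacci via Definition 1 with k = 2 and theta = (1, 1).
n = 40
F = np.empty(n)
F[0] = F[1] = 1.0
for t in range(2, n):
    F[t] = F[t - 1] + F[t - 2]

# Proposition 1: sum_{i>=1} F(i) x^(i-1) should equal R(x)/Q(x) = 1/(1 - x - x^2)
# for x inside the radius of convergence (|x| < (sqrt(5) - 1)/2, about 0.618).
x = 0.3
partial_sum = sum(F[i] * x**i for i in range(n))   # F[i] stores F(i+1)
closed_form = 1.0 / (1.0 - x - x**2)
print(partial_sum, closed_form)                    # the two values agree
```

At x = 0.3 the terms decay geometrically (ratio roughly 0.49), so 40 terms already match the closed form far below single-precision accuracy.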

1.2. CORESETS

Informally, given an input signal P consisting of 2-dimensional points, a set Q of models, an approximation error ϵ ∈ (0, 1), and a loss function ℓ, a coreset C is a data structure that approximates the loss ℓ(P, q) for every model q ∈ Q, up to a multiplicative factor of 1 ± ϵ, in time that depends only on |C|. Hence, ideally, C is also much smaller than the original input P.

Coreset for rational functions. Unfortunately, similarly to Rosman et al. (2014), the RFF problem with general input has no coreset that is a weighted subset of the input; see Claim 1. This was also the case, e.g., in Jubran et al. (2021). Hence, we solve this problem similarly to Rosman et al. (2014), which requires us to assume that the first coordinate of our input signal is simply the set of n consecutive integers, rather than a general set of reals. Even under this assumption there is no coreset that is a weighted subset of the input, or even a small weighted set of points; see Claim 1. We solve this problem by constructing a "representational" coreset that allows efficient storage and evaluation, but does not immediately yield an efficient solution to the problem, as is more commonly the case Feldman (2020). For more explanation of the components of this coreset, see Section 1.4.

Why such a coreset? A trivial use of such a coreset is data compression for efficient transmission and storage. While coresets have many useful properties, as mentioned in Feldman (2020), some of them do not follow immediately from our coreset; see Feldman (2020) for a general overview, which is skipped here due to space limitations. Nonetheless, since optimization over the coreset reduces the number of parameters, we hope that in the future there will be an efficient guaranteed solution (or approximation) over the coreset. Moreover, since this coreset does support efficient evaluation, we hope it will improve heuristics that can utilize this fast evaluation.
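To make the guarantee concrete, here is a purely illustrative sketch (not the paper's construction): `loss` implements the RFF loss of Problem 1 for a toy family of rational functions on a toy signal, and `is_eps_coreset` is the predicate that any candidate fast estimator `coreset_loss` must satisfy; all names, the model family, and the signal are our own assumptions.

```python
import numpy as np

n = 1000
t = np.arange(1, n + 1)
g = 1.0 / t                                  # a toy input signal g : [n] -> R

def loss(theta):
    # ell(f) = sum_{x=1}^n |f(x) - g(x)| for f(x) = theta[0] / (1 + theta[1] * x).
    return float(np.abs(theta[0] / (1.0 + theta[1] * t) - g).sum())

def is_eps_coreset(coreset_loss, models, eps):
    # C approximates the loss of EVERY queried model q up to a factor of 1 +- eps.
    return all(abs(coreset_loss(q) - loss(q)) <= eps * loss(q) for q in models)

# Sanity check: the exact loss itself trivially satisfies the guarantee.
print(is_eps_coreset(loss, [(1.0, 0.5), (2.0, 0.1)], eps=0.1))   # True
```

A real coreset replaces the full-data evaluation inside `coreset_loss` with one that touches only |C| values, which is where the O(n^{o(1)}/ε²) memory bound becomes meaningful.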



Further references for HMMs include Park et al. (2012); Sassi et al. (2020); Yu et al. (2010) and references therein. Related models include Bayesian networks Acar et al. (2007); Murphy (2002); Nikolova et al. (2010); Rudzicz (2010) and references therein, auto-regression Ghosh et al. (2013), and Markov decision processes Shanahan & den Poel (2010). Informally, and in the context of this work, a model defines a time-series (a sequence, or discrete signal) F : [n] → R, where [n] := {1, ..., n}, and the value F(t) at time (integer) t ≥ 1 is a function only of the previous k ≥ 1 values F(t−1), ..., F(t−k) in the sequence, for a constant k, and of the model's parameters θ.

Since small changes in the time domain have an exponentially growing effect over time, it makes more sense to assume that the noise is added in the frequency or generative-function domain. Intuitively, noise in analog signals, such as audio/video signals from an analog radio/TV, is added in the frequency domain, e.g., as aliasing of channels Nyquist (1928); Karras et al. (2021); Shani & Brafman (2004), and not in the time domain, e.g., as changes of volume.

While polynomials are usually simple and easy to handle, they do not suffice to accurately approximate non-smooth or non-Lipschitz functions; in such cases, high-order polynomials are required, which lead to severe oscillations and numerical instabilities Peiris et al. (2021); an example is Runge's phenomenon Epperson (1987), which resonates with Figure 4 in the appendix. To overcome this problem, one might try to utilize piecewise polynomials or polynomial splines Northrop (2016); Nürnberger (1989); Sukhorukova (2010). However, this results in a very complex optimization problem Meinardus et al. (1989); Sukhorukova & Ugon (2017).

Rational function approximation. A more promising direction is to utilize rational functions for approximating the input signal. Rational functions are a straightforward extension of polynomial approximations Trefethen (2019), yet are much more expressive
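The contrast between high-degree polynomials and low-degree rational functions is easy to reproduce on Runge's function 1/(1 + 25x²) (our choice of example; the degrees and grids below are illustrative). A degree-20 polynomial fit on equispaced points oscillates wildly near ±1, while a rational fit of degrees (0, 2), obtained by the standard least-squares linearization p(x) − y·q(x) = 0 with q(0) = 1, recovers the function almost exactly:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 21)
y = 1.0 / (1.0 + 25.0 * x**2)                # Runge's function on a uniform grid

# Degree-20 polynomial fit (interpolation on 21 points) oscillates near +-1.
poly = np.polyfit(x, y, deg=20)
xe = np.linspace(-1.0, 1.0, 1001)
ye = 1.0 / (1.0 + 25.0 * xe**2)
poly_err = np.max(np.abs(np.polyval(poly, xe) - ye))

# Rational fit p0 / (1 + q1*x + q2*x^2): linearize to p0 - y*q1*x - y*q2*x^2 = y
# and solve the resulting linear least-squares problem.
A = np.column_stack([np.ones_like(x), -y * x, -y * x**2])
p0, q1, q2 = np.linalg.lstsq(A, y, rcond=None)[0]
rat_err = np.max(np.abs(p0 / (1.0 + q1 * xe + q2 * xe**2) - ye))

print(poly_err, rat_err)                     # poly_err >> rat_err
```

NumPy may warn about conditioning at this polynomial degree; the qualitative gap between the two errors is the point. Note that the rational fit here succeeds exactly because Runge's function is itself rational, which is the expressiveness argument of this paragraph.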

