APPROXIMATE VANISHING IDEAL COMPUTATIONS AT SCALE

Abstract

The vanishing ideal of a set of points $X = \{x_1, \ldots, x_m\} \subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to 0 at every point $x \in X$, and it admits an efficient representation by a finite subset of generators. In practice, to accommodate noise in the data, algorithms that construct generators of the approximate vanishing ideal are widely studied, but their computational cost remains high. In this paper, we scale up the oracle approximate vanishing ideal algorithm (OAVI), the only generator-constructing algorithm with known learning guarantees. We prove that the computational complexity of OAVI is not superlinear, as previously claimed, but linear in the number of samples $m$. In addition, we propose two modifications that accelerate OAVI's training time. First, our analysis reveals that replacing the pairwise conditional gradients algorithm, one of the solvers used in OAVI, with the faster blended pairwise conditional gradients algorithm leads to an exponential speed-up in the number of features $n$. Second, using a new inverse Hessian boosting approach, intermediate convex optimization problems can be solved almost instantly, improving OAVI's training time by multiple orders of magnitude in a variety of numerical experiments.

1. INTRODUCTION

High-quality features are essential for the success of machine-learning algorithms (Guyon & Elisseeff, 2003) and as a consequence, feature transformation and selection algorithms are an important area of research (Kusiak, 2001; Van Der Maaten et al., 2009; Abdi & Williams, 2010; Paul et al., 2021; Manikandan & Abirami, 2021; Carderera et al., 2021). A recently popularized technique for extracting nonlinear features from data is the concept of the vanishing ideal (Heldt et al., 2009; Livni et al., 2013), which lies at the intersection of machine learning and computer algebra. Unlike conventional machine learning, which relies on a manifold assumption, vanishing ideal computations are based on an algebraic set¹ assumption, for which powerful theoretical guarantees are known (Vidal et al., 2005; Livni et al., 2013; Globerson et al., 2017). The core concept of vanishing ideal computations is that any data set $X = \{x_1, \ldots, x_m\} \subseteq \mathbb{R}^n$ can be described by its vanishing ideal, $I_X = \{g \in \mathcal{P} \mid g(x) = 0 \text{ for all } x \in X\}$, where $\mathcal{P}$ is the polynomial ring over $\mathbb{R}$ in $n$ variables. Despite $I_X$ containing infinitely many polynomials, there exists a finite number of generators of $I_X$, $g_1, \ldots, g_k \in I_X$ with $k \in \mathbb{N}$, such that any polynomial $h \in I_X$ can be written as $h = \sum_{i=1}^{k} g_i h_i$, where $h_i \in \mathcal{P}$ for all $i \in \{1, \ldots, k\}$ (Cox et al., 2013). Thus, the generators share any sample $x \in X$ as a common root, capture the nonlinear structure of the data, and succinctly represent the vanishing ideal $I_X$. Due to noise in empirical data, we are interested in constructing generators of

¹ A set $X \subseteq \mathbb{R}^n$ is algebraic if it is the set of common roots of a finite set of polynomials.
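To make the definition of the vanishing ideal concrete, the following minimal sketch samples points from the unit circle in $\mathbb{R}^2$ and recovers a polynomial of degree at most 2 that vanishes on them, namely a multiple of $x_1^2 + x_2^2 - 1$. It is not OAVI or any other algorithm discussed in this paper: it is a plain null-space computation via the singular value decomposition of the monomial evaluation matrix, chosen only for illustration. All function names (`monomial_exponents`, `evaluation_matrix`, `mean_squared_evaluation`), the sample size, and the noise level are assumptions made for this example.

```python
from itertools import product

import numpy as np


def monomial_exponents(n, degree):
    """Exponent tuples (a_1, ..., a_n) with a_1 + ... + a_n <= degree."""
    return [e for e in product(range(degree + 1), repeat=n) if sum(e) <= degree]


def evaluation_matrix(X, exponents):
    """Matrix whose column j is the j-th monomial evaluated at every row of X."""
    return np.stack([np.prod(X ** np.array(e), axis=1) for e in exponents], axis=1)


def mean_squared_evaluation(X, exponents, coeffs):
    """Mean of g(x)^2 over the rows of X, where g is given by its coefficients."""
    values = evaluation_matrix(X, exponents) @ coeffs
    return float(np.mean(values ** 2))


# Illustrative data: 50 points on the unit circle (an algebraic set in R^2).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=50)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Monomials of degree <= 2 in two variables, evaluated at every sample.
exponents = monomial_exponents(n=2, degree=2)
A = evaluation_matrix(X, exponents)

# A coefficient vector c with A @ c = 0 defines a polynomial vanishing on X.
# The right singular vector of the smallest singular value spans that null space here.
_, singular_values, Vt = np.linalg.svd(A, full_matrices=False)
coeffs = Vt[-1] / Vt[-1][exponents.index((2, 0))]  # scale so x_1^2 has coefficient 1

# On exact data the polynomial vanishes up to rounding error; on noisy data it
# only vanishes approximately, which motivates the approximate vanishing ideal.
mse_exact = mean_squared_evaluation(X, exponents, coeffs)
X_noisy = X + 0.01 * rng.standard_normal(X.shape)
mse_noisy = mean_squared_evaluation(X_noisy, exponents, coeffs)

for e, c in zip(exponents, coeffs):
    print(e, round(float(c), 6))
print(mse_exact, mse_noisy)
```

The recovered coefficients correspond to $x_1^2 + x_2^2 - 1$, which generates the vanishing ideal of the circle. With noise added to the samples, the same polynomial no longer evaluates exactly to zero, so in practice one asks only for a small mean squared evaluation over the data.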

