FIRST-ORDER OPTIMIZATION ALGORITHMS VIA DISCRETIZATION OF FINITE-TIME CONVERGENT FLOWS

Abstract

In this paper we study the performance of several discretization algorithms for two first-order finite-time optimization flows, namely, the rescaled-gradient flow (RGF) and the signed-gradient flow (SGF). These flows are non-Lipschitz or discontinuous dynamical systems that converge locally in finite time to the minima of gradient-dominated functions. We introduce three discretization methods for these first-order finite-time flows and provide convergence guarantees. We then apply the proposed algorithms to the training of neural networks and empirically test their performance on three standard datasets, namely, CIFAR10, SVHN, and MNIST. Our results show that our schemes converge faster than standard optimization alternatives, while achieving equivalent or better accuracy.

1. INTRODUCTION

Consider the unconstrained minimization problem for a given cost function $f : \mathbb{R}^n \to \mathbb{R}$. When $f$ is sufficiently regular, the standard algorithm in continuous time (dynamical system) is the gradient flow (GF)
$$\dot{x} = F_{\mathrm{GF}}(x) := -\nabla f(x), \qquad (1)$$
with $\dot{x} := \frac{d}{dt}x(t)$. Generalizing GF, the $q$-rescaled GF ($q$-RGF) of Wibisono et al. (2016), given by
$$\dot{x} = -c\,\frac{\nabla f(x)}{\|\nabla f(x)\|_2^{\frac{q-2}{q-1}}}, \qquad (2)$$
with $c > 0$ and $q \in (1, \infty]$, has an asymptotic convergence rate $f(x(t)) - f(x^\star) = O\!\left(\frac{1}{t^{q-1}}\right)$ under mild regularity, for $\|x(0) - x^\star\| > 0$ small enough, where $x^\star \in \mathbb{R}^n$ denotes a local minimizer of $f$. However, we recently proved in Romero & Benosman (2020) that the $q$-RGF, as well as our proposed $q$-signed GF ($q$-SGF)
$$\dot{x} = -c\,\|\nabla f(x)\|_1^{\frac{1}{q-1}}\,\operatorname{sign}(\nabla f(x)), \qquad (3)$$
where $\operatorname{sign}(\cdot)$ denotes the sign function, applied element-wise to (real-valued) vectors, are both finite-time convergent, provided that $f$ is gradient dominated of order $p \in (1, q)$. In particular, if $f$ is strongly convex, then $q$-RGF and $q$-SGF are finite-time convergent for any $q \in (2, \infty]$, since $f$ must be gradient dominated of order $p = 2$.
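As a concrete illustration (not the implementation studied in this paper), the two flows above can be simulated with a naive forward-Euler step. The sketch below applies $q$-RGF and $q$-SGF steps to the strongly convex quadratic $f(x) = \frac{1}{2}\|x\|_2^2$; the step size `h`, constant `c`, and exponent `q` are illustrative choices only:

```python
import numpy as np

def grad(x):
    # Gradient of the strongly convex quadratic f(x) = 0.5 * ||x||_2^2
    return x

def rgf_step(x, h=0.01, c=1.0, q=4.0):
    """One forward-Euler step of the q-rescaled gradient flow (q-RGF):
    x' = -c * grad_f(x) / ||grad_f(x)||_2^((q-2)/(q-1))."""
    g = grad(x)
    norm2 = np.linalg.norm(g, 2)
    if norm2 == 0.0:  # already at a stationary point
        return x
    return x - h * c * g / norm2 ** ((q - 2.0) / (q - 1.0))

def sgf_step(x, h=0.01, c=1.0, q=4.0):
    """One forward-Euler step of the q-signed gradient flow (q-SGF):
    x' = -c * ||grad_f(x)||_1^(1/(q-1)) * sign(grad_f(x))."""
    g = grad(x)
    norm1 = np.linalg.norm(g, 1)
    if norm1 == 0.0:
        return x
    return x - h * c * norm1 ** (1.0 / (q - 1.0)) * np.sign(g)

x = np.array([1.0, -2.0])
for _ in range(500):
    x = rgf_step(x)
print(np.linalg.norm(x))  # close to the minimizer x* = 0
```

Note that, because the underlying vector fields are non-Lipschitz (RGF) or discontinuous (SGF) near the minimizer, a fixed-step Euler scheme eventually chatters in a small neighborhood of $x^\star$ rather than settling exactly; this is one motivation for the more careful discretizations analyzed in the paper.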

CONTRIBUTION

In this paper, we explore three discretization schemes for the q-RGF (2) and q-SGF (3) and provide some convergence guarantees using results from hybrid dynamical control theory. In particular, we explore a forward-Euler discretization of RGF/SGF, followed by an explicit Runge-Kutta discretization, and finally a novel Nesterov-like discretization. We then test their performance on both synthetic and real-world data in the context of deep learning, namely, over the well-known datasets CIFAR10, SVHN, and MNIST.
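For intuition only, here is a generic sketch of how a classical explicit Runge-Kutta (RK4) step could be applied to the $q$-RGF vector field; the scheme and coefficients actually analyzed in the paper may differ, and the test function `grad_f` is an illustrative quadratic:

```python
import numpy as np

def rgf_field(x, grad_f, c=1.0, q=4.0):
    """q-RGF vector field F(x) = -c * grad_f(x) / ||grad_f(x)||_2^((q-2)/(q-1))."""
    g = grad_f(x)
    n = np.linalg.norm(g, 2)
    if n == 0.0:
        return np.zeros_like(x)
    return -c * g / n ** ((q - 2.0) / (q - 1.0))

def rk4_step(x, h, field):
    """One classical explicit Runge-Kutta (RK4) step for x' = field(x)."""
    k1 = field(x)
    k2 = field(x + 0.5 * h * k1)
    k3 = field(x + 0.5 * h * k2)
    k4 = field(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

grad_f = lambda z: z  # gradient of f(x) = 0.5 * ||x||_2^2
x = np.array([3.0, -1.0])
for _ in range(400):
    x = rk4_step(x, 0.01, lambda z: rgf_field(z, grad_f))
```

Higher-order explicit schemes like RK4 track the continuous flow more accurately per step than forward Euler, though they do not by themselves resolve the nonsmoothness of the field at the minimizer.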

RELATED WORK

Propelled by the work of Wang & Elia (2011) and Su et al. (2014), there has been a recent and significant research effort dedicated to analyzing optimization algorithms from the perspective of

