MULTI-TASK MULTICRITERIA HYPERPARAMETER OPTIMIZATION

Abstract

We present a new method for choosing optimal hyperparameters across several tasks and several criteria. The Multi-Task Multi-Criteria (MTMC) method provides a set of Pareto-optimal solutions, from which one solution is selected according to given criteria significance coefficients. The article begins with a mathematical formulation of the problem of choosing optimal hyperparameters. The steps of the MTMC method that solve this problem are then described. The proposed method is evaluated on an image classification problem using a convolutional neural network, and optimal hyperparameters are reported for various criteria significance coefficients.

1. INTRODUCTION

Hyperparameter optimization (Hutter et al., 2009) is an important component in the application of machine learning models (for example, logistic regression, neural networks, SVMs, gradient boosting, etc.) to various tasks, such as classification, regression, ranking, etc. The problem is how to choose optimal hyperparameters when a trained model is evaluated on several test sets using several criteria. This article describes a method for solving this problem. We present the results of experiments on hyperparameter selection obtained with the proposed approach (MTMC) under various criteria significance coefficients. The article is organized as follows. First, we discuss related work in Section 2. Section 3 describes the proposed method. Section 4 presents the results of experiments on the selection of optimal hyperparameters. Section 5 contains the conclusion and future work.

2. RELATED WORK

Hyperparameter optimization is applied to solve various problems such as computer vision (Bergstra et al., 2013; Dong et al., 2019), robotics (Mahmood et al., 2018; Tran et al., 2020), natural language processing (Wang et al., 2015; Dernoncourt & Lee, 2016) and speech synthesis (Koriyama et al., 2014). The problem of choosing optimal hyperparameters has long been known. Existing solutions can be characterized by the following features:

1. Number of optimal solutions.
2. Number of tasks to be solved.
3. Number of criteria for choosing the optimal solution.

In this article, a task means a set of images with a number of classes N_classes and a number of images N_images. The same classes appear across tasks; the difference lies in how the images were captured (different lighting, backgrounds, and cameras). A criterion is a quantitative characteristic of training or evaluating a neural network on a task (e.g., accuracy, latency, or the epoch of training convergence). In (Sener & Koltun, 2018), a Pareto optimization method is proposed in which the optimal solution is found for several tasks simultaneously; the method minimizes the weighted sum of the loss functions of the individual tasks. (Fliege & Svaiter, 2000) describes a Pareto optimization method that gives an optimal solution according to several criteria based on gradient descent; this optimization is also carried out during training. In (Igel, 2005), the search for a Pareto-optimal solution is carried out according to several criteria. (Miettinen, 2012) gives several methods for multi-objective Pareto optimization. The method in (Bengio, 2000) gives optimal hyperparameters using backpropagation through the Cholesky decomposition. In (Bergstra et al., 2011), optimization is performed using a random choice of hyperparameters based on the expected improvement criterion. (Bergstra & Bengio, 2012) proposes a method of hyperparameter optimization based on random search in the space of hyperparameters. In (Snoek et al., 2012), the search for optimal hyperparameters is carried out using Bayesian optimization. (Swersky et al., 2013) proposes a method for finding optimal hyperparameters using multi-task Bayesian optimization. (Paria et al., 2020; Hernández-Lobato et al., 2016) describe Bayesian methods for multi-objective optimization. The proposed method is based on Pareto optimization. Pareto optimality means that a Pareto-optimal solution cannot be improved on any criterion without worsening it on at least one other criterion.
Thus, for a given set of criterion values, the Pareto-selected solutions are optimal. Among them, the solution closest to the given criteria is determined by finding the minimum weighted sum over the Pareto solutions (where the weights are the inverses of the criteria). The novelty of the MTMC method is as follows:

1. Optimization is carried out simultaneously over several criteria and several tasks, with the significance of the criteria specified by the user.
2. The optimal hyperparameters are chosen after training and evaluation, which eliminates the need to re-train the model.
3. The proposed method itself does not need to be trained.
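The Pareto selection described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes every criterion is to be minimized and that each evaluated hyperparameter run is a row of criterion values; the function name is ours.

```python
import numpy as np

def pareto_front(criteria):
    """Return the indices of Pareto-optimal rows.

    criteria: array of shape (n_runs, n_criteria), where every
    criterion is to be minimized (e.g. error rate, latency).
    A run is dominated if some other run is no worse on every
    criterion and strictly better on at least one.
    """
    n = criteria.shape[0]
    optimal = np.ones(n, dtype=bool)
    for i in range(n):
        dominated_by = (np.all(criteria <= criteria[i], axis=1) &
                        np.any(criteria < criteria[i], axis=1))
        if np.any(dominated_by):
            optimal[i] = False
    return np.flatnonzero(optimal)
```

For example, among the runs [1, 2], [2, 1], [2, 2], [3, 3] only the first two are Pareto-optimal: the third is dominated by the first, and the fourth by all others. A significance-weighted sum can then be applied to the surviving rows to pick a single solution.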

3. THE PROPOSED METHOD

First, we describe the mathematical problem that MTMC solves, then we present the steps performed in MTMC.

3.1. FORMALIZATION OF THE PROBLEM

In the proposed method, the model is evaluated on several test sets (tasks) T. The problem of finding a minimum over the tasks T is known as minimizing the expected value of the empirical risk (Vapnik, 1992). The choice of optimal hyperparameters is formalized as follows:

θ = argmin_{θ∈Θ} E_τ[L(θ, φ)]    (1)

where Θ is the set of all hyperparameters, θ is the selected optimal hyperparameter configuration, φ is the vector of significance coefficients of the criteria, L(·) is the estimation function of the model with the given hyperparameters θ and coefficients φ, and τ is the task for which optimization is performed. The developed method gives a solution to problem (1).
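An empirical version of problem (1) can be sketched as follows, assuming the evaluation results have already been collected; all names are illustrative, and the losses are assumed to already combine the criteria using the coefficients φ.

```python
import numpy as np

def select_hyperparameters(losses, configs):
    """Empirical sketch of Eq. (1): pick the configuration theta that
    minimizes the mean of L(theta, phi) over the tasks tau.

    losses:  array of shape (n_configs, n_tasks), where losses[k, t]
             is the scalar estimate L(theta_k, phi) on task t.
    configs: list of hyperparameter dictionaries, one per row.
    """
    expected_risk = losses.mean(axis=1)  # estimate of E_tau[L(theta, phi)]
    return configs[int(np.argmin(expected_risk))]
```

With two candidate configurations evaluated on two tasks, e.g. losses [[0.5, 0.6], [0.3, 0.4]], the second configuration is selected because its mean loss across tasks is lower.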

3.2. DESCRIPTION OF MTMC

According to (1), the developed method should fulfill the following requirements: 1) the method should solve the minimization problem; 2) the significance of each criterion is determined by the vector of coefficients φ (the higher the coefficient, the more important the corresponding criterion). We denote the test sample of a task τ as:

x_i ∼ D, i = 1 . . . N_task    (2)

