EFFICIENT HYPERPARAMETER OPTIMISATION THROUGH TENSOR COMPLETION

Abstract

Hyperparameter optimisation is a prerequisite for state-of-the-art performance in machine learning, with current strategies including Bayesian optimisation, hyperband, and evolutionary methods. While such methods have been shown to improve performance, none of them is designed to explicitly take advantage of the underlying data structure. To this end, we introduce a completely different approach for hyperparameter optimisation, based on low-rank tensor completion. This is achieved by first forming a multi-dimensional tensor which comprises performance scores for different combinations of hyperparameters. Based on the realistic underlying assumption that the so-formed tensor has a low-rank structure, reliable estimates of the unobserved validation scores of combinations of hyperparameters can be obtained through tensor completion, from knowing only a fraction of the elements in the tensor. Through extensive experimentation on various datasets and learning models, the proposed method is shown to exhibit competitive or superior performance to state-of-the-art hyperparameter optimisation strategies. Distinctive advantages of the proposed method include its ability to simultaneously handle any hyperparameter type (e.g., type of optimiser, number of neurons, number of layers, etc.), its relative simplicity compared to competing methods, as well as its ability to suggest multiple optimal combinations of hyperparameters.

1. INTRODUCTION

Machine learning (ML) applications have been steadily growing in number and scope over recent years, especially in Computer Vision and Natural Language Processing. This growth is mainly attributed to the ability of large deep learning (DL) models to learn very complex functions. Consequently, the performance of such models (and even of simpler models that perform less computationally heavy tasks) is critically dependent on the fine-tuning of their internal hyperparameters. Given the importance of hyperparameter optimisation and the ever-increasing complexity of training modern ML models, efficient tuning of hyperparameters has become an area of utmost importance, not only for obtaining state-of-the-art performance, but also for alleviating the complexity of the training process. It is therefore not surprising that a large effort of the machine learning community has been focused on developing efficient hyperparameter optimisation methods. Commonly used methods include grid search, random search, methods based on Bayesian optimisation, multi-fidelity optimisation, and evolutionary strategies. For a comprehensive review of currently available hyperparameter optimisation methods we refer the reader to Yu & Zhu (2020).

Despite their success, current approaches are either rather heuristic or depend on strong underlying assumptions. To this end, we introduce a radically different approach based on exploiting the low-rank structure of the hyperparameter space. For example, we expect that a given optimiser with its learning rate set to 1e-3 will most likely yield a validation loss similar to that obtained with a learning rate of 9.9e-4. Extending this argument to multiple dimensions, so as to reflect multiple hyperparameters, we aim to identify highly promising subspaces in the vast space of hyperparameter combinations by evaluating only a small fraction of these combinations.
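The low-rank intuition above can be illustrated with a minimal sketch. The following toy example (not the paper's actual algorithm; the grid sizes, CP rank, sampling fraction, and the masked alternating-least-squares solver are all illustrative assumptions) builds a synthetic 3-D score tensor over a hypothetical learning-rate x batch-size x optimiser grid, hides most of its entries, and recovers the missing scores by fitting a low-rank CP model to the observed ones only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grid: 8 learning rates x 6 batch sizes x 4 optimisers,
# with synthetic "validation scores" that are exactly CP rank-2.
I, J, K, R = 8, 6, 4, 2
A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
T = np.einsum('ir,jr,kr->ijk', A, B, C)  # ground-truth score tensor

# Pretend we trained only ~60% of the hyperparameter configurations.
mask = rng.random(T.shape) < 0.6

def mode_update(T, mask, F1, F2, mode):
    """Masked ALS update of one CP factor via per-slice least squares."""
    Tm = np.moveaxis(T, mode, 0)     # bring the updated mode to the front
    Mm = np.moveaxis(mask, mode, 0)
    # Khatri-Rao design matrix over the two remaining modes.
    D = np.einsum('jr,kr->jkr', F1, F2).reshape(-1, F1.shape[1])
    F_new = np.zeros((Tm.shape[0], F1.shape[1]))
    for i in range(Tm.shape[0]):
        obs = Mm[i].ravel()          # fit only the observed entries
        F_new[i], *_ = np.linalg.lstsq(D[obs], Tm[i].ravel()[obs], rcond=None)
    return F_new

# Random init, then alternating least-squares sweeps on observed entries.
Ah, Bh, Ch = (rng.standard_normal((n, R)) for n in (I, J, K))
for _ in range(100):
    Ah = mode_update(T, mask, Bh, Ch, 0)
    Bh = mode_update(T, mask, Ah, Ch, 1)
    Ch = mode_update(T, mask, Ah, Bh, 2)

That = np.einsum('ir,jr,kr->ijk', Ah, Bh, Ch)  # completed tensor
rel_err = np.linalg.norm(That - T) / np.linalg.norm(T)
# The completed tensor suggests promising configurations never trained:
best = np.unravel_index(np.argmin(That), That.shape)
```

Under these assumptions, the unobserved scores are recovered accurately from a fraction of the entries, and `best` indexes the hyperparameter combination with the lowest predicted score, which mirrors how a completed tensor could rank candidate configurations.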
More specifically, the proposed method models the set of all possible hyperparameter combinations as a multidimensional tensor. Each entry in this tensor corresponds to a score (e.g., validation loss) indicating the relative suitability of a particular hyperparameter combination with respect to the others. By evaluating a subset of these combinations, we construct an incomplete tensor with only a fraction of its elements known. Assuming

