EFFICIENT HYPERPARAMETER OPTIMISATION THROUGH TENSOR COMPLETION

Abstract

Hyperparameter optimisation is a prerequisite for state-of-the-art performance in machine learning, with current strategies including Bayesian optimisation, Hyperband, and evolutionary methods. While such methods have been shown to improve performance, none of them is designed to explicitly exploit the structure underlying the data. To this end, we introduce a completely different approach to hyperparameter optimisation, based on low-rank tensor completion. This is achieved by first forming a multi-dimensional tensor which comprises performance scores for different combinations of hyperparameters. Based on the realistic underlying assumption that the so-formed tensor has a low-rank structure, reliable estimates of the unobserved validation scores of hyperparameter combinations can be obtained through tensor completion, from knowledge of only a fraction of the elements in the tensor. Through extensive experimentation on various datasets and learning models, the proposed method is shown to exhibit performance that is competitive with or superior to state-of-the-art hyperparameter optimisation strategies. Distinctive advantages of the proposed method include its ability to simultaneously handle any hyperparameter type (e.g., type of optimiser, number of neurons, number of layers), its relative simplicity compared to competing methods, and its ability to suggest multiple optimal combinations of hyperparameters.

1. INTRODUCTION

Machine learning (ML) applications have been steadily growing in number and scope over recent years, especially in Computer Vision and Natural Language Processing. This growth is mainly attributed to the ability of large deep learning (DL) models to learn very complex functions. Consequently, the performance of such models (and even of simpler models that perform less computationally heavy tasks) is critically dependent on the fine-tuning of their internal hyperparameters. Given the importance of hyperparameter optimisation and the ever-increasing complexity of training modern ML models, efficient tuning of hyperparameters has become an area of utmost importance, not only for obtaining state-of-the-art performance, but also for alleviating the complexity of the training process. It is therefore not surprising that a large part of the machine learning community's effort has been focused on developing efficient hyperparameter optimisation methods. Commonly used methods include grid search, random search, methods based on Bayesian optimisation, multi-fidelity optimisation, and evolutionary strategies. For a comprehensive review of currently available hyperparameter optimisation methods we refer the reader to Yu & Zhu (2020).

Despite their success, current approaches are either rather heuristic or depend on strong underlying assumptions. To this end, we introduce a radically different approach based on exploiting the low-rank structure of the hyperparameter space. For example, we expect that a given optimiser with its learning rate set to 1e-3 will most likely yield a validation loss similar to that obtained with a learning rate of 9.9e-4. Extending this argument to multiple dimensions, so as to reflect multiple hyperparameters, we aim to identify highly promising subspaces in the vast space of hyperparameter combinations by evaluating only a small fraction of these combinations.
More specifically, the proposed method models the set of all possible hyperparameter combinations as a multidimensional tensor. Each entry in this tensor corresponds to a score indicating the relative suitability of a particular hyperparameter combination with respect to the others (e.g., the validation loss). By evaluating a subset of these combinations, we construct an incomplete tensor with only a fraction of its elements known. Assuming the complete tensor is of low rank, the unknown entries can be estimated using low-rank tensor completion techniques. This makes it possible to predict the relative performance of different hyperparameter configurations without the need for their explicit evaluation. The best hyperparameter configurations can then be found by searching the completed tensor. To take full advantage of the tensor completion framework, we propose a sequential tensor completion algorithm which narrows down the hyperparameter search space based on promising subspaces identified in previous tensor completion cycles. Furthermore, for each cycle, we employ the Cross method of Zhang (2019), a tensor sampling scheme which ensures that as few hyperparameter evaluations as possible are required for accurate tensor completion. We show that such an approach results in a speed of optimisation that is highly competitive with, and often superior to, other state-of-the-art hyperparameter optimisation techniques. Comprehensive numerical results and extensive experimentation illustrate the potential of the proposed framework as a competitive and intuitive, yet physically meaningful, alternative for efficient hyperparameter optimisation.

The rest of the paper is organised as follows. Section 2 discusses related work. After briefly reviewing key tensor preliminaries in Section 3, we present a validation of the assumed low-rank property of the hyperparameter tensor in Section 4 and the proposed tensor completion algorithm for hyperparameter optimisation in Section 5.
Next, comprehensive experimental results are presented in Section 6, followed by the Conclusion in Section 7.
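To make the core idea concrete, the following is a minimal NumPy sketch of one completion cycle: a partially observed tensor of validation scores is filled in by fitting a low-rank model to the known entries only. Note that this sketch is not the paper's actual algorithm — it substitutes a simple CP (canonical polyadic) model fitted by masked alternating least squares for the Tucker-based Cross scheme described in Section 5, and the tensor sizes, rank, and observation fraction are illustrative assumptions:

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Khatri-Rao product of B (J x R) and C (K x R) -> (J*K x R)."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def masked_als_step(T_unf, M_unf, K, A):
    """Update one CP factor row by row; each row solves a least-squares
    problem restricted to the observed entries of its tensor slice."""
    for i in range(A.shape[0]):
        obs = M_unf[i] > 0
        if obs.any():
            A[i], *_ = np.linalg.lstsq(K[obs], T_unf[i, obs], rcond=None)
    return A

def cp_complete(T, mask, rank=3, sweeps=30, seed=0):
    """Complete a 3-D tensor whose entries are known only where mask==1,
    by fitting a rank-`rank` CP model with masked alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(sweeps):
        # Mode-1 unfolding: T[i, j*K + k] matches khatri_rao(B, C) row order.
        A = masked_als_step(T.reshape(I, -1), mask.reshape(I, -1),
                            khatri_rao(B, C), A)
        B = masked_als_step(np.moveaxis(T, 1, 0).reshape(J, -1),
                            np.moveaxis(mask, 1, 0).reshape(J, -1),
                            khatri_rao(A, C), B)
        C = masked_als_step(np.moveaxis(T, 2, 0).reshape(K, -1),
                            np.moveaxis(mask, 2, 0).reshape(K, -1),
                            khatri_rao(A, B), C)
    # Reconstruct the full (completed) tensor from the CP factors.
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

Given a completed tensor `T_hat` of validation losses over, say, a (learning rate, batch size, depth) grid, the most promising configuration can then be read off via `np.unravel_index(np.argmin(T_hat), T_hat.shape)`; in the proposed method this is one cycle of a sequential refinement of the search space rather than a one-shot answer.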

2. RELATED WORK

Deng & Xiao (2022) provide a meta-learning approach based on low-rank tensor completion, which allows the optimal hyperparameter configuration for a new machine learning problem to be predicted from the optimal configurations found in other, related problems. Our work is crucially different, since it does not require the results of related problems or any other prior knowledge in order to optimise a given machine learning problem. A plethora of hyperparameter optimisation techniques exist that also do not require prior knowledge of the problem, e.g., random search or Bayesian optimisation. However, to the best of our knowledge, there is no such hyperparameter optimisation method based on low-rank tensor completion. Furthermore, we have been unable to find any work hypothesising that the performance distribution over various hyperparameter combinations has an underlying low-rank structure, even though this is a natural assumption for any physically meaningful data structure.

3. TENSOR PRELIMINARIES

Low-rank tensor completion is based on the premise that the elements of a low-rank tensor exhibit certain interrelationships. This makes it possible to infer the values of unknown elements from the elements whose values are known. Research abounds with algorithms for tensor completion, e.g., Bengua et al. (2017), Song et al. (2018), Liu et al. (2014) or Acar et al. (2011); however, most of these algorithms assume that there is no control over which elements of the true tensor are known. In the strategy proposed in this paper, one is able to "choose" which elements of the tensor are known by choosing which hyperparameter combinations to evaluate. Since each hyperparameter evaluation is time-consuming, it is important to achieve accurate tensor completion with as few known elements as possible; this capability is provided by the Cross technique for efficient low-rank tensor completion from Zhang (2019).

To explain this technique, it is necessary to introduce two important prerequisites: the Tucker decomposition and the Tucker rank. Any N-dimensional tensor X can be expressed as a Tucker decomposition, consisting of an N-dimensional core tensor G and N factor matrices A^(n), one for each dimension. The relationship between the tensor X and its Tucker decomposition is

X = G ×_1 A^(1) ×_2 A^(2) ⋯ ×_N A^(N)    (1)

The operation ×_n denotes the mode-n product. Consider a tensor X ∈ R^(I_1 × I_2 × ... × I_N) and a matrix A ∈ R^(J × I_n), with 1 ≤ n ≤ N. The mode-n product of X and A is a tensor whose elements are
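As a side note to the notation above, the mode-n product and the Tucker reconstruction of Eq. (1) can be sketched in NumPy; the tensor and factor sizes below are arbitrary choices for illustration only:

```python
import numpy as np

def mode_n_product(X, A, n):
    """Mode-n product X x_n A of a tensor X with a matrix A (J x I_n):
    dimension n of X is contracted with the columns of A."""
    # Contract A's second axis with X's n-th axis; the new J-sized axis
    # comes out first, so move it back to position n.
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

def tucker_reconstruct(G, factors):
    """Reconstruct X = G x_1 A^(1) x_2 A^(2) ... x_N A^(N), as in Eq. (1)."""
    X = G
    for n, A in enumerate(factors):
        X = mode_n_product(X, A, n)
    return X
```

For a core tensor G of size 2 x 3 x 4 and factor matrices of sizes 5 x 2, 6 x 3 and 7 x 4, the reconstruction yields a 5 x 6 x 7 tensor, i.e., each mode-n product replaces dimension I_n of the core with the J rows of the corresponding factor matrix.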

