PROBABLE DATASET SEARCHING METHOD WITH UNCERTAIN DATASET INFORMATION FOR ADJUSTING ARCHITECTURE HYPERPARAMETERS

Abstract

We study tasks with uncertain dataset information, which arise because different parts of a dataset may be obtained with different difficulty. For example, in unsupervised learning and domain adaptation, datasets are provided without label information because of the cost of human annotation. In deep learning, adjusting architecture hyperparameters is important for model performance but is also time consuming, so we adjust hyperparameters under two types of uncertain dataset information: (1) the dataset labels are obtained late, so hyperparameters must be adjusted without complete dataset information; (2) hyperparameters are adjusted on a subset of the training dataset, since training models on the complete training dataset is time consuming. We propose several loss functions to search for a probable dataset when the complete dataset information is not available. Experiments on 9 real-world datasets demonstrate the performance of our method.

1. INTRODUCTION

In deep learning, most regression data can be represented in the form (X, Y ), where X is the data input and Y is the label. However, different parts of the data may be obtained with different difficulty. For example, in unsupervised learning Barlow (1989) and domain adaptation Wang & Deng (2018), the labels of a dataset are assumed to be hard to obtain, since labeling usually requires human annotation. These situations can be viewed as making decisions while part of the dataset information is uncertain. Another possible situation is that the input samples X are obtained much earlier than the labels Y, because human annotation of Y is time consuming or the exact task target of Y is not determined when the input samples are collected. In such a situation, computing resources are assumed to be abundant before the labels Y are obtained. In deep learning, the architecture hyperparameter setting is an important factor in the performance of a model. A natural question is then whether architecture hyperparameters, corresponding to different network architectures, can be compared and adjusted using only the input sample information X. Selecting hyperparameters using only X would save the time of trying different hyperparameters once the labels Y are obtained. The input samples X usually take more memory space than the labels Y, suggesting that X contains more information than Y, so predicting the architecture comparison using only X seems feasible. To deal with the uncertain information in a dataset, we propose a probable dataset searching method to predict architecture comparisons, where the dataset representation is inspired by the dataset definitions and assumptions in recent neural network convergence works Kohler & Langer (2021); Bauer & Kohler (2019); Schmidt-Hieber (2020); Suzuki (2018); Farrell et al. (2021).
Our method searches for probable datasets given the available dataset information, such as the input samples X. Concretely, the comparison of two hyperparameters can be predicted by searching for the existence of a probable dataset on which one architecture is better or worse than the other. Here, we use a neural network to approximate the dataset regression function and apply several loss functions to search for a probable dataset on which one trained architecture outperforms the other on the testing set. An assumption of our method is that the compared architectures should have competitive performance on the searched dataset. Empirically, the compared architectures are selected because they perform well

