INTRODUCING SAMPLE ROBUSTNESS

Abstract

Choosing the right data and model for a pre-defined task is a critical competency in machine learning. Investigating which features of a dataset and its underlying distribution a model decodes may illuminate the proverbial "black box" and lead to a deeper understanding of the ongoing processes. Furthermore, it can help to improve the quality of models that directly depend on data or learn from it through training. In this work, we introduce the dataset-dependent concept of sample robustness, which is based on a pointwise Lipschitz constant of the label map. For a particular sample, it measures how small a perturbation is required to cause a label change, relative to the magnitude of the label map. We develop theory to motivate the concept and to analyse the effects of similar robustness distributions in the training and test data. Afterwards, we conduct various experiments using different datasets and (non-)deterministic models. In some cases, we can boost performance by choosing specifically tailored training(sub)sets and hyperparameters depending on the robustness distribution of the test(sub)sets.

1. INTRODUCTION

In the age of automated machine learning, we increasingly regard meta-hyperparameters such as model type or training and validation budgets as variables of a loss function in the most abstract sense. For training sets, however, the mere number of samples often determines how well suited they are perceived to be for a particular task. The motivation of this work is to introduce a concept that allows datasets to serve as variables of such a generalised loss function as well. Imagine, for example, patients who share almost identical medical records but react differently to some prescribed treatment. They may pose a challenge to machine learning models similar to what is known as natural adversarial examples in vision tasks, see Hendrycks et al. (2019). The relationship between medical features and appropriate personal treatments may be sensitive to small input variations, i.e. not robust to perturbations. In this work, we are, to the best of our knowledge, the first to introduce and analyse a model-agnostic measure for the robustness of data. We show that knowledge about the robustness distribution of a specific test(sub)set can allow for choosing a more appropriate training(sub)set in terms of performance optimisation. Finally, we discover that the optimal choice of hyperparameters may also depend on the robustness distributions of both the training and test data.

Let us first motivate the concept of sample robustness on a high level. When collecting and processing a dataset for a pre-defined task, we identify certain features and expressive samples such that a model may be able to abstract and generalise from these finite points to the whole space of possible inputs. Assume we have a rectangle-shaped data distribution in a circle-shaped feature space and a dataset labelled according to two distinct ground truth maps y* ∈ {×, *} and z* ∈ {•, } (cf. Figure 1).
Here, one can imagine classifying images of horses and cats (assuming ground truth y*) and classifying images of animal-human pairs (assuming ground truth z*). Evidently, the distance between differently labelled samples depends on the ground truth map labelling them. For every sample in a dataset, the intrinsic information of closeness to a differently labelled sample can be considered a feature itself. For regression tasks and label distributions that are not necessarily categorical, one may also include the distance of the corresponding labels as additional information. By taking the quotient of these two quantities and maximising it over the dataset, i.e. calculating a pointwise Lipschitz constant of the label map, one can measure how sensitive a sample is to label-changing perturbations.
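The pointwise construction above can be sketched in a few lines of code. The following is a minimal illustration, not the paper's implementation: the function name `sample_robustness` and the choice of Euclidean distances for both inputs and labels are assumptions made here for concreteness. For each sample, it computes the largest ratio of label distance to input distance over all other samples; a large constant means a small input perturbation already changes the label, i.e. the sample is less robust.

```python
import numpy as np

def sample_robustness(X, y):
    """Pointwise Lipschitz constant of the label map for each sample.

    For sample i this returns L(x_i) = max_{j != i} d(y_i, y_j) / d(x_i, x_j),
    using Euclidean distances (an illustrative choice; other metrics work too).
    For categorical labels one would replace d(y_i, y_j) by the 0/1 indicator
    of a label change, as suggested in the text.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(X)
    L = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = np.linalg.norm(X[i] - X[j])       # input distance
            dy = np.linalg.norm(np.atleast_1d(y[i] - y[j]))  # label distance
            if dx > 0:                             # skip duplicate inputs
                L[i] = max(L[i], dy / dx)
    return L
```

For instance, for inputs 0, 1 and 3 on the real line with labels 0, 1 and 1, the third sample sits far from the only differently labelled point and thus obtains a smaller pointwise constant, i.e. it is more robust.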

