N-STUDENT LEARNING: AN APPROACH TO MODEL UNCERTAINTY AND COMBAT OVERFITTING

Abstract

This work presents N-Student Learning, a pseudo-label-based multi-network training setup that can be applied to nearly any supervised learning architecture to combat overfitting and to control how a network models uncertainty in the data. The effectiveness of N-Student Learning relies on the idea that a network's predictions on unseen data are largely independent of any instance-dependent noise in the labels. In N-Student Learning, each student network is assigned a subset of the training dataset such that no data point appears in every student's training subset. Unbiased pseudo-labels can thus be generated for every data point in the training set by taking the predictions of the appropriate student networks. Training on these unbiased pseudo-labels minimizes the extent to which each network overfits to instance-dependent noise in the data. Furthermore, based on prior knowledge of the domain, we can control how the networks learn to model the uncertainty present in the dataset by adjusting the way pseudo-labels are generated. While this method is largely inspired by the general problem of overfitting, a natural application is found in classification with noisy labels, a domain where overfitting is a significant concern. After developing intuition through a toy classification task, we demonstrate that N-Student Learning performs favorably on benchmark datasets when compared to state-of-the-art methods for classification with noisy labels.

1. INTRODUCTION

Overfitting is a fundamental problem in supervised classification in which a model learns properties of the training data that do not generalize to unseen data. If samples from the input space contain information that is not relevant to the task, the model may overfit by learning to use these irrelevant details for predictive purposes. Overfitting may also occur as a result of noise in the label space, in which the label provided in the dataset does not match the expected output of the model given the corresponding input. Overfitting due to label noise in its various forms is a primary focus of this paper.

In this paper, we introduce N-Student Learning, a pseudo-label-based multi-network training setup that mitigates overfitting to noise in the labels. The idea is to relabel the dataset using the predictions of networks that have never seen the data. We do this by training multiple networks on different subsets of the data, so that the pseudo-labels they generate on their respective unseen subsets are free of any instance-dependent noise. Training on these pseudo-labels yields networks that are less prone to overfitting.

After introducing the architecture in Section 2, we discuss a few types of label noise that are commonly present in datasets. Using a toy classification problem in Section 3, we show the effect of the N-Student Learning setup and demonstrate that it can be adapted to handle different kinds of noise. Following this, in Section 4, we show that N-Student Learning performs favorably when compared to state-of-the-art methods on both artificially noisy and naturally noisy benchmark datasets.
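To make the data-splitting and relabeling step concrete, the following is a minimal sketch of one way to realize it. It assumes a K-fold-style assignment in which each data point is held out from exactly one student (the paper does not prescribe a particular split or aggregation rule, so the function names and the single-holdout scheme here are illustrative assumptions; `probs` stands in for the per-student class-probability predictions a trained network would produce):

```python
import numpy as np

def make_student_subsets(n_points, n_students, seed=0):
    """Assign each data point to all but one student's training subset,
    so that no point appears in every student's subset.

    Returns the index of the excluded ("holdout") student for each point,
    plus each student's training-subset indices.
    """
    rng = np.random.default_rng(seed)
    # holdout[i] = the one student that never trains on point i
    holdout = rng.integers(0, n_students, size=n_points)
    subsets = [np.flatnonzero(holdout != s) for s in range(n_students)]
    return holdout, subsets

def pseudo_labels(probs, holdout):
    """Relabel each point using only the student that never saw it.

    probs[s, i, c] is student s's predicted probability of class c for
    point i; the pseudo-label for point i is the argmax prediction of
    the student whose subset excludes i.
    """
    n_students, n_points, n_classes = probs.shape
    labels = np.empty(n_points, dtype=int)
    for i in range(n_points):
        s = holdout[i]  # this student's predictions are unbiased for point i
        labels[i] = probs[s, i].argmax()
    return labels
```

Because each point's pseudo-label comes from a network that excluded that point from training, any instance-dependent label noise memorized by the other students cannot leak into it; the students are then retrained on these pseudo-labels.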

