TEACHING WITH COMMENTARIES

Abstract

Effective training of deep neural networks can be challenging, and there remain many open questions on how best to train these models. Recently developed methods to improve neural network training examine teaching: providing learned information during the training process to improve downstream model performance. In this paper, we take steps towards extending the scope of teaching. We propose a flexible teaching framework using commentaries, learned meta-information helpful for training on a particular task. We present gradient-based methods to learn commentaries, leveraging recent work on implicit differentiation for scalability. We explore diverse applications of commentaries, from weighting training examples, to parameterising label-dependent data augmentation policies, to representing attention masks that highlight salient image regions. We find that commentaries can improve training speed and/or performance, and provide insights about the dataset and training process. We also observe that commentaries generalise: they can be reused when training new models to obtain performance benefits, suggesting a use-case where commentaries are stored with a dataset and leveraged in the future for improved model training.

1. INTRODUCTION

Training, regularising, and understanding complex neural network models is challenging. Central open questions remain on how to make training faster and more data-efficient (Kornblith et al., 2019; Raghu et al., 2019a;b), how to ensure better generalisation (Zhang et al., 2016), and how to improve transparency and robustness (Bau et al., 2017; Madry et al., 2017). A promising approach to these questions is learning to teach (Zhu, 2015), in which learned auxiliary information about a task is provided to a neural network to inform the training process and help downstream objectives. Examples include providing auxiliary training targets (Liu et al., 2019; Navon et al., 2020; Pham et al., 2020) and reweighting training examples to emphasise important datapoints (Fan et al., 2020; Jiang et al., 2018; Ren et al., 2018; Shu et al., 2019). Learning to teach approaches have achieved promising results in vision and language applications (Jiang et al., 2018; Ren et al., 2018; Shu et al., 2019; Hu et al., 2019), using a handful of specific modifications to the training process.

In this paper, we take steps towards generalising these approaches, introducing a flexible and effective learning to teach framework using commentaries. Commentaries represent learned meta-information helpful for training a model on a task; once learned, a commentary can be reused as-is to improve the training of new models. We demonstrate that commentaries can be used for applications ranging from speeding up training to gaining insights into the neural network model. Specifically, our contributions are:

1. We formalise the notion of commentaries, providing a unified framework for learning meta-information that can be used to improve network training and examine model learning.
2. We present gradient-based methods to learn commentaries by optimising a network's validation loss, leveraging recent work in implicit differentiation to scale to larger models (a minimal sketch of this setup follows the list).
3. We use commentaries to define example-weighting curricula, a common method of teaching neural networks. We show that these learned commentaries hold interpretable insights, lead to speedups in training, and improve performance on few-shot learning tasks.
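To make the bilevel structure of commentary learning concrete, the sketch below learns a per-example weighting commentary for a toy linear-regression task by differentiating a validation loss through a single unrolled training step. The one-step unrolling is a simplification for illustration only; at scale, the paper instead relies on implicit differentiation. All names here (`commentary_logits`, `inner_lr`, the toy data) are illustrative assumptions, not from the paper.

```python
# A minimal sketch (not the paper's implementation) of learning an
# example-weighting commentary with a one-step unrolled meta-gradient.
import torch

torch.manual_seed(0)

# Toy data: 100 training and 50 validation points for linear regression.
x_tr, y_tr = torch.randn(100, 5), torch.randn(100, 1)
x_va, y_va = torch.randn(50, 5), torch.randn(50, 1)

# The commentary: one learnable logit per training example.
commentary_logits = torch.zeros(100, requires_grad=True)
meta_opt = torch.optim.Adam([commentary_logits], lr=1e-2)

inner_lr = 0.1
for meta_step in range(200):
    # Fresh inner-model parameters, kept in the graph so the outer
    # gradient can flow back through the inner update.
    w = torch.zeros(5, 1, requires_grad=True)

    # Inner step: one SGD update on the commentary-weighted training loss.
    weights = torch.softmax(commentary_logits, dim=0)
    per_example = ((x_tr @ w - y_tr) ** 2).mean(dim=1)
    train_loss = (weights * per_example).sum()
    (g,) = torch.autograd.grad(train_loss, w, create_graph=True)
    w_updated = w - inner_lr * g

    # Outer step: the validation loss of the updated model is
    # differentiated through the inner update into the commentary.
    val_loss = ((x_va @ w_updated - y_va) ** 2).mean()
    meta_opt.zero_grad()
    val_loss.backward()
    meta_opt.step()
```

After meta-training, the learned weights could be stored alongside the dataset and reused verbatim when training new models, matching the reuse property of commentaries described in the abstract.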

