INTERPRETING AND BOOSTING DROPOUT FROM A GAME-THEORETIC VIEW

Abstract

This paper aims to understand and improve the utility of the dropout operation from the perspective of game-theoretic interactions. We prove that dropout can suppress the strength of interactions between input variables of deep neural networks (DNNs). The theoretical proof is also verified by various experiments. Furthermore, we find that such interactions are strongly related to the over-fitting problem in deep learning. Thus, the utility of dropout can be regarded as decreasing interactions so as to alleviate over-fitting. Based on this understanding, we propose an interaction loss to further improve the utility of dropout. Experimental results show that the interaction loss can effectively improve the utility of dropout and boost the performance of DNNs.
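As a toy, self-contained sketch of the claim that dropout suppresses interactions (not the experimental setup of this paper), the snippet below uses the simplest pairwise interaction measure, f(S∪{i,j}) − f(S∪{i}) − f(S∪{j}) + f(S), with an empty context S. The hand-crafted function `f` and the dropout rate `p` are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network output": the multiplicative term makes inputs 0 and 1 interact.
def f(x):
    return x[0] + x[1] + 5.0 * x[0] * x[1]

def interaction(g, x, i, j):
    """Pairwise interaction g({i,j}) - g({i}) - g({j}) - g(empty),
    i.e. the part of the output attributable to i and j acting jointly."""
    def masked(keep):
        m = np.zeros_like(x)
        for k in keep:
            m[k] = x[k]
        return g(m)
    return masked((i, j)) - masked((i,)) - masked((j,)) + masked(())

x = np.array([1.0, 1.0])
base = interaction(f, x, 0, 1)  # 5.0: interaction without dropout

# Expected output under dropout rate p, estimated by Monte-Carlo sampling
# of Bernoulli masks. Each input survives with probability 1 - p, so the
# joint term is scaled by (1 - p)^2 while linear terms are scaled by 1 - p.
p = 0.5
def f_dropout(x, n=20000):
    masks = rng.binomial(1, 1 - p, size=(n, x.size))
    return np.mean([f(m * x) for m in masks])

dropped = interaction(f_dropout, x, 0, 1)  # approx. 5 * (1 - p)^2 = 1.25
print(base, dropped)
```

In this toy case the interaction term shrinks quadratically in the keep probability, faster than the linear terms, which mirrors (in a highly simplified form) the suppression effect proved in the paper.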

1. INTRODUCTION

Deep neural networks (DNNs) have exhibited significant success in various tasks, but the over-fitting problem remains a considerable challenge for deep learning. Dropout is usually considered an effective operation to alleviate the over-fitting problem of DNNs. Hinton et al. (2012); Srivastava et al. (2014) argued that dropout could encourage each unit in an intermediate-layer feature to model useful information without depending heavily on other units. Konda et al. (2016) considered dropout a specific form of data augmentation. Gal & Ghahramani (2016) proved that dropout could be interpreted as a Bayesian approximation of a Gaussian process. Our research group led by Dr. Quanshi Zhang has proposed game-theoretic interactions, including interactions of different orders (Zhang et al., 2020) and multivariate interactions (Zhang et al., 2021b). As a basic metric, the interaction can be used to explain signal-processing behaviors in trained DNNs from different perspectives. For example, we have built up a tree structure to explain hierarchical interactions between words encoded in NLP models (Zhang et al., 2021a). We have also proved a close relationship between the interaction and adversarial robustness (Ren et al., 2021) and adversarial transferability (Wang et al., 2020). Many previous methods of boosting adversarial transferability can be explained as the reduction of interactions, and the interaction can also explain the utility of adversarial training (Ren et al., 2021).

As an extension of the system of game-theoretic interactions, in this paper, we aim to explain, model, and improve the utility of dropout from the following perspectives. First, we prove that the dropout

