A CHAIN GRAPH INTERPRETATION OF REAL-WORLD NEURAL NETWORKS

Anonymous authors
Paper under double-blind review

Abstract

The last decade has witnessed a boom of deep learning research and applications achieving state-of-the-art results in various domains. However, most advances have been established empirically, and their theoretical analysis remains lacking. One major issue is that our current interpretation of neural networks (NNs) as function approximators is too generic to support in-depth analysis. In this paper, we remedy this by proposing an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure. The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models, while remaining general enough to cover real-world NNs with arbitrary depth, multi-branching and varied activations, as well as common structures including convolutional/recurrent layers, residual blocks and dropout. We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques, and can also be used to derive new deep learning approaches such as the concept of partially collapsed feed-forward inference. It is thus a promising framework that deepens our understanding of neural networks and provides a coherent theoretical formulation for future deep learning research.

1. INTRODUCTION

During the last decade, deep learning (Goodfellow et al., 2016), the study of neural networks (NNs), has achieved ground-breaking results in diverse areas such as computer vision (Krizhevsky et al., 2012; He et al., 2016; Long et al., 2015; Chen et al., 2018), natural language processing (Hinton et al., 2012; Vaswani et al., 2017; Devlin et al., 2019), generative modeling (Kingma & Welling, 2014; Goodfellow et al., 2014) and reinforcement learning (Mnih et al., 2015; Silver et al., 2016), and various network designs have been proposed. However, neural networks have been treated largely as "black-box" function approximators, and their designs have chiefly been found via trial-and-error, with little or no theoretical justification. A major cause that hinders theoretical analysis is the current overly generic modeling of neural networks as function approximators: simply interpreting a neural network as a composition of parametrized functions provides little insight into the nature of its components or their behavior during the learning process. In this paper, we show that a neural network can actually be interpreted as a probabilistic graphical model (PGM) called a chain graph (CG) (Koller & Friedman, 2009), and feed-forward as an efficient approximate probabilistic inference procedure on it. This offers specific interpretations for various neural network components, allowing for in-depth theoretical analysis and the derivation of new approaches.
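To give a flavor of the "feed-forward as probabilistic inference" idea, the following minimal sketch uses a standard, well-known special case rather than the exact CG construction developed later in this paper: when a layer's hidden units are treated as conditionally independent Bernoulli variables with logistic conditionals (as in a restricted Boltzmann machine), the ordinary sigmoid feed-forward activation coincides with the vector of conditional marginals E[h | x]. All variable names here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One layer viewed probabilistically: each hidden unit h_j is a binary
# random variable with p(h_j = 1 | x) = sigmoid(w_j . x + b_j), as in the
# conditional distribution of a restricted Boltzmann machine. The usual
# feed-forward activation then equals the conditional marginal E[h | x],
# i.e. the deterministic forward pass performs probabilistic inference.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weights: 3 inputs -> 4 hidden units
b = rng.normal(size=4)        # biases
x = rng.normal(size=3)        # observed input

# (a) deterministic feed-forward activation
activation = sigmoid(W @ x + b)

# (b) Monte-Carlo estimate of E[h | x] by sampling the binary hidden units
samples = rng.random((100_000, 4)) < sigmoid(W @ x + b)
mc_marginals = samples.mean(axis=0)

print(np.allclose(activation, mc_marginals, atol=1e-2))  # True
```

The CG interpretation generalizes this correspondence beyond the single-layer logistic case to deep, multi-branch networks with varied activations.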

1.1. RELATED WORK

In terms of theoretical understanding of neural networks, a well-known result based on the function approximator view is the universal approximation theorem (Goodfellow et al., 2016); however, it only establishes the representational power of NNs. There have also been many efforts to develop alternative NN interpretations. One prominent approach identifies infinite-width NNs as Gaussian processes (Neal, 1996; Lee et al., 2018), enabling kernel method analysis (Jacot et al., 2018). Other works employ theories such as optimal transport (Genevay et al., 2017; Chizat & Bach, 2018) or mean field (Mei et al., 2019). These approaches lead to interesting findings; however, they tend to hold only under limited or unrealistic settings and have difficulty interpreting practical real-world NNs.

