ATOMIZED DEEP LEARNING MODELS

Abstract

Deep learning models often tackle the intra-sample structure, such as the order of words in a sentence and of pixels in an image, but have not paid much attention to the inter-sample relationship. In this paper, we show that explicitly modeling the inter-sample structure to be more discretized can potentially improve a model's expressivity. We propose a novel method, Atom Modeling, that can discretize a continuous latent space by drawing an analogy between a data point and an atom, which is naturally spaced away from other atoms at distances that depend on their internal structures. Specifically, we model each data point as an atom composed of electrons, protons, and neutrons, and minimize the potential energy caused by the interatomic forces among data points. Through experiments and qualitative analyses of our proposed Atom Modeling on synthetic and real datasets, we find that Atom Modeling can improve performance by maintaining the inter-sample relation and can capture an interpretable intra-sample relation by mapping each component of a data point to an electron, a proton, or a neutron.

1. INTRODUCTION

Many widely used neural networks are composed of two parts: the first projects data points into another space, and the second performs further regression/classification on top of that space. By transforming raw data features into a potentially more tractable space, deep learning models have recently shown potential in many areas, ranging from dialogue systems (Vinyals & Le, 2015; López et al., 2017; Chen et al., 2017) and medical image analysis (Kononenko, 2001; Ker et al., 2017; Erickson et al., 2017; Litjens et al., 2017; Razzak et al., 2018; Bakator & Radosav, 2018) to robotics (Peters et al., 2003; Kober et al., 2013; Pierson & Gashler, 2017; Sünderhauf et al., 2018). One major challenge in deep learning is to better model the intra- and inter-sample structures of complex data. Recent works often model the intra-sample structure by considering the order and adjacency of the input features, for instance, positional encoding for text/speech in Transformers (Vaswani et al., 2017) and kernel width for images in convolutional neural networks (LeCun et al., 2015). Regarding the inter-sample structure, the literature often assumes that a dataset can be represented in a continuous space in which an interpolation of two embeddings is meaningful (Bowman et al., 2016; Chen et al., 2016), even though the data might be naturally discrete (van den Oord et al., 2017). Moreover, mainstream approaches rely on a not fully transparent optimization objective to reorganize the space. In this work, we therefore explore how to explicitly and dynamically rearrange the space (the inter-sample structure) by leveraging the intra-sample structures. Inspired by atomic physics, where the atom is the smallest unit of matter and atoms are discretely distributed, we propose to model a data point as an atom.
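As a preview of the intuition, the discretizing effect of an interatomic-style force can be sketched with a toy pairwise potential over data-point embeddings. The Lennard-Jones 12-6 form below is a standard physics illustration chosen here for concreteness, not necessarily the exact potential adopted by Atom Modeling; the function and parameter names are hypothetical:

```python
# Illustrative sketch only: a Lennard-Jones-style pairwise potential over
# embeddings. Minimizing the summed potential repels points at short range
# and weakly attracts them at long range, so embeddings settle at non-zero
# spacings instead of collapsing -- a discretized latent space.
import numpy as np

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Classic 12-6 potential: strongly repulsive for r < sigma,
    mildly attractive beyond; minimum of -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def potential_energy(embeddings, epsilon=1.0, sigma=1.0):
    """Total potential energy summed over all pairs of embeddings."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(embeddings[i] - embeddings[j])
            total += lennard_jones(r, epsilon, sigma)
    return total
```

Because the 12-6 potential attains its minimum at a non-zero separation, using such an energy as a regularizer drives every pair of embeddings toward an equilibrium spacing rather than toward coincidence.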
As illustrated on the left of Figure 2, an atom in the Bohr model (Bohr, 1913), a concept often adopted in physics (Halliday et al., 2013) and chemistry (Brown, 2009), contains a dense nucleus, composed of positively charged protons and uncharged neutrons, surrounded by negatively charged electrons orbiting at some radius from the nucleus. Furthermore, multiple atoms exert interatomic forces on one another, composed of attractive and repulsive components, which keep atoms at a non-zero distance apart. These interatomic forces are also what allow atoms to form the molecules, crystals, and metals of everyday life. In this paper, we propose Atom Modeling, a theoretically grounded method that explicitly models the intra-sample relation via atomic structure and the inter-sample relation via interatomic forces. Specifically, we treat a data point as an atom and let a model automatically learn the mapping of each component of the data point to an electron, a proton, or a neutron. We then estimate

