DEEP QUOTIENT MANIFOLD MODELING

Abstract

One of the difficulties in modeling real-world data is their complex multi-manifold structure arising from discrete features. In this paper, we propose quotient manifold modeling (QMM), a new data-modeling scheme that considers a generic manifold structure independent of discrete features, thereby gaining efficiency in modeling and allowing generalization over untrained manifolds. QMM considers a deep encoder that induces an equivalence between manifolds; we show, however, that it suffices to consider this encoder only implicitly, via a bias regularizer we derive. This makes QMM easily applicable to existing models such as GANs and VAEs, and experiments show that these models not only achieve superior FID scores but also generalize well across different datasets. In particular, we demonstrate an MNIST model that synthesizes EMNIST letters.

1. INTRODUCTION

Real-world data are usually considered to have a multi-manifold structure, involving discrete features as well as continuous ones; continuous features such as size or location generally induce a smooth manifold structure, whereas discrete features such as a digit class or a new object in the background induce disconnections in the structure, making it a set of disjoint manifolds instead of a single one (Khayatkhoei et al., 2018). While this multiplicity makes data modeling a difficult problem, recently proposed deep generative models have shown notable progress by considering each manifold separately. Extending conventional models with multiple generators (Khayatkhoei et al., 2018; Ghosh et al., 2017; Hoang et al., 2018), discrete latent variables (Chen et al., 2016; Dupont, 2018; Jeong and Song, 2019), or mixture densities (Gurumurthy et al., 2017; Xiao et al., 2018; Tomczak and Welling, 2018), they exhibit improved performance in image generation and in learning high-level features. There are, however, two additional properties little considered by these models. First, since discrete features are both common and combinatorial, there can be exponentially many manifolds that are not included in the dataset. For example, an image dataset of a cat playing around in a room would exhibit a simple manifold structure according to the location of the cat, but there are also numerous other manifolds derivable from it via discrete variations, such as placing a new chair, displacing a toy, turning on a light, or their combinations, that are not included in the dataset (see Fig. 1). Second, while the manifolds to model are numerous considering such variations, they usually have the same generic structure since the underlying continuous features remain the same; regardless of the chair, toy, or light, the manifold structures are equally due to the location of the cat.
Considering these properties, what is desired is a model that can handle a large number of resembling manifolds, but the aforementioned models show several inefficiencies. They need proportionally many generators or mixture components to model a large number of manifolds; each of these requires much data, only to learn manifolds that share the same generic structure. Moreover, even when they are successfully trained, new discrete changes are easy to make, yet these models cannot generalize beyond the trained manifolds. In this paper, we propose quotient manifold modeling (QMM), a new generative modeling scheme that considers a generic manifold structure independent of discrete features, thereby gaining efficiency in modeling and allowing generalization over untrained manifolds. QMM outwardly follows the multi-generator scheme (Khayatkhoei et al., 2018; Ghosh et al., 2017; Hoang et al., 2018), but it involves a new regularizer that enforces encoder compatibility, a condition that the inverse maps of the generators be represented by a single deep encoder. Since deep encoders usually exhibit good generalizability, this condition not only makes a generic structure be shared among the generators but also makes it generalizable to untrained manifolds.

Figure 1: An illustration of quotient manifold modeling (QMM). Images of a cat moving around in a room would form a 1-D manifold due to the location of the cat; but the structure can become multi-manifold under different discrete features (a chair or a vase in the background). The idea of QMM is to consider the generic structure shared by the manifolds using an encoding map. The encoding map induces an equivalence relation that can be seen as a contour map (green dotted curves). The quotient of the relation (or the curve orthogonal to the map) gives the manifold structure for an untrained image (shown in purple, dotted).
In particular, it induces a generalizable equivalence relation between data, and the manifold structure of out-of-sample data can be derived by taking the quotient of this relation, hence the name QMM. Since implementing QMM essentially amounts to adding a regularizer, it can be easily applied to existing deep generative models such as generative adversarial networks (GANs; Goodfellow et al., 2014), variational auto-encoders (VAEs; Kingma and Welling, 2013), and their extensions. We demonstrate that these QMM-applied models not only show better FID scores but also generalize well. Our contributions can be summarized as follows:

• We propose QMM, a new generative modeling scheme that considers a generic manifold structure, thereby allowing generalization over untrained manifolds.
• We derive a regularizer enforcing encoder compatibility, an essential condition for QMM.
• We show that GANs and VAEs implementing QMM achieve superior FID scores and generalize across different datasets.
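To make the encoder-compatibility condition concrete, the following is a minimal toy sketch, not the regularizer derived in this paper: it uses hypothetical linear generators G_k and a single shared linear encoder E, and measures how well E simultaneously inverts every generator, the kind of penalty that could be added to a multi-generator GAN or VAE training loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: K linear "generators" G_k(z) = A_k z + b_k mapping
# a 1-D latent code to 2-D data, and one shared linear "encoder" E(x) = W x + c.
# Encoder compatibility asks that the single E invert every G_k: E(G_k(z)) ≈ z.
K, d_z, d_x = 3, 1, 2
A = [rng.normal(size=(d_x, d_z)) for _ in range(K)]
b = [rng.normal(size=d_x) for _ in range(K)]
W = rng.normal(size=(d_z, d_x))
c = rng.normal(size=d_z)

def compat_penalty(W, c, z_batch):
    """Average squared latent-reconstruction error over the generators.

    A mean of ||E(G_k(z)) - z||^2 terms: zero iff the shared encoder exactly
    inverts all K generators on the batch (illustrative only)."""
    total = 0.0
    for k in range(K):
        x = z_batch @ A[k].T + b[k]   # G_k(z): points on generator k's manifold
        z_rec = x @ W.T + c           # E(G_k(z)): shared encoder applied to them
        total += np.mean((z_rec - z_batch) ** 2)
    return total / K

z = rng.normal(size=(64, d_z))
print(compat_penalty(W, c, z))  # non-negative scalar added to the training loss
```

In an actual model the generators and encoder would be deep networks trained jointly, and, as the paper notes, the encoder need only be treated implicitly via the derived bias regularizer; this sketch only illustrates the condition itself.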

2. BACKGROUND

2.1 MANIFOLD MODELING IN GANS AND VAES

While generative adversarial networks (GANs) (Goodfellow et al., 2014) and variational autoencoders (VAEs) (Kingma and Welling, 2013) are two different strands of models, they share the same data-modeling scheme, which leads to a manifold structure (though VAEs involve stochasticity). They model data x ∈ X as a transformation of a low-dimensional latent code z ∈ Z via a generative (decoding) map f_G : Z → X, which makes every datum they consider lie on a subspace M = f_G(Z) ⊂ X. Since f_G can be assumed to be smooth and injective in practice (Shao et al., 2017), M accords with the mathematical definition of a smooth manifold. To deal with multi-manifold data, however, these models need to approximate disconnections in the structure with low-density regions. This requires a highly nonlinear f_G, which is difficult to learn and often leads to either a low-quality model or mode collapse (Khayatkhoei et al., 2018).
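The manifold view above can be illustrated with a toy sketch: a hypothetical smooth, injective f_G (a circular arc, standing in for any trained generator) maps a 1-D latent space into 2-D data space, so every generated sample lies on the 1-D curve M = f_G(Z).

```python
import numpy as np

# Toy illustration of the manifold view of generative models: a smooth,
# injective map f_G from a 1-D latent space Z into 2-D data space X.
def f_G(z):
    # Hypothetical generator (an arc of the unit circle), not a trained model.
    return np.stack([np.cos(z), np.sin(z)], axis=-1)

z = np.linspace(-1.0, 1.0, 200)   # latent codes z ∈ Z
x = f_G(z)                        # samples lying on M = f_G(Z) ⊂ R^2

# Although x lives in 2-D, every sample satisfies the 1-D manifold's
# defining constraint (here, unit norm), i.e. x never leaves M:
print(np.allclose(np.sum(x**2, axis=1), 1.0))  # True
```

A multi-manifold dataset would instead need f_G to cover several disjoint such curves, which is exactly the disconnection problem the paragraph above describes.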

