NOT-MIWAE: DEEP GENERATIVE MODELLING WITH MISSING NOT AT RANDOM DATA

Abstract

When a missing data process depends on the missing values themselves, it needs to be explicitly modelled and taken into account when performing likelihood-based inference. We present an approach for building and fitting deep latent variable models (DLVMs) in cases where the missing process is dependent on the missing data. Specifically, a deep neural network enables us to flexibly model the conditional distribution of the missingness pattern given the data. This allows for incorporating prior information about the type of missingness (e.g. self-censoring) into the model. Our inference technique, based on importance-weighted variational inference, involves maximising a lower bound of the joint likelihood. Stochastic gradients of the bound are obtained by using the reparameterisation trick both in latent space and data space. We show, on various kinds of data sets and missingness patterns, that explicitly modelling the missing process can be invaluable.
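As a rough illustration of the inference scheme described above, the sketch below estimates an importance-weighted lower bound on log p(x_obs, s) for a toy two-dimensional linear-Gaussian model with a logistic self-masking mechanism. All parameter values (decoder weight, noise scale, missingness coefficients, encoder moments) are made-up constants for illustration, not the paper's trained model; the point is the two reparameterised sampling steps, one in latent space and one in data space.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mean, std):
    """Elementwise log density of a univariate Gaussian."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logsumexp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def not_miwae_bound(x_obs, K=1000):
    """Importance-weighted lower bound on log p(x_obs, s) for a toy 2-D
    model where coordinate 1 is observed and coordinate 2 is missing
    (mask s = [1, 0]). All parameters are illustrative constants."""
    w_dec = np.array([1.0, 1.0])  # decoder: x | z ~ N(w_dec * z, sigma_x^2 I)
    sigma_x = 0.5
    a, b = 2.0, 0.0               # self-masking: p(missing | x_j) = sigmoid(a x_j + b)
    mu_q, sig_q = x_obs, 1.0      # crude stand-in for an amortised posterior q(z | x_obs)

    # Reparameterisation in latent space: z_k ~ q(z | x_obs).
    z = mu_q + sig_q * rng.standard_normal(K)
    dec_mean = z[:, None] * w_dec[None, :]        # (K, 2) decoder means

    # Reparameterisation in data space: x_mis^k ~ p(x_mis | z_k).
    x_mis = dec_mean[:, 1] + sigma_x * rng.standard_normal(K)

    # Log importance weights. The log p(x_mis | z) term cancels with the
    # proposal used to draw x_mis, leaving:
    #   log p(x_obs | z) + log p(s | x_obs, x_mis) + log p(z) - log q(z | x_obs)
    log_px_obs = log_normal(x_obs, dec_mean[:, 0], sigma_x)
    log_ps = (np.log1p(-sigmoid(a * x_obs + b))   # coordinate 1 is observed
              + np.log(sigmoid(a * x_mis + b)))   # coordinate 2 is missing
    log_pz = log_normal(z, 0.0, 1.0)
    log_qz = log_normal(z, mu_q, sig_q)

    log_w = log_px_obs + log_ps + log_pz - log_qz
    return logsumexp(log_w) - np.log(K)

print(not_miwae_bound(1.0))
```

As with IWAE-style bounds, increasing K tightens the estimate; in the actual model, the constants above would be the outputs of neural networks trained by stochastic gradient ascent on this bound.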

1. INTRODUCTION

Missing data are a systemic issue in real-world data analysis, and an integral part of some fields, e.g. recommender systems. The analyst must therefore either use methods and models that are applicable to incomplete data, or impute the missing values before applying models that require complete data. The expected model performance (often measured in terms of imputation error, or of the innocuity of the missingness on the inference results) depends on the assumptions made about the missing mechanism and on how well those assumptions match the true missing mechanism.

In a seminal paper, Rubin (1976) introduced a formal probabilistic framework for assessing assumptions about missing mechanisms and their consequences. The most common assumption, made either implicitly or explicitly, is that a part of the data is missing at random (MAR). Essentially, the MAR assumption means that the missingness pattern does not depend on the missing values; this makes it possible to ignore the missing data mechanism in likelihood-based inference by marginalising over the missing data. The assumption implicitly made by non-probabilistic models and ad hoc methods is usually that the data are missing completely at random (MCAR). MCAR is a stronger assumption than MAR: informally, it means that the missingness pattern depends neither on the observed nor on the missing data. More details on these assumptions can be found in the monograph of Little & Rubin (2002); of particular interest are also the recent revisits of Seaman et al. (2013) and Doretti et al. (2018).

In this paper, our goal is to posit statistical models that leverage deep learning in order to break away from these assumptions. Specifically, we propose a general



Figure 1: (a) Graphical model of the not-MIWAE. (b) Gaussian data with MNAR values. Dots are fully observed points; partially observed points are displayed as black crosses. A contour of the true distribution is shown together with directions found by PPCA and by a not-MIWAE with a PPCA decoder.
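To make the ignorability argument from the introduction explicit, write x = (x^o, x^m) for the observed and missing parts of a data point and s for the binary missingness pattern (the notation is introduced here for illustration). The observed-data likelihood factorises as:

```latex
p(x^o, s \mid \theta, \phi)
  = \int p(x^o, x^m \mid \theta)\, p(s \mid x^o, x^m, \phi)\,\mathrm{d}x^m .
```

Under MAR, p(s | x^o, x^m, φ) = p(s | x^o, φ) does not depend on x^m and can be pulled out of the integral:

```latex
p(x^o, s \mid \theta, \phi)
  = p(s \mid x^o, \phi) \int p(x^o, x^m \mid \theta)\,\mathrm{d}x^m
  = p(s \mid x^o, \phi)\, p(x^o \mid \theta),
```

so likelihood-based inference for θ can ignore the missingness model entirely. When s depends on x^m, as in self-censoring, this factorisation fails, and the missingness model must remain inside the integral.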

