TABULAR DATA TO IMAGE GENERATION: BENCHMARK DATA, APPROACHES, AND EVALUATION

Abstract

In this work, we study the problem of generating a set of images from an arbitrary tabular dataset. The generated images provide an intuitive visual summary of the tabular data that can be quickly communicated to, and understood by, the user. More specifically, we formally introduce this new dataset-to-image generation task and discuss several motivating applications, including exploratory data analysis and understanding customer segments to create better marketing campaigns. We then curate a benchmark dataset for training such models, which we release publicly so that others can use it to develop new models for other applications of interest. Further, we describe a general and flexible framework that serves as a foundation for studying and developing models for this new task of generating images from tabular data. Within this framework, we propose several approaches with varying levels of complexity and different tradeoffs. One such approach uses both numerical and textual data as input to the image generation pipeline. The pipeline consists of an image decoder and a conditional autoregressive sequence generation model whose input layer incorporates a pre-trained tabular representation. We evaluate these approaches using several quantitative metrics (FID for image quality and LPIPS scores for image diversity).
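The two metrics named above can be made concrete with a short sketch. It is not the authors' evaluation code, and the function names are hypothetical. FID is the Fréchet distance between Gaussians fitted to embeddings of real and generated images, d² = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The sketch assumes the embeddings (usually from an Inception network) have already been computed upstream. For diversity, one common choice is to average LPIPS over all pairs of generated images, so `dist` below is a placeholder that would wrap an LPIPS model in practice. Only NumPy is used.

```python
import itertools
from typing import Callable, Sequence

import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID-style Frechet distance between Gaussians fitted to two feature sets.

    feats_*: arrays of shape (n_samples, n_features), e.g. precomputed image embeddings.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # Tr((C_r C_g)^{1/2}) equals Tr((C_r^{1/2} C_g C_r^{1/2})^{1/2}); the inner matrix
    # is symmetric PSD, so its eigenvalues are real and non-negative.
    sqrt_r = _sqrtm_psd(cov_r)
    inner = sqrt_r @ cov_g @ sqrt_r
    inner = (inner + inner.T) / 2.0  # re-symmetrize against round-off
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)


def mean_pairwise_distance(images: Sequence, dist: Callable[[object, object], float]) -> float:
    """Diversity score: average distance over all unordered pairs of generated images.

    `dist` is a hypothetical placeholder; in practice it would wrap an LPIPS model.
    """
    pairs = list(itertools.combinations(images, 2))
    if not pairs:
        raise ValueError("need at least two images")
    return float(np.mean([dist(a, b) for a, b in pairs]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 4))
    print(frechet_distance(real, real))        # close to 0: identical feature sets
    print(frechet_distance(real, real + 1.0))  # close to 4: unit mean shift in 4 dimensions
```

Shifting every feature by a constant leaves the covariance unchanged, so only the mean term contributes, which makes that case a simple sanity check. Lower FID indicates generated images closer to the real distribution, while a higher mean pairwise distance indicates a more diverse set of generated images.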

1. INTRODUCTION

In recent years, conditional image generation has been one of the most important research directions for generative models, owing both to its technical challenges and to the many potential applications of the technology. However, most works on conditional image generation take images (Gatys et al., 2016) or text (Ramesh et al., 2021; 2022) as input. It is therefore natural to ask whether the input can be extended to other data types, bringing the benefits of image generation models to a wider variety of domains. In this work, we study generating images from a given tabular dataset, motivated by promising applications in customer segmentation analysis for marketers and in exploratory data analysis. More specifically, we consider the following problem: given a (tabular) dataset, or more generally a subset of rows and columns of a dataset¹, how can we automatically generate a set of high-quality images that describe it? The generated images should also characterize the key trends, patterns, and segments (clusters) in the data. Such an image generation model opens up interesting possibilities. For instance, given a dataset of customers and the items they purchased, an image illustrating a specific segment of customers buying specific items can reveal valuable information about the underlying consumer behavior without thorough data mining that requires substantial technical expertise, and this information can be used directly in future targeted marketing campaigns.

Tabular-data-to-image generation has many important and practical applications. One important application is as a fundamental tool for exploratory data analysis. Consider a user who is interactively

¹ For convenience, the term dataset refers both to a subset of rows and columns of a dataset and to the full dataset.

