SKETCHEMBEDNET: LEARNING NOVEL CONCEPTS BY IMITATING DRAWINGS

Abstract

Sketch drawings are an intuitive visual domain that aligns with human instinct. Previous work has shown that recurrent neural networks are capable of producing sketch drawings of a single class, or a few classes, at a time. In this work we investigate the representations developed by training a generative model to produce sketches from pixel images across many classes. We find that the embeddings learned by this sketching model are highly informative for visual tasks and capture a unique visual understanding. We then use them to exceed state-of-the-art performance in unsupervised few-shot classification on the Omniglot and mini-ImageNet benchmarks. We also leverage the generative capacity of our model to produce high-quality sketches of novel classes based on just a single example.

1. INTRODUCTION

Upon encountering a novel concept, such as a six-legged turtle, humans can quickly generalize this concept by composing a mental picture. The ability to generate drawings greatly facilitates communicating new ideas. This dates back to the advent of writing: many ancient written languages are based on logograms, such as Chinese hanzi and Egyptian hieroglyphs, where each character is essentially a sketch of the object it represents. We often see complex visual concepts summarized by a few simple strokes.

Inspired by the human ability to draw, recent research has explored the potential to generate sketches using a wide variety of machine learning models, ranging from hierarchical Bayesian models (Lake et al., 2015), to more recent deep autoregressive models (Gregor et al., 2015; Ha & Eck, 2018; Chen et al., 2017) and generative adversarial nets (GANs) (Li et al., 2019). It is natural to ask whether we can obtain useful intermediate representations from models that produce sketches in the output space, as has been shown for other generative models (Ranzato et al., 2006; Kingma & Welling, 2014; Goodfellow et al., 2014; Donahue et al., 2017; Doersch et al., 2015). Unfortunately, hierarchical Bayesian models suffer from prolonged inference time, while other current sketch models mostly focus on producing drawings in a closed-set setting with a few classes (Ha & Eck, 2018; Chen et al., 2017), or on improving log likelihood at the pixel level (Rezende et al., 2016). Leveraging the learned representations of these drawing models remains a largely unexplored topic.

In this paper, we pose the following question: can we learn a generalized embedding function that captures salient and compositional features by directly imitating human sketches? The answer is affirmative. In our experiments we develop SketchEmbedNet, an RNN-based sketch model trained to map grayscale and natural image pixels to the sketch domain.
It is trained on hundreds of classes, without the use of class labels, to learn a robust drawing model that can sketch diverse and unseen inputs. We demonstrate the salience of the learned features by achieving state-of-the-art performance on the Omniglot few-shot classification benchmark and by the visual recognizability of our one-shot generations. We then examine how the embeddings capture image components and their spatial relationships, probing compositionality in image space, and show a surprising property of conceptual composition. We push the boundary further by applying our sketch model to natural images; to our knowledge, we are the first to extend stroke-based autoregressive models to produce drawings of open-domain natural images. We train our model with adapted SVG images from the Sketchy dataset (Sangkloy et al., 2016) and then evaluate the embedding quality directly on unseen classes in the mini-ImageNet few-shot classification task (Vinyals et al., 2016). Our approach is competitive with existing unsupervised few-shot learning methods (Hsu et al., 2019; Khodadadeh et al., 2019; Antoniou & Storkey, 2019) on this natural image benchmark. In both the sketch and natural image domains, we show that by learning to draw, our method generalizes well even across different datasets and classes.
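To make the overall pipeline concrete, the following is a minimal, illustrative sketch of an image-to-sketch model of the kind described above: a convolutional encoder maps pixels to an embedding (the representation later reused for few-shot classification), and an RNN decoder autoregressively emits stroke tokens conditioned on that embedding. The layer sizes, class name, and the 5-dimensional stroke format (dx, dy, and three pen states, as in SketchRNN-style models) are assumptions for illustration, not the paper's exact architecture or training objective.

```python
import torch
import torch.nn as nn

class ImageToSketch(nn.Module):
    """Illustrative CNN encoder -> embedding -> LSTM stroke decoder."""

    def __init__(self, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Encoder: grayscale image -> fixed-size embedding.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Decoder: each step sees the previous stroke token plus the embedding.
        self.decoder = nn.LSTM(5 + embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 5)  # predict the next stroke token

    def embed(self, images):
        # The embedding reused for downstream few-shot classification.
        return self.encoder(images)

    def forward(self, images, strokes):
        # Teacher forcing: condition every decoder step on the image embedding.
        z = self.embed(images)                                # (B, D)
        z_seq = z.unsqueeze(1).expand(-1, strokes.size(1), -1)
        out, _ = self.decoder(torch.cat([strokes, z_seq], dim=-1))
        return self.head(out)                                 # (B, T, 5)

imgs = torch.randn(4, 1, 28, 28)      # batch of grayscale images
strokes = torch.zeros(4, 10, 5)       # ground-truth stroke sequence prefix
model = ImageToSketch()
emb = model.embed(imgs)               # shape (4, 256)
pred = model(imgs, strokes)           # shape (4, 10, 5)
```

At test time, the decoder would be rolled out one stroke at a time from its own predictions; here only the training-style forward pass is shown, since it is the encoder's embedding, not the generated sketch, that drives the few-shot classification results.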

