CHIRODIFF: MODELLING CHIROGRAPHIC DATA WITH DIFFUSION MODELS

Abstract

Generative modelling over continuous-time geometric constructs, a.k.a. chirographic data such as handwriting, sketches and drawings, has so far been accomplished through autoregressive distributions. Such strictly ordered discrete factorization, however, falls short of capturing key properties of chirographic data: it fails to build a holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences of a fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model class, namely Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data that specifically addresses these flaws. Our model, named "CHIRODIFF", is non-autoregressive; it learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates to a good extent. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be flexibly implemented using CHIRODIFF. We further show that some unique use cases, such as stochastic vectorization, de-noising/healing and abstraction, are also possible with this model class. We perform quantitative and qualitative evaluation of our framework on relevant datasets and find it to be better than or on par with competing approaches.

1. INTRODUCTION

Chirographic data such as handwriting, sketches and drawings are ubiquitous in modern-day digital content, thanks to the widespread adoption of touch screens and other interactive devices (e.g. AR/VR sets). While supervised downstream tasks on such data, like sketch-based image retrieval (SBIR) (Liu et al., 2020; Pang et al., 2019), semantic segmentation (Yang et al., 2021; Wang et al., 2020) and classification (Yu et al., 2015; 2017), continue to flourish due to higher commercial demand, unsupervised generative modelling remains slightly under-explored. Recently, however, with the advent of large-scale datasets, generative modelling of chirographic data has started to gain traction. Specifically, models have been trained on generic doodles/drawings data (Ha & Eck, 2018), or on more "specialized" entities like fonts (Lopes et al., 2019), diagrams (Gervais et al., 2020; Aksan et al., 2020), SVG icons (Carlier et al., 2020) etc. Building unconditional neural generative models not only allows understanding the distribution of chirographic data but also enables further downstream tasks (e.g. segmentation, translation) by means of conditioning.
Figure 1: Unconditional samples from CHIRODIFF trained on VMNIST, KanjiVG and Quick, Draw!.

