LMSER-PIX2SEQ: LEARNING STABLE SKETCH REPRESENTATIONS FOR SKETCH HEALING

Abstract

Sketch healing aims to recreate a complete sketch from a corrupted one. The sparse and abstract nature of sketches makes this task challenging: the features extracted from a corrupted sketch may be inconsistent with those extracted from the corresponding full sketch. In this paper, we present Lmser-pix2seq, which learns sketch representations that are stable against missing information by employing a Least mean square error reconstruction (Lmser) block that follows the encoder-decoder paradigm. Taking a corrupted sketch as input, the Lmser encoder computes embeddings of the structural patterns of the input, while the decoder reconstructs the complete sketch from the embeddings. We build bi-directional skip connections between the encoder and the decoder in our Lmser block. The feedback connections form recurrent paths that carry information about the reconstructed sketch produced by the decoder back to the encoder, helping it extract stable sketch features. The features captured by the Lmser block are eventually fed into a recurrent neural network decoder to recreate the sketch. Experimental results show that Lmser-pix2seq outperforms state-of-the-art methods in sketch healing, especially when the sketches are heavily masked or corrupted.

1. INTRODUCTION

Humans are able to complete missing things through imagination, such as filling in blanks, writing sequels, and repairing images. The sketch healing task (Su et al., 2020) is a related problem: it aims to synthesise a complete sketch that best resembles a partial input (Su et al., 2020; Qi et al., 2022). Different from image inpainting (Pathak et al., 2016), where photos carry rich texture information, freehand sketches are highly abstract and sparse, making sketch healing quite challenging. The procedure for obtaining a corrupted sketch, proposed by Su et al. (2020), is to crop several local visual patches from a raster sketch image and drop some of them, which yields a corrupted sketch raster image and a set of remaining visual patches. Conventional sketch generation models (Chen et al., 2017; Zang et al., 2021) that take images as input can be used for sketch healing. However, these models, designed for sketch synthesis, are not comparable to SketchHealer-1.0 (Su et al., 2020), which was specifically designed for sketch healing. SketchHealer-1.0 constructs a graphical representation of the sketch by treating patches as nodes and connecting edges based on the nodes' temporal proximity, i.e., the drawing order. This graphic sketch representation enables information exchange between different patches of the same sketch, leading to better healing. Building on SketchHealer-1.0, SketchHealer-2.0 (Qi et al., 2022) considers the relationship between local reconstruction and global semantic preservation; it requires a pre-trained model to calculate the semantic similarity between the recreated sketch and the full sketch. Both SketchHealer-1.0 (Su et al., 2020) and SketchHealer-2.0 (Qi et al., 2022) build graphs that depend on the drawing order, but this information is not always available.
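To make the graph construction above concrete, the following is a minimal numpy sketch of building an adjacency matrix over patches from their drawing order, linking patches drawn within a small temporal window of each other. The `window` parameter and this exact construction are illustrative assumptions, not SketchHealer's official code.

```python
import numpy as np

def temporal_graph(n_patches, window=1):
    """Adjacency matrix over sketch patches ordered by drawing time.

    Two patches are connected if their positions in the drawing order
    differ by at most `window` steps (self-loops excluded). This is an
    illustrative reading of temporal-proximity edges, not the paper's
    exact construction.
    """
    idx = np.arange(n_patches)
    adj = (np.abs(idx[:, None] - idx[None, :]) <= window).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

adj = temporal_graph(5, window=1)
print(adj.astype(int))  # band matrix: each patch links to its temporal neighbours
```

With `window=1` each patch connects only to the patches drawn immediately before and after it, giving a chain-like graph; a larger window densifies the connectivity.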
To overcome this difficulty, SketchLattice (Qi et al., 2021) proposes a novel lattice representation and takes images as input. However, during data processing, the lattice approach discards some of the information in the raster sketch image, which limits SketchLattice's performance. Different from the state-of-the-art graph-structure models, which pass information between nodes to fill in the gaps, we expect the network to take full advantage of the information in the raster sketch images and to learn stable sketch representations in the absence of temporal information. A stable representation means that the model extracts features of the corrupted sketch that are as consistent as possible with those of the full sketch. In principle, this consistency allows different corrupted sketches obtained by masking the same full sketch to be recreated similarly. Conversely, when the extracted features are unstable (lack consistency), the healed sketch fails to maintain its semantics and, worse, its category may change. To learn stable sketch representations, we expect feature maps from different layers of the network to be fully fused, which helps to extract significant and stable features from corrupted sketches. Least mean square error reconstruction (Lmser) (Xu, 1991; 1993) serves this purpose. Lmser develops the autoencoder (AE) (Bourlard & Kamp, 1988) by folding and merging the symmetric encoder and decoder together; such folding is equivalent to adding bi-directional skip connections between the encoder and the decoder (Xu, 2019). The effectiveness of Lmser has been demonstrated in image inpainting (Huang et al., 2020b), super-resolution (Li et al., 2019), and semantic segmentation (Guo et al., 2019; Cao et al., 2021). However, these studies focus on image-related applications with rich texture information, rather than sparse and abstract sketches.
We present Lmser-pix2seq, which learns sketch representations that are stable against missing information by employing an Lmser block that follows the encoder-decoder paradigm. Taking a corrupted sketch as input, the Lmser encoder computes embeddings of the structural patterns of the input, while the decoder reconstructs the complete sketch from the embeddings. We build bi-directional skip connections between the encoder and the decoder in our Lmser block. The feedback connections form recurrent paths that carry information about the reconstructed sketch produced by the decoder back to the encoder, helping it extract stable sketch features. The features captured by the Lmser block are eventually fed into a Recurrent Neural Network (RNN) decoder to recreate the sketches. In summary, our contribution is Lmser-pix2seq, which learns stable sketch representations for sketch healing. The bi-directional skip connections in our Lmser block allow the feature maps from the encoder and decoder to be sufficiently fused, facilitating the extraction of sketch features. Experimental results show that Lmser-pix2seq outperforms state-of-the-art methods, especially when the sketches are heavily masked or corrupted.
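The bi-directional skip connections described above can be sketched as follows: forward skips pass encoder features to the mirrored decoder layers, while feedback skips carry decoder outputs back to the encoder on the next recurrent pass. This is a minimal numpy illustration with toy linear layers; the layer sizes, additive fusion, and two-pass unrolling are assumptions for exposition, whereas the actual Lmser block uses convolutional layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Toy layer sizes (assumed for illustration only).
D_IN, D_H, D_Z = 16, 8, 4
W_enc1 = rng.standard_normal((D_H, D_IN)) * 0.1
W_enc2 = rng.standard_normal((D_Z, D_H)) * 0.1
W_dec2 = rng.standard_normal((D_H, D_Z)) * 0.1
W_dec1 = rng.standard_normal((D_IN, D_H)) * 0.1

def lmser_block(x, n_passes=2):
    """Encoder-decoder with bi-directional skips, unrolled recurrently.

    Forward skips: encoder activations are added to the mirrored
    decoder layers. Feedback skips: decoder activations from the
    previous pass are added back to the encoder inputs.
    """
    fb1 = np.zeros(D_IN)  # decoder layer-1 output fed back to the input
    fb2 = np.zeros(D_H)   # decoder layer-2 output fed back to encoder layer 1
    for _ in range(n_passes):
        h1 = relu(W_enc1 @ (x + fb1))   # feedback skip: decoder -> encoder
        z = relu(W_enc2 @ (h1 + fb2))   # bottleneck embedding
        d2 = relu(W_dec2 @ z + h1)      # forward skip: encoder -> decoder
        d1 = W_dec1 @ d2                # reconstruction of the input
        fb1, fb2 = d1, d2               # carry decoder states to next pass
    return z, d1

x = rng.standard_normal(D_IN)
z, recon = lmser_block(x)
print(z.shape, recon.shape)  # (4,) (16,)
```

In this reading, the extra recurrent pass lets the encoder see a reconstruction of the (possibly corrupted) input, which is the mechanism the paper credits for stabilising the extracted features.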

2. RELATED WORK

Sketch Generation. Research on sketch generation with deep learning (Ha & Eck, 2018; Zhou et al., 2018a; Das et al., 2021; Ge et al., 2021) has developed rapidly in recent years. One interesting line of work has neural networks imitate humans by drawing a vector sketch stroke by stroke. sketch-rnn (Ha & Eck, 2018) is an RNN-RNN generation model based on the Variational Autoencoder (VAE) (Kingma & Welling, 2013), which enables both conditional and unconditional single-category sketch generation. Later, sketch-pix2seq (Chen et al., 2017) replaces the RNN encoder with a convolutional neural network (CNN), addressing multi-category generation and finding that a latent code with the normal distribution constraint removed yields better reconstruction. Inspired by these two models, Song et al. (2018) fuse photo texture information with temporal information through shortcut cycle consistency. To further improve the controllability of generation, RPCL-pix2seq (Zang et al., 2021) assumes that the latent space follows a Gaussian mixture model (GMM), with the number of Gaussians determined automatically by the model. Among these, sketch-rnn and RPCL-pix2seq are the most similar to our model. There are also large pre-trained models, e.g., Sketch-Bert (Lin et al., 2020) and Sketchformer (Ribeiro et al., 2020), that serve not only sketch generation but also other downstream tasks. Sketch healing can be viewed as a combination of vector sketch generation and image inpainting.

Sketch Healing. SketchHealer-1.0 (Su et al., 2020) clarified the definition of sketch healing and proposed a novel graph representation. SketchHealer-2.0 (Qi et al., 2022) rasterizes the generated sequence and calculates its semantic perceptual loss against the corresponding full sketch. Another model that represents a sketch as a graph is SketchLattice (Qi et al., 2021).
SketchLattice is a lightweight network that can construct graphs without relying on the drawing order. It overlays a lattice on the image, treats the intersections of the lattice with the pixels of the sketch strokes as nodes, and constructs edges between nodes by Euclidean distance. In contrast to the sketch generation task, sketch healing requires the model to extract accurate and effective features when the sketch is masked.

Skip Connection. Forward connections refer to the direct transmission of information from shallow layers to deep layers via a short-circuit path. Studies have shown that forward connections can alleviate the vanishing gradient problem (He et al., 2016) and promote multi-scale feature fusion (Ronneberger et al., 2015). This technique is widely applied in the field of computer vision,
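The lattice construction described for SketchLattice can be illustrated with a short numpy sketch: lattice points that land on stroke pixels become nodes, and node pairs within a Euclidean-distance threshold become edges. The `step` and `radius` values and the thresholding rule are assumptions for this toy example, not SketchLattice's official parameters.

```python
import numpy as np

def lattice_nodes(sketch, step=4):
    """Keep lattice points that intersect stroke pixels as graph nodes.

    `sketch` is a binary (H, W) raster image; lattice points are sampled
    every `step` pixels. Illustrative reading of SketchLattice, not its
    released code.
    """
    ys, xs = np.meshgrid(np.arange(0, sketch.shape[0], step),
                         np.arange(0, sketch.shape[1], step),
                         indexing="ij")
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1)
    return pts[sketch[pts[:, 0], pts[:, 1]] > 0]

def lattice_edges(nodes, radius=6.0):
    """Connect node pairs whose Euclidean distance is within `radius`."""
    d = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=-1)
    i, j = np.where((d > 0) & (d <= radius))
    mask = i < j  # keep each undirected edge once
    return list(zip(i[mask], j[mask]))

# Toy sketch: a single horizontal stroke on row 4.
img = np.zeros((16, 16))
img[4, 2:14] = 1
nodes = lattice_nodes(img, step=4)     # lattice points hitting the stroke
edges = lattice_edges(nodes, radius=6.0)
print(len(nodes), len(edges))  # 3 2
```

Note how the lattice only samples the stroke at grid intersections: pixels between lattice points never become nodes, which is exactly the information loss the introduction attributes to this representation.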

