LEARNING REPRESENTATION IN COLOUR CONVERSION

Abstract

Colours can be represented in an infinite set of spaces highlighting distinct features. Here, we investigated the impact of colour spaces on the encoding capacity of a visual system that is subject to information compression, specifically variational autoencoders (VAEs) on which bottlenecks are imposed. To this end, we propose a novel unsupervised task: colour space conversion (ColourConvNets). We trained several instances of VAEs whose input and output are in different colour spaces, e.g. from RGB to CIE L*a*b* (in total, five colour spaces were examined). This allowed us to systematically study the influence of input-output colour spaces on the encoding efficiency and learnt representation. Our evaluations demonstrate that ColourConvNets with decorrelated output colour spaces produce higher-quality images, which is also evident in pixel-wise low-level metrics such as colour difference (∆E), peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). We also assessed the ColourConvNets' capacity to reconstruct the global content in two downstream tasks: image classification (ImageNet) and scene segmentation (COCO). Our results show a 5-10% performance boost for decorrelating ColourConvNets with respect to the baseline network (whose input and output are RGB). Furthermore, we thoroughly analysed the finite embedding space of Vector Quantised VAEs with three different methods (single feature, hue shift and linear transformation). The interpretations reached with these techniques are in agreement, suggesting that (i) luminance and chromatic information are encoded in separate embedding vectors, and (ii) the structure of the network's embedding space is determined by the output colour space.
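Two of the pixel-wise metrics named above can be sketched in a few lines. The following is a minimal NumPy illustration, not the evaluation code used in the paper: PSNR over images in a known data range, and the CIE76 form of ∆E, which is simply the Euclidean distance between pixels already expressed in CIE L*a*b* (SSIM is omitted here, as it needs local windowed statistics).

```python
import numpy as np

def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(test, dtype=np.float64)) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(data_range ** 2 / mse)

def delta_e_76(lab_reference, lab_test):
    """Mean CIE76 colour difference: Euclidean distance in L*a*b* space."""
    diff = (np.asarray(lab_reference, dtype=np.float64)
            - np.asarray(lab_test, dtype=np.float64))
    return np.sqrt(np.sum(diff ** 2, axis=-1)).mean()
```

For example, two constant images in [0, 1] differing uniformly by 0.1 have MSE = 0.01 and thus PSNR = 20 dB, and two Lab pixels (50, 0, 0) and (50, 3, 4) are exactly 5 ∆E units apart.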

1. INTRODUCTION

Colour is an inseparable component of our conscious visual perception and its objective utility spans a large set of tasks such as object recognition and scene segmentation (Chirimuuta et al., 2015; Gegenfurtner & Rieger, 2000; Wichmann et al., 2002). Consequently, colour is a ubiquitous feature in many applications: colour transfer (Reinhard et al., 2001), colour constancy (Chakrabarti, 2015), style transfer (Luan et al., 2017), computer graphics (Bratkova et al., 2009), image denoising (Dabov et al., 2007) and quality assessment (Preiss et al., 2014), to name a few. Progress in these lines requires a better understanding of colour representation and its neural encoding in deep networks. To this end, we present a novel unsupervised task: colour conversion. In our proposed framework, the input-output colour space is imposed on deep autoencoders (referred to as ColourConvNets) that learn to efficiently compress the visual information (Kramer, 1991) while transforming the input to the output. Essentially, the output y for input image x is generated on the fly by a transformation y = T(x), where T maps the input to the output colour space. This task offers a fair comparison of different colour spaces within a system that learns to minimise a loss function under the information bottleneck principle (Tishby & Zaslavsky, 2015). The quality of the output images demonstrates whether the representation of input-output colour spaces impacts the network's encoding power. Furthermore, the structure of the internal representation provides insight into how colour transformation is performed within a neural network. In this work, we focused on the Vector Quantised Variational Autoencoder (VQ-VAE) (van den Oord et al., 2017) due to the discrete nature of its latent space, which facilitates the analysis and interpretability of the learnt features. We thoroughly studied five commonly used colour spaces by training
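The on-the-fly target generation y = T(x) can be sketched as follows. This is a hypothetical illustration, not the paper's training code: the input image stays in sRGB while the reconstruction target is produced by the standard sRGB to CIE L*a*b* conversion (D65 white point); the function names `srgb_to_lab` and `make_training_pair` are our own.

```python
import numpy as np

# Linear RGB -> XYZ matrix and D65 reference white (standard sRGB definition).
_RGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                        [0.2126729, 0.7151522, 0.0721750],
                        [0.0193339, 0.1191920, 0.9503041]])
_WHITE_D65 = np.array([0.95047, 1.00000, 1.08883])

def srgb_to_lab(rgb):
    """Convert an (..., 3) sRGB image in [0, 1] to CIE L*a*b*."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB gamma to obtain linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalised by the reference white.
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    # Piecewise cube-root compression from the Lab definition.
    eps = (6 / 29) ** 3
    f = np.where(xyz > eps, np.cbrt(xyz),
                 xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    fx, fy, fz = f[..., 0], f[..., 1], f[..., 2]
    return np.stack([116 * fy - 16,            # L*: luminance
                     500 * (fx - fy),          # a*: green-red opponency
                     200 * (fy - fz)], axis=-1)  # b*: blue-yellow opponency

def make_training_pair(x, transform=srgb_to_lab):
    """Input stays in the source space; the target is generated on the fly."""
    return x, transform(x)
```

A quick sanity check: a white sRGB pixel (1, 1, 1) maps to roughly L* = 100, a* = b* = 0, and a black pixel to L* = 0. In the ColourConvNet setting, `transform` would simply be swapped out to realise each of the five input-output colour space pairings.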

