NAME YOUR COLOUR FOR THE TASK: ARTIFICIALLY DISCOVER COLOUR NAMING VIA COLOUR QUANTISATION TRANSFORMER

Abstract

The long-standing theory that a colour-naming system evolves under the dual pressure of efficient communication and perceptual mechanism is supported by a growing body of linguistic studies, including the analysis of four decades of diachronic data from the Nafaanra language. This inspires us to explore whether artificial intelligence could evolve and discover a similar colour-naming system by optimising the communication efficiency represented by high-level recognition performance. Here, we propose a novel colour quantisation transformer, CQFormer, that quantises the colour space while maintaining machine-recognition accuracy on the quantised images. Given an RGB image, the Annotation Branch maps it into an index map before generating the quantised image with a colour palette; meanwhile, the Palette Branch adopts a key-point detection approach to locate proper palette colours within the whole colour space. By interacting with the colour annotation, CQFormer is able to balance machine-vision accuracy and colour perceptual structure, such as a distinct and stable colour distribution, in the discovered colour system. Interestingly, we even observe a consistent evolution pattern between our artificial colour system and the basic colour terms across human languages. Moreover, our colour quantisation also serves as an efficient compression method that substantially reduces image storage while maintaining high performance in high-level recognition tasks such as classification and detection. Extensive experiments demonstrate the superior performance of our method with extremely low-bit-rate colours. We will release the source code upon acceptance.

1. INTRODUCTION

Hath not a Jew eyes? Hath not a Jew hands, organs, dimensions, senses, affections, passions?
— William Shakespeare, "The Merchant of Venice"

Does artificial intelligence share the same perceptual mechanism for colours as human beings? We aim to explore this intriguing question through AI simulation in this paper. Colour involves the visual reception and neural registering of light stimuli when the spectrum of light interacts with the cone cells in the eyes. Physical specifications of colour also include the reflective properties of physical objects, the geometry of incident illumination, etc. By defining a colour space (Forsyth & Ponce, 2002), people can identify colours directly according to quantifiable coordinates. Compared to the purely physiological nature of hue categorisation, the complex phenomenon of colour naming, or colour categorisation, spans multiple disciplines from cognitive science to anthropology. Solid diachronic research (Berlin & Kay, 1969) also suggests that human languages constantly evolve to acquire new colour names, resulting in an increasingly fine-grained colour naming system. This evolutionary process is hypothesised to operate under the pressure of both communication efficiency and perceptual structure. Communication efficiency requires that a shared colour partitioning be communicated accurately with a lexicon as simple and economical as possible. Colour perceptual structure is grounded in human perception: for example, the colour-space distance between nearby colours should correspond to their perceptual dissimilarity. This structure of perceptual colour space has long been used to explain colour naming patterns across languages.
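A standard example of such a perceptually motivated coordinate system is CIELAB, in which Euclidean distance between coordinates roughly tracks perceived colour dissimilarity. A minimal sketch of the sRGB-to-CIELAB conversion using the standard D65 formulas (this is textbook colorimetry, not part of this paper's method):

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert sRGB values in [0, 255] to CIELAB (D65 white point).

    In CIELAB, Euclidean distance between coordinates approximates
    perceptual colour difference, unlike raw RGB distance.
    """
    c = np.asarray(rgb, dtype=np.float64) / 255.0
    # Undo the sRGB gamma (linearise).
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ (standard sRGB matrix, D65).
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = lin @ m.T
    # Normalise by the D65 reference white.
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])
    # Piecewise cube-root nonlinearity of the CIELAB definition.
    f = np.where(xyz > (6 / 29) ** 3,
                 np.cbrt(xyz),
                 xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
```

Under this transform, white maps to L ≈ 100 and black to L ≈ 0, with a and b near zero for both.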
A recent analysis of human colour naming systems, especially in Nafaanra, contributes very direct evidence for this hypothesis through the employment of Shannon's communication model (Shannon, 1948). Very interestingly, this echoes research on colour quantisation, which quantises the colour space to reduce the number of distinct colours in an image. Traditional colour quantisation methods (Heckbert, 1982; Gervautz & Purgathofer, 1988; Floyd & Steinberg, 1976) are perception-centred and generate a new image that is as perceptually similar as possible to the original. These methods group similar colours in the colour space and represent each group with a new colour, thus naturally preserving the perceptual structure. Instead of prioritising perceptual quality, Hou et al. (Hou et al., 2020) proposed a task-centred/machine-centred colour quantisation method, ColorCNN, which focuses on maintaining network classification accuracy in restricted colour spaces. While achieving impressive classification accuracy even on few-bit images, ColorCNN only identifies and preserves machine-centred structure, without directly clustering similar colours in the colour space. Therefore, this purely machine-centred strategy sacrifices perceptual structure and often associates nearby colours with different quantised indices. Zaslavsky et al. (Zaslavsky et al., 2022) measure communication efficiency in colour naming by analysing informational complexity based on the information bottleneck (IB) principle. Here, we argue that network recognition accuracy also reflects communication efficiency when the number of colours is restricted. Since human colour naming is shaped by both perceptual structure and communication efficiency (Zaslavsky et al., 2019a), we integrate the needs of both perception and machine to propose a novel end-to-end colour quantisation transformer, CQFormer, to discover artificial colour naming systems.
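The perception-centred family described above can be sketched with a simple k-means quantiser in RGB space: group similar colours and repaint each group with its mean. This is a hypothetical minimal baseline in the spirit of the cited methods, not the CQFormer approach, and all names here are illustrative:

```python
import numpy as np

def kmeans_quantise(image, n_colours=4, n_iters=10, seed=0):
    """Perception-centred colour quantisation: cluster similar colours in
    RGB space with k-means and repaint each pixel with its cluster centre.

    Returns the quantised image, the per-pixel index map, and the palette.
    """
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialise the palette with randomly chosen pixels.
    palette = pixels[rng.choice(len(pixels), n_colours, replace=False)].copy()
    for _ in range(n_iters):
        # Assign every pixel to its nearest palette colour (index map).
        dists = np.linalg.norm(pixels[:, None, :] - palette[None, :, :], axis=2)
        index_map = dists.argmin(axis=1)
        # Move each palette colour to the mean of its assigned pixels.
        for k in range(n_colours):
            mask = index_map == k
            if mask.any():
                palette[k] = pixels[mask].mean(axis=0)
    quantised = palette[index_map].reshape(h, w, 3)
    return quantised.astype(np.uint8), index_map.reshape(h, w), palette
```

Because nearby colours fall into the same cluster, this baseline preserves perceptual structure by construction; it is exactly the property that a purely machine-centred quantiser such as ColorCNN does not guarantee.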
As illustrated in Fig. 1(b), recognition accuracy increases with the number of colours in our discovered colour naming system. Surprisingly, the complexity-accuracy trade-offs are similar to the numerical results (Fig. 1(a)) independently derived from linguistic research (Zaslavsky et al., 2022). What is more, after embedding the 1978 Nafaanra three-colour system (Nafaanra-1978, Fig. 1(d)) into the latent representation of CQFormer, our method automatically evolves a fourth colour close to yellow-green, matching the basic colour term theory (Berlin & Kay, 1969) summarised across different languages. Berlin and Kay found universal restrictions on colour naming across cultures and claimed that languages acquire new basic colour categories in a strict chronological sequence. For example, if a culture has three colours (light ('fiNge'), dark ('wOO'), and warm or red-like ('nyiE') in Nafaanra), the fourth colour it evolves should be yellow or green, exactly the one (Fig. 1(e)) discovered by our CQFormer.

The pipeline of CQFormer, shown in Fig. 2, comprises two branches: the Annotation Branch and the Palette Branch. The Annotation Branch annotates each pixel of the input RGB image with a proper quantised colour index before painting the index map with the corresponding colour from the colour palette. We localise the colour palette in the whole colour space with a novel Palette Branch which detects the
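The interplay of the two branches can be sketched at inference time: given per-pixel probabilities over C palette indices (the kind of output an Annotation Branch would produce) and C palette colours (the kind of output a Palette Branch would produce), the quantised image follows from a hard assignment and palette painting. Function and variable names here are illustrative, not taken from the paper's code:

```python
import numpy as np

def paint_with_palette(pixel_probs, palette):
    """Paint a quantised image from branch-style outputs.

    pixel_probs : (H, W, C) per-pixel probabilities over palette indices.
    palette     : (C, 3) colours located in the full colour space.
    Returns the (H, W) index map and the (H, W, 3) quantised image.
    """
    index_map = pixel_probs.argmax(axis=-1)  # hard assignment per pixel
    quantised = palette[index_map]           # paint indices with palette colours
    return index_map, quantised
```

During training, a soft (differentiable) mixture of palette colours would typically replace the hard argmax so gradients can flow to both branches; the sketch above shows only the inference-time behaviour.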



Figure 1: (a) The theoretical limit of efficiency for colour naming (black curve) and cases of the WCS probability map of human colour languages, copied from Zaslavsky et al. (2022). (b) The colour number (from 1-bit to 6-bit) versus accuracy curve on the tiny-imagenet-200 (Le & Yang, 2015) dataset; the WCS probability maps generated by our CQFormer are also shown along the curve. (c) The colour naming stimulus grid used in the WCS (Kay et al., 2009). (d) The three-term WCS probability map of CQFormer after embedding the 1978 Nafaanra three-colour system (light ('fiNge'), dark ('wOO'), and warm or red-like ('nyiE')) into the latent representation. (e) The four-term WCS probability map of CQFormer evolved from (d). The evolved fourth colour, yellow-green, is consistent with the prediction of basic colour term theory (Berlin & Kay, 1969).

