ACTIVE IMAGE INDEXING

Abstract

Image copy detection and retrieval from large databases leverage two components. First, a neural network maps an image to a vector representation that is relatively robust to various transformations of the image. Second, an efficient but approximate similarity search algorithm trades scalability (size and speed) against search quality, thereby introducing a source of error. This paper improves the robustness of image copy detection with active indexing, which optimizes the interplay of these two components. We reduce the quantization loss of a given image representation by making imperceptible changes to the image before its release. The loss is back-propagated through the deep neural network back to the image, under perceptual constraints. These modifications make the image more retrievable. Our experiments show that the retrieval and copy detection of activated images are significantly improved. For instance, activation improves Recall@1 by +40% on various image transformations, and for several popular indexing structures based on product quantization and locality-sensitive hashing.
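To make the quantization loss mentioned above concrete, the following toy NumPy sketch implements a product quantizer: a feature vector is split into sub-vectors, each assigned to its nearest codebook centroid, and the quantization loss is the distance between the vector and its reconstruction. This is an illustrative sketch with made-up dimensions and random codebooks, not the paper's implementation or a production index.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M = 8, 2                      # feature dimension, number of PQ sub-vectors
sub = d // M                     # dimension of each sub-vector
k = 16                           # centroids per sub-quantizer

# Toy database of image features and random per-subspace codebooks
# (a real index would learn codebooks by k-means on training features).
db = rng.normal(size=(100, d)).astype(np.float32)
codebooks = [rng.normal(size=(k, sub)).astype(np.float32) for _ in range(M)]

def pq_encode(x):
    """Assign each sub-vector of x to its nearest codebook centroid."""
    codes = []
    for m, cb in enumerate(codebooks):
        part = x[m * sub:(m + 1) * sub]
        codes.append(int(np.argmin(((cb - part) ** 2).sum(axis=1))))
    return codes

def pq_decode(codes):
    """Reconstruct an approximate vector by concatenating the centroids."""
    return np.concatenate([codebooks[m][c] for m, c in enumerate(codes)])

# Quantization loss: distance between a feature and its PQ reconstruction.
x = db[0]
x_hat = pq_decode(pq_encode(x))
loss = np.linalg.norm(x - x_hat)
print(f"quantization loss: {loss:.3f}")
```

The compact codes make search fast and memory-efficient, but the reconstruction error is exactly the source of retrieval mistakes that active indexing aims to reduce.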

1. INTRODUCTION

The traceability of images on a media sharing platform is a challenge: they are widely used, easily edited, and disseminated both inside and outside the platform. In this paper, we tackle the corresponding task of Image Copy Detection (ICD), i.e. finding whether an image already exists in the database and, if so, returning its identifier. ICD methods power reverse search engines, photography service providers checking copyrights, and media platforms moderating and tracking down malicious content (e.g. Microsoft's PhotoDNA (2009) or Apple's NeuralHash (2021)). Image identification systems have to be robust enough to identify images that are edited (cropping, colorimetric change, JPEG compression, etc.) after their release (Douze et al., 2021; Wang et al., 2022). The common approach for content-based image retrieval reduces images to high-dimensional vectors, referred to as representations. Early representations used for retrieval were hand-crafted features such as color histograms (Swain & Ballard, 1991), GIST (Oliva & Torralba, 2001), or Fisher



Figure 1: Overview of the method and latent space representation. We start from an original image Io that can be edited t(·) in various ways: its feature extraction f(t(Io)) spans the shaded region in the embedding space. The edited versions should be recoverable by nearest-neighbor search on quantized representations. In the regular (non-active) case, f(Io) is quantized by the index. When the image is edited, f(t(Io)) may switch cells, and the closest neighbor returned by the index is the wrong one. In active indexing, Io is modified in an imperceptible way to generate I⋆ such that f(I⋆) lies further away from the quantization boundary. When edited copies f(t(I⋆)) are queried, retrieval errors are significantly reduced.
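The active-indexing step in the figure can be sketched as projected gradient descent: minimize the distance between the image's feature and its assigned index centroid, while clipping the pixel change to an L∞ budget so it stays imperceptible. The sketch below substitutes a fixed linear map for the deep feature extractor so the gradient is analytic in NumPy; the extractor, budget eps, and step size lr are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "feature extractor": a fixed linear map f(img) = W @ img.
# (The paper back-propagates through a deep network; a linear f keeps
# the gradient closed-form for this sketch.)
D, d = 64, 8                          # flattened image size, feature dim
W = rng.normal(size=(d, D)).astype(np.float32) / np.sqrt(D)
f = lambda img: W @ img

i0 = rng.uniform(size=D).astype(np.float32)         # original image I_o
# Centroid the index assigned to f(I_o), placed at an offset for the demo.
c = f(i0) + rng.normal(scale=0.5, size=d).astype(np.float32)

eps, lr = 0.03, 0.2                   # perceptual (L_inf) budget, step size
img = i0.copy()
for _ in range(100):
    r = f(img) - c                    # residual to the index centroid
    grad = 2.0 * W.T @ r              # gradient of ||f(img) - c||^2 w.r.t. img
    img = img - lr * grad
    img = np.clip(img, i0 - eps, i0 + eps)   # keep the change imperceptible

before = np.linalg.norm(f(i0) - c)
after = np.linalg.norm(f(img) - c)
print(f"distance to centroid: {before:.3f} -> {after:.3f}")
```

After optimization, the feature of the activated image sits closer to its centroid (i.e. deeper inside its quantization cell), so edited copies are less likely to cross a cell boundary at query time.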

