FACE RECONSTRUCTION FROM FACIAL TEMPLATES BY LEARNING LATENT SPACE OF A GENERATOR NETWORK

Anonymous

Abstract

Face recognition systems are increasingly deployed in different applications. In these systems, a feature vector (also called a facial embedding or template) is typically extracted from each face image and stored in the system's database during the enrollment stage; it is later used for comparison during the recognition stage. In this paper, we focus on the template inversion attack against face recognition systems and propose a new method to reconstruct face images from facial templates. Within a generative adversarial network (GAN)-based framework, we learn a mapping from facial templates to the intermediate latent space of a pre-trained face generation network, from which we can generate high-resolution realistic reconstructed face images. We show that our proposed method can be applied in whitebox and blackbox attacks against face recognition systems. Furthermore, we evaluate the transferability of our attack when the adversary uses the reconstructed face image to impersonate the underlying subject in an attack against another face recognition system. Considering the adversary's knowledge and the target face recognition system, we define five different attacks and evaluate the vulnerability of state-of-the-art face recognition systems. Our experiments show that our proposed method achieves high attack success rates in whitebox and blackbox scenarios. Furthermore, the reconstructed face images are transferable and can be used to enter target face recognition systems with a different feature extractor model.

1. INTRODUCTION

Face recognition (FR) systems are becoming ubiquitous, and their applications, which range from cell phone unlocking to national identity systems and border control, are growing rapidly. Typically, in such systems, a feature vector (called an embedding or template) is extracted from each face image using a deep neural network and is stored in the system's database during the enrollment stage. During the recognition stage, either verification or identification, the extracted feature vector is compared with the ones in the system's database to measure the similarity of identities. Among potential attacks against FR systems (Galbally et al., 2014; Marcel et al., 2014; Biggio et al., 2015; Hadid et al., 2015; Mai et al., 2018), the template inversion (TI) attack significantly jeopardizes the users' privacy. In a TI attack, the adversary gains access to templates stored in the FR system's database and aims to reconstruct the underlying face image. The adversary then not only obtains privacy-sensitive information (such as gender, ethnicity, etc.) about enrolled users, but can also use the reconstructed face images to impersonate them. In this paper, we focus on the TI attack against FR systems and propose a novel method to reconstruct face images from facial templates (Fig. 1 shows sample reconstructed face images using our proposed method). Within a generative adversarial network (GAN)-based framework, we learn a mapping from facial templates to the intermediate latent space of StyleGAN3 (Karras et al., 2021), which serves as a pre-trained face generation network. Then, using the synthesis part of StyleGAN3, we can generate high-resolution realistic face images. Our proposed method can be applied for whitebox and blackbox attacks against FR systems. In the whitebox scenario, the adversary knows the internal functioning of the feature extraction model and its parameters.
However, in the blackbox scenario, the adversary does not know the internal functioning of the feature extraction model and can only use it to extract features from any arbitrary image. Instead, we assume that the adversary has whitebox access to another FR model, which can be used for training the face reconstruction network. We also evaluate the transferability of our attack by considering the case where the adversary uses the reconstructed face image to impersonate the underlying subject in an attack against another FR system (which has a different feature extraction model). Considering the adversary's knowledge and the target FR system, we define five different attacks and evaluate the vulnerability of state-of-the-art (SOTA) FR systems. Fig. 2 illustrates the general block diagram of our proposed template inversion attack.

The contributions of our paper are as follows:

• We propose a novel method to generate high-resolution realistic face images from facial templates. Within a GAN-based framework, we learn the mapping from facial templates to the latent space of a pre-trained face generation network.

• We propose our method for whitebox and blackbox scenarios. While our method is based on whitebox knowledge of the FR model, we extend our attack to the blackbox scenario, using another FR model that the adversary has access to.

• We define five different attacks against FR systems (based on the adversary's knowledge and the target system), and evaluate the vulnerability of SOTA FR models.

The remainder of the paper is organized as follows: Section 2 introduces the problem formulation and our proposed face reconstruction method. Section 3 covers the related work in the literature and compares it with our proposed method. Section 4 presents our experimental results. Finally, the paper is concluded in Section 5.
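To make the comparison step of the recognition stage concrete, the following is a minimal illustrative sketch, not the implementation of any FR system evaluated in this paper. It assumes that templates are compared by cosine similarity (the measure reported under each image in Fig. 1) and that verification accepts a probe when the score reaches a decision threshold. The function names and the threshold value of 0.5 are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two templates of equal dimension."""
    if len(a) != len(b):
        raise ValueError("templates must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("templates must be non-zero vectors")
    return dot / (norm_a * norm_b)


def verify(probe: Sequence[float], enrolled: Sequence[float],
           threshold: float = 0.5) -> bool:
    """Accept the probe if its similarity to the enrolled template
    reaches the threshold (0.5 is a hypothetical value, not one
    taken from the paper)."""
    return cosine_similarity(probe, enrolled) >= threshold
```

Under these assumptions, a TI attack succeeds against such a verifier when the template extracted from the reconstructed face image scores at or above the threshold against the enrolled template.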

2. PROBLEM DEFINITION AND PROPOSED METHOD

In this paper, we consider a TI attack against an FR system based on the following threat model:



Figure 1: Sample face images from the FFHQ dataset and their corresponding reconstructed images using our template inversion method from ArcFace templates. The values below each image show the cosine similarity between the corresponding templates of original and reconstructed face images.

Figure 2: Block diagram of our proposed template inversion attack

