FACE RECONSTRUCTION FROM FACIAL TEMPLATES BY LEARNING LATENT SPACE OF A GENERATOR NETWORK

Anonymous

Abstract

Face recognition systems are increasingly deployed in different applications. In these systems, a feature vector (also called a facial embedding or template) is typically extracted from each face image and stored in the system's database during the enrollment stage; it is later used for comparison during the recognition stage. In this paper, we focus on the template inversion attack against face recognition systems and propose a new method to reconstruct face images from facial templates. Within a generative adversarial network (GAN)-based framework, we learn a mapping from facial templates to the intermediate latent space of a pre-trained face generation network, from which we can generate high-resolution realistic reconstructed face images. We show that our proposed method can be applied in whitebox and blackbox attacks against face recognition systems. Furthermore, we evaluate the transferability of our attack when the adversary uses the reconstructed face image to impersonate the underlying subject in an attack against another face recognition system. Considering the adversary's knowledge and the target face recognition system, we define five different attacks and evaluate the vulnerability of state-of-the-art face recognition systems. Our experiments show that our proposed method achieves high success attack rates in whitebox and blackbox scenarios. Furthermore, the reconstructed face images are transferable and can be used to enter target face recognition systems with a different feature extractor model.

1. INTRODUCTION

Face recognition (FR) systems tend toward ubiquity, and their applications, which range from cell phone unlock to national identity systems, border control, etc., are growing rapidly. Typically, in such systems, a feature vector (called an embedding or template) is extracted from each face image using a deep neural network and is stored in the system's database during the enrollment stage. During the recognition stage, either verification or identification, the extracted feature vector is compared with the ones in the system's database to measure the similarity of identities. Among potential attacks against FR systems (Galbally et al., 2014; Marcel et al., 2014; Biggio et al., 2015; Hadid et al., 2015; Mai et al., 2018), the template inversion (TI) attack significantly jeopardizes the users' privacy. In a TI attack, the adversary gains access to templates stored in the FR system's database and aims to reconstruct the underlying face image. Then, the adversary not only obtains privacy-sensitive information (such as gender, ethnicity, etc.) about enrolled users, but can also use the reconstructed face images to impersonate them. In this paper, we focus on the TI attack against FR systems and propose a novel method to reconstruct face images from facial templates (Fig. 1 shows sample reconstructed face images using our proposed method). Within a generative adversarial network (GAN)-based framework, we learn a mapping from face templates to the intermediate latent space of StyleGAN3 (Karras et al., 2021), a pre-trained face generation network. Then, using the synthesis part of StyleGAN3, we can generate high-resolution realistic face images. Our proposed method can be applied in whitebox and blackbox attacks against FR systems. In the whitebox scenario, the adversary knows the internal functioning of the feature extraction model and its parameters.
However, in the blackbox scenario, the adversary does not know the internal functioning of the feature extraction model and can only use it to extract features from arbitrary images. Instead, we assume that the adversary has whitebox knowledge of another FR model, which can be used for training the face reconstruction network. We also evaluate the transferability of our attack by considering the case where the adversary uses the reconstructed face image to impersonate the underlying subject in an attack against another FR system (which has a different feature extraction model). Considering the adversary's knowledge and the target FR system, we define five different attacks and evaluate the vulnerability of state-of-the-art (SOTA) FR systems. Fig. 2 illustrates the general block diagram of our proposed template inversion attack. The contributions of our paper are as follows:
• We propose a novel method to generate high-resolution realistic face images from facial templates. Within a GAN-based framework, we learn the mapping from facial templates to the latent space of a pre-trained face generation network.
• We propose our method for whitebox and blackbox scenarios. While our method is based on whitebox knowledge of the FR model, we extend our attack to the blackbox scenario using another FR model that the adversary has access to.
• We define five different attacks against FR systems (based on the adversary's knowledge and the target system), and evaluate the vulnerability of SOTA FR models.
The remainder of the paper is organized as follows: Section 2 introduces the problem formulation and our proposed face reconstruction method. Section 3 covers the related works in the literature and compares them with our proposed method. Section 4 presents our experimental results. Finally, the paper is concluded in Section 5.

2. PROBLEM DEFINITION AND PROPOSED METHOD

In this paper, we consider a TI attack against an FR system based on the following threat model:
• Adversary's goal: The adversary aims to reconstruct a face image from a template, and to use the reconstructed face image to enter the same or a different face recognition system, which we call the target FR system.
• Adversary's knowledge: The adversary knows a face template of a user enrolled in the FR system's database. The adversary also has either whitebox or blackbox knowledge of the feature extractor model in the same FR system.
• Adversary's capability: The adversary can present the reconstructed face image to the target FR system (e.g., using a printed photograph). However, for simplicity, we assume that the adversary can inject the reconstructed face image as a query to the target FR system.
• Adversary's strategy: The adversary can train a face reconstruction model to invert facial templates and reconstruct the underlying face images, and can then inject the reconstructed face images as queries to the target FR system in order to enter that system.
Let F(.) denote a facial feature extraction model, which takes a face image I ∈ I and extracts the facial template x = F(I) ∈ X. According to the threat model, the adversary has access to the target facial template x_database = F_database(I) and aims to generate a reconstructed face image Î. Then, the adversary can use the reconstructed face image Î to impersonate the corresponding subject and attack a target FR system with feature extractor F_target(.), which might be different from F_database(.). To train a face reconstruction model, we can use a dataset of N face images {I_i}_{i=1}^{N} (no labels are required) and generate a training dataset {(x_i, I_i)}_{i=1}^{N}, where x_i = F_database(I_i). Then, a face reconstruction model G(.) can be trained to reconstruct the face image Î = G(x) given each facial template x ∈ X.
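The training-set construction above can be sketched as follows. This is a minimal illustration only: `f_database` is a hypothetical toy stand-in for the real deep feature extractor F_database(.), used just to make the pairing {(x_i, I_i)} concrete.

```python
import hashlib

def f_database(image):
    """Toy stand-in for F_database(.): maps an image (a flat list of
    pixel byte values here) to a fixed-length template. A real system
    would use a deep face recognition network instead."""
    digest = hashlib.sha256(bytes(image)).digest()
    return [b / 255.0 for b in digest[:8]]  # 8-dim toy template

def build_training_set(images):
    """Pair each unlabeled face image I_i with its template
    x_i = F_database(I_i), yielding the dataset {(x_i, I_i)} used to
    train the reconstruction model G(.). No identity labels needed."""
    return [(f_database(img), img) for img in images]
```

Note that the templates, not the identities, supervise the reconstruction model, which is why an unlabeled face dataset suffices.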
To train such a face reconstruction model, we consider a multi-term face reconstruction loss function:
$\mathcal{L}_{rec} = \mathcal{L}_{pixel} + \mathcal{L}_{ID}$, (1)
where $\mathcal{L}_{pixel}$ and $\mathcal{L}_{ID}$ denote the pixel loss and the ID loss, respectively, defined as
$\mathcal{L}_{pixel} = \mathbb{E}_{x \sim \mathcal{X}}\left[ \left\| I - G(x) \right\|_2^2 \right]$, (2)
$\mathcal{L}_{ID} = \mathbb{E}_{x \sim \mathcal{X}}\left[ \left\| F_{loss}(I) - F_{loss}(G(x)) \right\|_2^2 \right]$. (3)
The pixel loss minimizes the pixel-level reconstruction error of the generated face image, while the ID loss minimizes the distance between the facial templates extracted by F_loss(.) from the original and reconstructed face images. In Eq. 3, F_loss(.) denotes a feature extraction model whose parameters and internal functioning the adversary is assumed to have complete knowledge of. Based on the adversary's knowledge of F_database(.) (i.e., whitebox or blackbox scenarios), F_loss(.) might be the same as or different from F_database(.). As depicted in Fig. 3, we fix the pre-trained synthesis network S_StyleGAN(.) of StyleGAN3 and train a new mapping network M_rec(.) to generate a latent code ŵ for a given facial template; the generated latent code is then given to the synthesis network to generate the reconstructed face image Î = S_StyleGAN(ŵ). We can train our new mapping M_rec(.) using our reconstruction loss function in Eq. 1. However, to obtain a realistic face image from the generated ŵ through the pre-trained synthesis network S_StyleGAN(.), the generated ŵ needs to lie in the distribution W; otherwise, the output may not look like a real human face. Hence, to generate ŵ vectors with the same distribution as StyleGAN's intermediate latents w ∈ W, we use a GAN-based framework to learn the distribution W. To this end, we use the Wasserstein GAN (WGAN) (Arjovsky et al., 2017) algorithm to train a critic network C(.), which critiques the generated ŵ vectors against real StyleGAN latents w ∈ W, while we simultaneously optimize our mapping network to generate ŵ vectors with the same distribution as W. Hence, our mapping network M_rec(.) acts as a conditional generator in our WGAN framework, which generates ŵ = M_rec([n, x]) given a facial template x ∈ X and a random noise vector n ∈ N.
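For a single training sample, the reconstruction loss of Eq. 1 is just the sum of two squared distances. A minimal sketch, with plain vectors standing in for images and any callable standing in for F_loss(.):

```python
def l2_squared(u, v):
    """Squared L2 distance between two flat vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def reconstruction_loss(image, reconstruction, f_loss):
    """L_rec = L_pixel + L_ID for one sample: the pixel term penalizes
    the pixel-level error, while the ID term penalizes the distance
    between the templates F_loss(.) extracts from the two images."""
    l_pixel = l2_squared(image, reconstruction)
    l_id = l2_squared(f_loss(image), f_loss(reconstruction))
    return l_pixel + l_id
```

In the actual training these expectations are taken over minibatches and F_loss(.) is a frozen deep network through which gradients flow back to the generator.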
Then, we can train our mapping network and critic network using the following loss functions:
$\mathcal{L}_{C}^{WGAN} = \mathbb{E}_{w \sim \mathcal{W}}\left[ C(w) \right] - \mathbb{E}_{\hat{w} \sim M_{rec}([n,x])}\left[ C(\hat{w}) \right]$, (4)
$\mathcal{L}_{M_{rec}}^{WGAN} = \mathbb{E}_{\hat{w} \sim M_{rec}([n,x])}\left[ C(\hat{w}) \right]$. (5)
In a nutshell, we train a new mapping network M_rec(.) using our reconstruction loss function in Eq. 1, and also optimize M_rec(.) within our WGAN framework using Eq. 5. Simultaneously, we train the critic network C(.) within our WGAN using Eq. 4 to learn the distribution of StyleGAN's intermediate latent space W and help our mapping network M_rec(.) generate vectors with the same distribution as W. Fig. 3 depicts the block diagram of the proposed method. We should note that our mapping network M_rec(.) has two fully connected layers with the Leaky ReLU activation function. In our problem formulation, we consider three different feature extraction models: F_database(.), F_loss(.), and F_target(.). Hence, based on the adversary's knowledge and the target system, we can consider five different attacks:
• Attack 1: The adversary has whitebox knowledge of the system from which the template is leaked and wants to attack the same system (i.e., F_database = F_loss = F_target).
• Attack 2: The adversary has whitebox knowledge of the feature extractor of the system from which the template is leaked, but aims to attack a different FR system (i.e., F_database = F_loss ≠ F_target).
• Attack 3: The adversary wants to attack the same system from which the template is leaked, but has only blackbox access to the feature extractor of that system. Instead, we assume that the adversary has whitebox knowledge of another FR model to use for training (i.e., F_database = F_target ≠ F_loss).
• Attack 4: The adversary aims to attack a different FR system than the one from which the template is leaked. In addition, the adversary has whitebox knowledge of the feature extractor of the target system (i.e., F_database ≠ F_loss = F_target).
• Attack 5: The adversary aims to attack a different FR system than the one from which the template is leaked, and has only blackbox knowledge of both the target system and the one from which the template is leaked. Instead, the adversary has whitebox knowledge of another FR model to use for training (i.e., F_database ≠ F_loss ≠ F_target, all different).
In attacks 1 and 2, the adversary has whitebox knowledge of the system from which the template is leaked (i.e., F_database(.)) and uses the same model as F_loss(.) for training the reconstruction network. However, in attacks 3-5, the adversary has only blackbox knowledge of the system from which the template is leaked, and therefore uses another FR model as F_loss(.). Comparing the adversary's knowledge in these attacks, we expect attack 1 to be the easiest attack for the adversary and attack 5 to be the most difficult one.
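The two WGAN objectives in Eqs. 4-5 reduce to empirical means of critic scores over batches of real latents w and generated latents ŵ. A minimal sketch under simplifying assumptions: the critic is any scalar-valued callable, and the gradient-based optimization loop (and the critic's Lipschitz constraint) is omitted.

```python
def mean(values):
    """Empirical mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

def critic_objective(critic, real_ws, fake_ws):
    """Eq. 4, estimated on batches: E_{w~W}[C(w)] - E_{w_hat}[C(w_hat)].
    The critic learns to score real StyleGAN latents above generated ones."""
    return mean(critic(w) for w in real_ws) - mean(critic(w) for w in fake_ws)

def mapping_objective(critic, fake_ws):
    """Eq. 5, estimated on a batch: E_{w_hat}[C(w_hat)].
    The mapping network M_rec is pushed to produce latents that the
    critic scores like real ones."""
    return mean(critic(w) for w in fake_ws)
```

In training, M_rec(.) would be a small two-layer Leaky-ReLU MLP taking the concatenation [n, x], and the two objectives are optimized in alternation as in standard WGAN training.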

3. RELATED WORKS

Table 1 compares our proposed method with related works in the literature. Generally, methods for the TI attack against FR systems can be categorized based on different aspects, including the resolution of the generated face images (high/low resolution), the type of attack (whitebox/blackbox), and the basis of the method (optimization/learning-based). Footnotes of Table 1: * The method is based on the whitebox attack, and is extended to blackbox by removing a loss term that requires the FR model. ** The method is based on the whitebox attack, and the blackbox attack is performed by knowledge distillation of the FR model. *** The method is based on the whitebox attack, and is extended to blackbox using a different FR model.
Zhmoginov & Sandler (2016) proposed an optimization-based method and a learning-based method to generate low-resolution face images in the whitebox attack against FR systems. In their optimization-based attack, they used a gradient-descent-based approach to find an image that minimizes the distance between face templates, together with some regularization terms that encourage a smooth image, namely the total variation and Laplacian pyramid gradient normalization (Burt & Adelson, 1987) of the reconstructed face image. In their learning-based attack, they trained a convolutional neural network (CNN) with the same loss terms to generate face images from given facial templates. Cole et al. (2017) proposed a learning-based attack that generates low-resolution images using a multilayer perceptron (MLP) to estimate landmark coordinates and a CNN to generate the face texture, and then reconstructed face images with a differentiable warping based on the estimated landmarks and face texture. They trained their networks in an end-to-end fashion, minimizing the errors of landmark estimation and texture generation as well as the distance between face templates as their loss function.
To extend their method from the whitebox attack to the blackbox attack, they proposed not to minimize the distance of face templates in their loss function. Mai et al. (2018) proposed a learning-based attack to generate low-resolution images in the blackbox attack against FR systems. They proposed new convolutional blocks, called neighborly deconvolution blocks A/B (NbBlock-A and NbBlock-B for short), and used these blocks to reconstruct face images. They trained their proposed networks using two loss functions: a pixel loss (i.e., the ℓ2 norm of the pixel reconstruction error) and a perceptual loss (i.e., the ℓ2 norm of the distance between intermediate features of VGG-19 (Simonyan & Zisserman, 2014) for the original and reconstructed face images). Duong et al. (2020) and Truong et al. (2022) used the same bijection learning framework and trained a GAN whose generator has the structure of PO-GAN (Karras et al., 2017) and TransGAN (Jiang et al., 2021), respectively. While their method is based on the whitebox attack, they proposed to use knowledge distillation to extend it to the blackbox attack. To this end, they trained a student network that mimics the target FR model. However, they did not provide any details (nor source code) about the student network training, such as the structure of the student network. Dong et al. (2021) used a pre-trained StyleGAN to generate high-resolution face images in the blackbox attack against FR systems. They generated synthetic face images using the pre-trained StyleGAN and extracted their embeddings. Then, they trained a fully connected network using the mean squared error to map the extracted embeddings to the corresponding noise at the input of StyleGAN. Instead of a learning-based approach, Vendrow & Vendrow (2021) used a grid-search optimization based on simulated annealing (Van Laarhoven & Aarts, 1987) to find the noise at the input of StyleGAN that generates an image with the same embedding.
As their iterative method has a large computation cost, they evaluated their method on 20 images only. Along the same lines, Dong et al. (2022) also tried to solve an optimization similar to (Vendrow & Vendrow, 2021) with a different approach: they used a genetic algorithm to find the noise at the input of StyleGAN that can generate an image with the same embedding.

Compared to most works in the literature that generate low-resolution face images, our proposed method generates high-resolution realistic face images. While low-resolution reconstructed images can be used for evaluating the vulnerability of FR systems under some assumptions, high-resolution images can lead to different types of presentation attacks against FR systems. We also propose our method for both whitebox and blackbox scenarios and evaluate the transferability of our attack. Similar to (Cole et al., 2017; Duong et al., 2020; Truong et al., 2022), our method is based on whitebox knowledge of the FR model; however, our approach for extending our method to the blackbox attack using another FR model is novel. Last but not least, we define five different attacks against FR systems and evaluate the vulnerability of SOTA FR models to our attacks.

4. EXPERIMENTS

In this section, we present our experiments and discuss our results. First, in Section 4.1 we describe our experimental setup. Then, we present our experimental results in Section 4.2 and discuss our findings.

4.1. EXPERIMENTAL SETUP

Table 2: Recognition performance of the face recognition models used in our experiments, in terms of true match rate (TMR) at the thresholds corresponding to false match rates (FMRs) of 10^-2 and 10^-3, evaluated on the MOBIO and LFW datasets. The values are in percentage.
To evaluate the performance of our method, we consider two SOTA FR models, ArcFace (Deng et al., 2019) and ElasticFace (Boutros et al., 2022), as the models from which templates are leaked (i.e., F_database). For the transferability evaluation, we also use three different FR models with SOTA backbones from FaceX-Zoo (Wang et al., 2021), including HRNet (Wang et al., 2020), AttentionNet (Wang et al., 2017), and Swin (Liu et al., 2021), for the target FR system (i.e., F_target). The recognition performance of these models is reported in Table 2. All these models are trained on the MS-Celeb1M dataset (Guo et al., 2016). We assume that the adversary does not have access to the FR training dataset, and therefore we use another dataset for training our face reconstruction models. To this end, we use the Flickr-Faces-HQ (FFHQ) dataset (Karras et al., 2019), which consists of 70,000 high-resolution (i.e., 1024 × 1024) face images (without identity labels) crawled from the internet. We use a random 90% portion of this dataset for training and the remaining 10% for validation. To evaluate the different attacks against FR systems, we consider two other face image datasets with identity labels: the MOBIO (McCool et al., 2013) and Labeled Faces in the Wild (LFW) (Huang et al., 2007) datasets. The MOBIO dataset consists of bi-modal (face and voice) data captured using mobile devices from 150 people in 12 sessions (6-11 samples per session). The LFW dataset includes 13,233 face images of 5,749 people collected from the internet, where 1,680 people have two or more images.
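The TMR-at-FMR numbers reported in Table 2 follow the standard recipe: pick the decision threshold from the impostor score distribution, then measure the fraction of genuine scores that pass it. A minimal sketch, assuming distinct scores and tiny score lists (real evaluations use far larger comparison sets):

```python
def threshold_at_fmr(impostor_scores, fmr):
    """Return the threshold at which the fraction of impostor scores
    strictly above it is at most `fmr` (scores assumed distinct)."""
    ranked = sorted(impostor_scores, reverse=True)
    allowed = int(fmr * len(ranked))  # impostor matches tolerated
    return ranked[allowed]            # only ranked[0..allowed-1] exceed this

def true_match_rate(genuine_scores, threshold):
    """TMR: fraction of genuine comparisons accepted at the threshold."""
    return sum(s > threshold for s in genuine_scores) / len(genuine_scores)
```

The same thresholding also fixes the operating point of the target system attacked in the next section.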
For each of the attacks described in Section 2, we build one or two separate FR systems with one or two SOTA FR models, based on the attack type. If the target system is the same as the system from which the template is leaked, we have only one FR system. Otherwise, if the target system is different from the system from which the template is leaked, we have two FR systems with two different feature extractors. In each case, we use one of our evaluation datasets (i.e., MOBIO or LFW) to build both FR systems (so that the subject with the leaked template is enrolled in the target system too). In each evaluation, we assume that the target FR system is configured at the threshold corresponding to a false match rate (FMR) of 10^-3, and we evaluate the adversary's success attack rate (SAR) in entering that system. We should note that the templates extracted by the aforementioned FR models have 512 dimensions. The input noise z ∈ Z to the mapping network of StyleGAN's pre-trained network is drawn from the standard normal distribution and has 512 dimensions. The input noise n ∈ N to our mapping network M_rec(.) has 8 dimensions and is also drawn from the standard normal distribution. We use the Adam (Kingma & Ba, 2015) optimizer to train our mapping network.
Table 3: Evaluation of attacks with whitebox knowledge of the system from which the template is leaked (i.e., F_loss = F_database) against SOTA FR models, in terms of the adversary's success attack rate (SAR), using our proposed method on the MOBIO and LFW datasets. The values are in percentage and correspond to the threshold where the target system has FMR = 10^-3. Cells are color-coded according to the type of attack as defined in Section 2 for attack 1 (light gray) and attack 2 (dark gray).
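The SAR reported throughout is then simply the acceptance rate of reconstructed-image probes at the target system's threshold. A minimal sketch, assuming the scores are similarity values between each probe's template and the enrolled template of the same subject:

```python
def success_attack_rate(attack_scores, threshold):
    """SAR: fraction of reconstructed-face probes whose comparison score
    against the enrolled template of the same subject reaches the target
    system's decision threshold (set at the FMR = 10^-3 operating point
    in the main experiments)."""
    return sum(s >= threshold for s in attack_scores) / len(attack_scores)
```

A SAR well above the configured FMR indicates that reconstructed images match their targets far more often than random impostors would.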

4.2. ANALYSIS

In this section, we consider SOTA FR models and evaluate the performance of our face reconstruction method in the five different attacks described in Section 2. We also explore the effect of our WGAN training as well as the effect of each loss term in our ablation study.
Whitebox knowledge of F_database: For attacks 1-2, the adversary is assumed to have whitebox knowledge of the system from which the template is leaked (i.e., F_database) and to use the same feature extraction model for training (i.e., F_loss); thus, in these cases F_loss = F_database. We considered the ArcFace and ElasticFace models and reconstructed face images from the templates extracted by these models in attacks against different FR systems. Table 3 reports the vulnerability of different target systems to attacks 1-2 in terms of the adversary's SAR at the system's FMR of 10^-3. Similar results for the system's FMR of 10^-2 are reported in Table 6 of the Appendix. According to these tables, our method achieves considerable SAR against the ArcFace and ElasticFace target systems in attack 1. In attack 2, we observe a degradation in SAR with respect to attack 1; however, the reconstructed face images can still be used to enter another target system. Moreover, FR models with higher recognition accuracy are generally more vulnerable to attack 2. For instance, when ArcFace is considered as F_database, ElasticFace and Swin have the highest SAR as target systems, matching the order of their recognition performance in Table 2.
Blackbox knowledge of F_database: For attacks 3-5, the adversary is assumed to have only blackbox knowledge of the system from which the template is leaked (i.e., F_database) and to use another feature extraction model for training (i.e., F_loss); therefore, in these cases F_loss ≠ F_database.
Table 4 compares the performance of our method with blackbox methods in the literature (Mai et al., 2018; Dong et al., 2021; Vendrow & Vendrow, 2021) for attacks 3-5, in terms of the adversary's SAR at the system's FMR of 10^-3. Similar results for the system's FMR of 10^-2 are available in Table 7 of the Appendix. As these tables show, our proposed method achieves the highest SAR compared to (Mai et al., 2018; Dong et al., 2021; Vendrow & Vendrow, 2021) against FR systems on the MOBIO and LFW datasets. In particular, in attack 5, the hardest attack, where F_database, F_loss, and F_target are all different, the results show that the target FR system is still vulnerable to our attack. The results of our method for attack 5 also show the transferability of our attack to different FR systems. Similar to attack 2, we observe that in attack 5, FR models with higher recognition accuracy are generally more vulnerable to our attack. Fig. 4 shows sample face images from the LFW dataset and the images reconstructed from ArcFace templates using our proposed method in the different attacks. We should highlight that, as shown in Fig. 4, the reconstructed face images in attack 1 and attack 2 are the same, but they are used to enter different target FR systems; the same holds for the reconstructed face images in attacks 3-5.
Ablation Study: To evaluate the effect of the WGAN in training our mapping network and the effect of each term in our loss function (i.e., Eq. 1), we consider the ArcFace model in the whitebox scenario and train different face reconstruction networks with different loss functions. Then, we attack a system with the ArcFace model as the feature extractor (i.e., attack 1) and compare the SARs, as reported in Table 5. According to these results, the proposed adversarial training has a significant effect on our face reconstruction method.
In other words, the WGAN framework helps our mapping network learn the distribution of StyleGAN's intermediate latent space and thus generate face-like images. Based on the results in Table 5, when the WGAN training is used, the ID loss has a high impact on the performance of the template inversion model. While the pixel loss by itself does not achieve good performance, it improves the performance of the ID loss in our reconstruction loss function in Eq. 1. This table confirms that the proposed WGAN training and our reconstruction loss function lead to a more successful attack.
Limitations: Despite the significant performance of our method in terms of success attack rate in all types of attacks reported in Table 3 and Table 4, the reconstructed face images fail to enter the system in some cases. Fig. 5 illustrates sample failure cases.

5. CONCLUSION

In this paper, we proposed a new method to reconstruct high-resolution realistic face images from facial templates in an FR system. We used a pre-trained StyleGAN3 network and learned a mapping from facial templates to the intermediate latent space of StyleGAN within a GAN-based framework. We proposed our method for whitebox and blackbox scenarios. In the whitebox scenario, the adversary can use the feature extraction model itself for training the face reconstruction network; in the blackbox scenario, we assume that the adversary has access to another feature extraction model. In addition, we considered the threat model where the adversary might impersonate the subject in the same or another (i.e., transferable attack) FR system. Based on the adversary's knowledge of the feature extraction model and the target FR system, we defined five different attacks and evaluated the vulnerability of SOTA FR systems to our proposed method. Our experiments showed that the face images reconstructed by our proposed method not only achieve a high SAR in whitebox and blackbox scenarios, but are also transferable and can be used to enter target FR systems with a different FR model.

ETHICS STATEMENT

Motivations: The proposed face reconstruction method is presented with the motivation of showing the vulnerability of face recognition systems to template inversion attacks. We hope this work encourages researchers in the community to investigate the next generation of safe and robust face recognition systems and to develop new algorithms to protect existing systems.
Considerations: While the proposed method might pose a social threat against unprotected systems, we do not condone using our work with the intent of attacking a real face recognition system or for other malicious purposes. The authors also acknowledge a potential lack of diversity in the reconstructed face images, stemming from inherent biases of the datasets used in our experiments.
Mitigation of such attacks: This paper demonstrates an important privacy and security threat to state-of-the-art unprotected face recognition systems. Along the same lines, data protection frameworks, such as the European Union General Data Protection Regulation (EU-GDPR) (European Council, 2016), place legal obligations on protecting biometric data as sensitive information. To this end, and to prevent such attacks against face recognition systems, several biometric template protection algorithms have been proposed in the literature (Nandakumar & Jain, 2015; Sandhya & Prasad, 2017; Kaur et al., 2022; Kumar et al., 2020).

A APPENDIX

Table 6: Evaluation of attacks with whitebox knowledge of the system from which the template is leaked (i.e., F_loss = F_database) against SOTA FR models, in terms of the adversary's success attack rate (SAR), using our proposed method on the MOBIO and LFW datasets. The values are in percentage and correspond to the threshold where the target system has FMR = 10^-2. Cells are color-coded according to the type of attack as defined in Section 2 for attack 1 (light gray) and attack 2 (dark gray).



We should highlight that, since there is no whitebox method in the literature with available source code (as mentioned in Table 1), we could not compare our proposed method with other whitebox methods. The other blackbox methods in the literature do not have available source code either, and we could not reproduce their results.
The biases for different demographics in the verification task for the ArcFace model are studied in (de Freitas Pereira & Marcel, 2021). Similarly, biases in StyleGAN-generated images and in the FFHQ dataset (i.e., our training dataset) are investigated in (Karakas et al., 2022; Tan et al., 2020; Balakrishnan et al., 2020).
The pre-trained StyleGAN3 model is available at https://github.com/NVlabs/stylegan3
The source code will be available upon acceptance of the paper.



Figure 1: Sample face images from the FFHQ dataset and their corresponding reconstructed images using our template inversion method from ArcFace templates. The values below each image show the cosine similarity between the corresponding templates of original and reconstructed face images.
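The cosine similarity reported under each image pair can be computed as below; a minimal stdlib-only sketch operating on templates as flat vectors:

```python
import math

def cosine_similarity(x1, x2):
    """Cosine similarity between two facial templates: the dot product
    of the vectors divided by the product of their L2 norms."""
    dot = sum(a * b for a, b in zip(x1, x2))
    n1 = math.sqrt(sum(a * a for a in x1))
    n2 = math.sqrt(sum(b * b for b in x2))
    return dot / (n1 * n2)
```

Values close to 1 mean the template of the reconstructed image is nearly collinear with the original one, which is exactly what the ID loss optimizes for.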

Figure 2: Block diagram of our proposed template inversion attack

Figure 3: Block diagram of our face reconstruction network. For the face reconstruction model, we consider StyleGAN3 (Karras et al., 2021) as a pre-trained face generation network. The StyleGAN3 model is trained on a dataset of face images using a GAN-based framework and can generate high-resolution and realistic face images. The structure of StyleGAN3 is composed of two networks: a mapping network and a synthesis network. The mapping network M_StyleGAN(.) takes a random noise z ∈ Z and generates an intermediate latent code w = M_StyleGAN(z) ∈ W. Then, the latent code w is given to the synthesis network S_StyleGAN(.) to generate a face image. In our training process, we fix the synthesis network S_StyleGAN(.) and train a new mapping network M_rec(.) to generate ŵ corresponding to the given facial template x ∈ X. Then, the generated latent code ŵ is given to the synthesis network S_StyleGAN(.).
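The inversion pipeline in the figure composes the two networks. A minimal sketch with stub callables: `m_rec` and `s_stylegan` stand in for the trained mapping network and the frozen synthesis network, and list concatenation models the input [n, x].

```python
def reconstruct_face(template, noise, m_rec, s_stylegan):
    """w_hat = M_rec([n, x]); I_hat = S_StyleGAN(w_hat).
    `template` and `noise` are flat lists here; in the paper's setting
    they are a 512-dim template and an 8-dim Gaussian noise vector."""
    w_hat = m_rec(noise + template)  # list concatenation plays the role of [n, x]
    return s_stylegan(w_hat)
```

Because S_StyleGAN(.) stays frozen, only the small mapping network has to be trained, which is what makes learning the inversion tractable.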

Figure 4: Sample face images from the LFW dataset (first row) and their corresponding reconstructed images using our template inversion method from ArcFace templates in different attacks: attacks 1-2 (second row) and attacks 3-5 (third row, using ElasticFace for F_loss). The values below each image show the cosine similarity between the corresponding ArcFace templates of the original and reconstructed face images.

Table 1: Comparison with related works.


Table 4: Evaluation of attacks (with blackbox knowledge of the system from which the template is leaked, i.e., F_database) against SOTA FR models in terms of the adversary's success attack rate (SAR) using different methods on the MOBIO and LFW datasets. The values are in percentage and correspond to the threshold where the target system has FMR = 10^-3. M1: NbNetB-M (Mai et al., 2018), M2: NbNetB-P (Mai et al., 2018), M3: (Dong et al., 2021), and M4: (Vendrow & Vendrow, 2021). Cells are color-coded according to the type of attack as defined in Section 2 for attack 3 (lightest gray), attack 4 (middle dark gray), and attack 5 (darkest gray).


Table 5: Evaluating the effect of each loss term in our loss function in attack 1 against ArcFace, in terms of SAR at the system's FMRs of 10^-2 and 10^-3, evaluated on the MOBIO and LFW datasets. The values are in percentage.
These failure cases occur in attack 3 against ArcFace (using ElasticFace for F_loss) on the LFW dataset. From the failure cases, we can conclude that there is a bias in face reconstruction for specific demographics, such as elderly people or people with dark skin. Such bias in the reconstructed face images is caused by inherent biases in the datasets used to train the FR model, the StyleGAN model, and the mapping network in our face reconstruction model.

Table 7: Evaluation of attacks (with blackbox knowledge of the system from which the template is leaked, i.e., F_database) against SOTA FR models in terms of the adversary's success attack rate (SAR) using different methods on the MOBIO and LFW datasets. The values are in percentage and correspond to the threshold where the target system has FMR = 10^-2. M1: NbNetB-M (Mai et al., 2018), M2: NbNetB-P (Mai et al., 2018), M3: (Dong et al., 2021), and M4: (Vendrow & Vendrow, 2021). Cells are color-coded according to the type of attack as defined in Section 2 for attack 3 (lightest gray), attack 4 (middle dark gray), and attack 5 (darkest gray).

REPRODUCIBILITY STATEMENT

In our experiments, we used the PyTorch package and trained our models on a system equipped with an NVIDIA GeForce RTX 3090 GPU. We use the pre-trained StyleGAN3 model to generate 1024 × 1024 high-resolution images. The source code of our experiments is publicly available to help reproduce our results.

