GENERALIZABLE PERSON RE-IDENTIFICATION WITHOUT DEMOGRAPHICS

Abstract

Domain generalizable person re-identification (DG-ReID) aims to learn a ready-to-use, domain-agnostic model that can be applied directly in cross-dataset/domain evaluation. Current methods mainly exploit demographic information, such as domain and/or camera labels, for domain-invariant representation learning. However, this demographic information is not always accessible in practice due to privacy and security issues. In this paper, we consider the problem of person re-identification in a more general setting, i.e., domain generalizable person re-identification without demographics (DGWD-ReID). To address the underlying uncertainty of the domain distribution, we introduce distributionally robust optimization (DRO) to learn robust person re-identification models that perform well on all possible data distributions within the uncertainty set, without demographics. However, directly applying the popular Kullback-Leibler divergence constrained DRO (KL-DRO) fails to generalize well under the distribution shifts found in real-world scenarios, since the convexity condition may not hold for overparameterized neural networks. Inspired by this, we analyze and reformulate KL-DRO by applying the change-of-measure technique, and then propose a simple yet efficient approach, Unit-DRO, which minimizes the loss over a new dataset with hard samples up-weighted and other samples down-weighted. We perform extensive experiments on both domain generalizable and cross-domain person ReID tasks, and the empirical results show that Unit-DRO achieves superior performance compared to all baselines without using demographics.

1. INTRODUCTION

Person re-identification (ReID) aims to find the correspondences between person images of the same identity across multiple camera views. As illustrated in Figure 1, previous studies mainly follow three different settings: 1) supervised person ReID Zhang et al. (2020), where training and test data are independently and identically distributed (i.i.d.), i.e., drawn from the same distribution. Though recent supervised methods have achieved remarkable performance, they are usually non-robust in out-of-distribution (OOD) settings; 2) unsupervised domain adaptive person ReID (UDA-ReID) and cross-domain person ReID (CD-ReID) Luo et al. (2020), where UDA-ReID relies on large amounts of unlabeled data for retraining and CD-ReID cannot exploit the benefits brought by multi-source domains; 3) domain generalizable person ReID (DG-ReID) Dai et al. (2021a), where the model is trained on multiple large-scale datasets and tested on unseen domains directly, without extra data collection/annotation or model updating on new domains. Therefore, DG-ReID is receiving increasing attention due to its great value in real-world person retrieval applications. However, current DG-ReID research usually comes with a serious disadvantage: it requires demographic information (e.g., domain labels Choi et al. (2021); Zhao et al. (2021), camera IDs Zhang et al. (2021b); Dai et al. (2021a), and video timestamps Yuan et al. (2020)) as extra supervision for model training. Such demographics implicitly define the variations in training data that the learned model should be invariant or robust to. Unfortunately, demographic information is usually not available in practice for the following reasons: 1) the collection of demographics inevitably leads to privacy problems Veale & Binns (2017), e.g., the risk of exposing geographical location and/or environment information; 2) the collection/annotation of domain labels is a very expensive and ethically fraught endeavour Michel et al. (2021); and 3) such coarse-grained labels, together with the noise in manually annotated domain labels, may exacerbate the hidden stratification issue, which hinders a variety of safety-critical applications Creager et al. (2021); Kim & Lee (2021) (see Appendix A for more discussions). Therefore, as shown in Figure 1d, we consider a more general

