REVOCABLE DEEP REINFORCEMENT LEARNING WITH AFFINITY REGULARIZATION FOR OUTLIER-ROBUST GRAPH MATCHING

Abstract

Graph matching (GM) has been a building block in various areas including computer vision and pattern recognition. Despite recent impressive progress, existing deep GM methods often have obvious difficulty in handling outliers, which are ubiquitous in practice. We propose a deep reinforcement learning based approach, RGM, whose sequential node matching scheme naturally fits the strategy of selective inlier matching against outliers. A revocable action framework is devised to improve the agent's flexibility on the complex constrained GM problem. Moreover, we propose a quadratic approximation technique to regularize the affinity score in the presence of outliers. As such, the agent can stop matching inliers in a timely manner once the affinity score stops growing; otherwise, an additional parameter, i.e. the number of inliers, would be needed to avoid matching outliers. In this paper, we focus on learning the back-end solver under the most general form of GM: Lawler's QAP, whose input is the affinity matrix. In particular, our approach can also boost existing GM methods that use such input. Experiments on multiple real-world datasets demonstrate its performance in terms of both accuracy and robustness.

1. INTRODUCTION

Graph matching (GM) aims to find node correspondences between two or multiple graphs. As a long-standing and fundamental problem, GM spans wide applications in different areas including computer vision and pattern recognition. With increasing computing resources, graph matching that involves second-order edge affinity (in contrast to the linear assignment problem, e.g. bipartite matching) has become a powerful and relatively affordable tool for solving correspondence problems of moderate size, and there is growing research in this area, especially with the introduction of deep learning in recent years (Zanfir et al., 2018; Wang et al., 2019b). GM can be formulated as a combinatorial optimization problem, namely Lawler's Quadratic Assignment Problem (Lawler's QAP) (Lawler, 1963), which is known to be NP-hard. Generally speaking, handling the graph matching problem involves two steps: extracting features from input images to formulate a QAP instance, and solving that QAP instance via constrained optimization, i.e. a front-end feature extractor and a back-end solver, respectively. Impressive progress has been made in graph matching with the introduction of rich deep learning techniques. However, in existing deep GM works, the deep learning modules are mainly applied on the front-end, especially for visual images, using CNNs for node feature learning (Zanfir et al., 2018) and GNNs for structure embedding (Li et al., 2019). Compared with learning-free methods, learnable features have shown greater effectiveness. Another advantage of using neural networks is that the graph structure information can be readily embedded into unary node features, such that the classic NP-hard QAP can in fact degenerate into the linear assignment problem, which can be solved by existing back-end solvers in polynomial time.
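For reference, Lawler's QAP mentioned above can be written in its common standard form (the notation below follows the general GM literature rather than being quoted from this paper):

```latex
\max_{\mathbf{X}}\ \operatorname{vec}(\mathbf{X})^{\top}\,\mathbf{K}\,\operatorname{vec}(\mathbf{X})
\quad \text{s.t.}\quad
\mathbf{X} \in \{0,1\}^{n_1 \times n_2},\ \
\mathbf{X}\mathbf{1} \le \mathbf{1},\ \
\mathbf{X}^{\top}\mathbf{1} \le \mathbf{1},
```

where $\mathbf{X}$ is a (partial) assignment matrix between the $n_1$ and $n_2$ nodes of the two graphs, and $\mathbf{K} \in \mathbb{R}^{n_1 n_2 \times n_1 n_2}$ is the affinity matrix whose diagonal encodes node-to-node affinities and whose off-diagonal entries encode edge-to-edge affinities.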
Perhaps for this reason, existing deep GM methods (Wang et al., 2019b; Fey et al., 2020a; Zhao et al., 2021; Gao et al., 2021) mostly focus on front-end learning, basically by supervised learning using manually labeled node correspondences as ground truth. The back-end solver, in contrast, has received relatively little attention for learning in the literature: authors simply combine their front-end feature extractors with traditional combinatorial solvers, e.g. Sinkhorn (Cuturi, 2013), which means deep learning is hardly utilized to improve the back-end solvers. Note that outliers in this paper refer to the common setting in the literature (Yang et al., 2015): namely, spurious nodes that cannot find a correspondence in the opposite graph, which are ubiquitous in real-world matching scenarios; inliers, in contrast, are nodes that do have correspondences. More specifically, we assume the most general and challenging case in which outliers may exist in both input graphs, in contrast to the majority of works (Zanfir et al., 2018; Wang et al., 2021a) that assume at most one graph contains outliers. Though there is a line of work on image matching that effectively dismisses outliers (Fischler & Bolles, 1981; Ma et al., 2020), these methods rely on specific pose and motion models in vision, which may not always be available for general GM tasks. Moreover, as mentioned above, existing deep GM works are all supervised (or based on supervised learning modules), while in the real world, labeling is costly and often nearly impossible to obtain for large-scale QAP instances. Towards practical and robust graph matching learning, in the absence of labels and in the presence of outliers (in both input graphs), we propose a reinforcement learning (RL) method for graph matching, namely RGM, especially for its most general QAP formulation.
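To illustrate the kind of traditional back-end step referred to above, the following is a minimal NumPy sketch of Sinkhorn normalization, which projects a front-end score matrix toward a doubly-stochastic relaxation of the assignment polytope (function and parameter names here are ours, for illustration only, and not taken from any cited codebase):

```python
import numpy as np

def sinkhorn(scores, n_iters=50, tau=1.0):
    """Project a score matrix toward a doubly-stochastic matrix by
    alternating row/column normalization (an entropic relaxation of
    the linear assignment polytope)."""
    S = np.exp(scores / tau)  # elementwise exp gives a strictly positive matrix
    for _ in range(n_iters):
        S = S / S.sum(axis=1, keepdims=True)  # normalize rows to sum to 1
        S = S / S.sum(axis=0, keepdims=True)  # normalize columns to sum to 1
    return S
```

A discrete matching is then typically read off from the resulting soft assignment, e.g. by the Hungarian algorithm or a row-wise argmax.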
In particular, RL is conceptually well suited for this task due to its label-free nature and its flexibility in finding node correspondences by sequential decision making, which provides a direct way of avoiding over-matching of outliers via early stopping. In contrast, in existing deep GM works, matching is performed in one shot, which incurs coupling of inliers and outliers and lacks an explicit way to distinguish the outliers. Therefore, we specifically devise a so-called revocable deep reinforcement learning framework that allows small mistakes over the matching procedure: the current action is revocable, so the agent can re-search for a better node correspondence based on up-to-date environment information. Our revocable framework is shown to be cost-effective and empirically outperforms existing popular techniques for refining local decision making, e.g. Local Rewrite (Chen & Tian, 2019). Moreover, since the standard GM objective is to maximize the affinity score between matched nodes, it causes an over-matching issue, i.e. outliers are also incorporated into the matching to increase the overall score. To address this issue, we propose to regularize the affinity score such that it discourages unwanted matchings by assigning a negative score to those pairs. Intuitively, the RL agent will naturally stop matching spurious outliers as the objective score would otherwise decrease. With the help of the revocable framework and affinity regularization, our RGM shows promising performance in various experiments. For clarity, we compare our RGM with most existing GM methods in Table 5. Due to the space limit, we place it in the appendix (A.1), along with a more detailed discussion comparing RGM with existing works. To sum up, the highlights and contributions of our work are: 1) We propose RGM, which sequentially selects node correspondences from two graphs, in contrast to the majority of existing works that obtain the whole matching in one shot.
Accordingly, our approach can naturally handle the case of partial matching (due to outliers) by early stopping. 2) Specifically, we devise a revocable approach to select possible node correspondences, whose mechanism is adapted to unlabeled graph data with the affinity score as the reward. To the best of our knowledge, this is the first attempt to successfully adapt RL to graph matching. 3) To avoid matching the outliers, we develop a regularization of the affinity score, so that the solver no longer pursues matching as many nodes as possible. To the best of our knowledge, this is also the first work to regularize the affinity score to avoid over-matching among outliers. 4) On synthetic datasets, the Willow Object dataset, the Pascal VOC dataset, and the QAPLIB benchmark, RGM shows competitive performance compared with both learning-free and learning-based baselines. Note that RGM focuses on learning the back-end solver and is hence orthogonal to many existing front-end feature learning based GM methods, whose performance it can further boost, as shown in our experiments.
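The effect of regularizing the affinity score can be illustrated with a small stand-in sketch. The learned RL policy is of course not a greedy heuristic; the code below only mimics the sequential, early-stopping behavior, and all names (including the dictionary-based affinity lookup `K`) are hypothetical, for illustration only:

```python
def regularized_gain(K, matched, cand, lam):
    """Marginal change of the regularized affinity score when adding
    candidate correspondence `cand` (a node pair) to `matched`.
    `K` is a hypothetical sparse affinity lookup: K[(p, p)] is the
    unary affinity of pair p, K[(p, q)] the edge affinity of p and q."""
    gain = K.get((cand, cand), 0.0)          # node-to-node affinity
    for m in matched:                        # edge-to-edge affinities
        gain += K.get((cand, m), 0.0) + K.get((m, cand), 0.0)
    return gain - lam  # the penalty makes pure-outlier matches negative

def greedy_sequential_match(K, candidates, lam):
    """Illustrative greedy stand-in for the RL agent: keep taking the
    best pair while its regularized gain is nonnegative, i.e. stop as
    soon as the regularized affinity score would shrink."""
    matched, remaining = [], list(candidates)
    while remaining:
        gains = [regularized_gain(K, matched, c, lam) for c in remaining]
        best = max(range(len(gains)), key=gains.__getitem__)
        if gains[best] < 0:
            break  # early stopping: leave the remaining (outlier) pairs unmatched
        matched.append(remaining.pop(best))
    return matched
```

With `lam = 0` this sketch keeps accepting zero-affinity outlier pairs (the over-matching issue described above), whereas any positive `lam` makes their gain negative and triggers early stopping.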

Funding

* The corresponding author is Junchi Yan. The work was in part supported by the National Key Research and Development Program of China (2020AAA0107600), the National Natural Science Foundation of China (62222607), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), and Huawei Technologies.

