DEEP REPULSIVE CLUSTERING OF ORDERED DATA BASED ON ORDER-IDENTITY DECOMPOSITION

Abstract

We propose the deep repulsive clustering (DRC) algorithm for ordered data to enable effective order learning. First, we develop the order-identity decomposition (ORID) network to divide the information of an object instance into an order-related feature and an identity feature. Then, we group object instances into clusters according to their identity features using a repulsive term. Moreover, we estimate the rank of a test instance by comparing it with references within the same cluster. Experimental results on facial age estimation, aesthetic score regression, and historical color image classification show that the proposed algorithm can cluster ordered data effectively and also yield excellent rank estimation performance.

1. INTRODUCTION

There are various types of 'ordered' data. For instance, in facial age estimation (Ricanek & Tesafaye, 2006), face photos are ranked according to age. Also, on a video-sharing platform, videos can be sorted according to the number of views or likes. In such ordered data, the classes, which represent ranks or preferences, form an ordered set (Schröder, 2003). Various approaches have been used to estimate the classes of objects, including multi-class classification (Pan et al., 2018), ordinal regression (Frank & Hall, 2001), and metric regression (Fu & Huang, 2008).

Recently, a new approach, called order learning (Lim et al., 2020), was proposed to solve this problem. Order learning is based on the idea that it is easier to predict the ordering relationship between objects than to estimate their absolute classes (or ranks); telling which of two people is older is easier than estimating their exact ages. Hence, in order learning, the pairwise ordering relationship is learned from training data. Then, the rank of a test object is estimated by comparing it with reference objects with known ranks.

However, some objects cannot be easily compared. For example, it is harder to tell which is older between people of different genders than between people of the same gender. Lim et al. (2020) tried to deal with this issue by dividing an ordered dataset into disjoint chains. However, the chains were not clearly separated, and no meaningful properties were discovered from the chains.

In this paper, we propose a reliable clustering algorithm for ordered data, called deep repulsive clustering (DRC), based on order-identity decomposition (ORID). Figure 1 shows a clustering example of ordered data. Note that some characteristics of objects, such as genders or races in age estimation, are not related to their ranks, and the ranks of objects sharing such characteristics can be compared more reliably.
To discover such characteristics without any supervision, the proposed ORID network decomposes the information of an object instance into an order-related feature and an identity feature unrelated to the rank. Then, the proposed DRC clusters object instances using their identity features; in each cluster, the instances share similar identity features. Furthermore, given a test instance, we decide its cluster based on the nearest neighbor (NN) rule, and compare it with reference instances within the cluster to estimate its rank. To this end, we develop a maximum a posteriori (MAP) estimation rule. Experimental results on ordered data for facial age estimation, aesthetic score regression (Kong et al., 2016), and historical color image classification (Palermo et al., 2012) demonstrate that the proposed algorithm separates ordered data clearly into meaningful clusters and provides excellent rank estimation performance for unseen test instances.

The contributions of this paper can be summarized as follows.

• We first propose the notion of identity features of ordered data and develop the ORID network for the order-identity decomposition.
• We develop the DRC algorithm to cluster data on a unit sphere effectively using a repulsive term. We also prove the local optimality of the solution.
• We propose the MAP decision rule for rank estimation. The proposed algorithm provides state-of-the-art performance for facial age estimation and aesthetic score regression.
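The test-time procedure described above (pick a cluster by the NN rule, then compare with references in that cluster) can be illustrated with a minimal sketch. It is not the MAP rule proposed in this paper: the rank is chosen here by a simple agreement count over ternary comparison outcomes, which is an assumption made only for illustration. The function names, the cosine-similarity choice for the NN rule, the `compare` callback, and the threshold `tau` are likewise hypothetical.

```python
import numpy as np

def assign_cluster(test_id, train_ids, train_clusters):
    """NN rule (assumed cosine similarity): return the cluster label of the
    training instance whose identity feature is most similar to test_id."""
    train_unit = train_ids / np.linalg.norm(train_ids, axis=1, keepdims=True)
    test_unit = test_id / np.linalg.norm(test_id)
    return int(train_clusters[np.argmax(train_unit @ test_unit)])

def estimate_rank(test_x, refs, ref_ranks, compare, ranks, tau=1):
    """Pick the candidate rank that agrees with the most comparisons.

    compare(a, b) is a hypothetical ternary comparator returning
    +1 (a is bigger), 0 (similar), or -1 (a is smaller).
    This agreement count is an illustrative stand-in, not the MAP rule.
    """
    outcomes = [compare(test_x, r) for r in refs]

    def votes(candidate):
        total = 0
        for outcome, ref_rank in zip(outcomes, ref_ranks):
            diff = candidate - ref_rank
            # Outcome that the comparator should give if the true rank were `candidate`.
            expected = 1 if diff > tau else (-1 if diff < -tau else 0)
            total += int(outcome == expected)
        return total

    # Ties are resolved in favor of the first candidate in `ranks`.
    return max(ranks, key=votes)
```

In practice, `refs` and `ref_ranks` would hold only the reference instances that belong to the cluster returned by `assign_cluster`, so that the test instance is compared with references sharing similar identity features.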

2. RELATED WORK

2.1. ORDER LEARNING

The notion of order learning was first proposed by Lim et al. (2020). It aims to determine the order graph of classes and classify an object into one of the classes. In practice, it trains a pairwise comparator, which is a ternary classifier, to categorize the relationship between two objects into one of three cases: one object is bigger than, similar to, or smaller than the other. Then, it estimates the rank of a test object by comparing it with reference objects with known ranks. However, not every pair of objects is easily comparable. Although Lim et al. (2020) attempted to group objects into clusters, in which objects could be more accurately compared, their clustering results were unreliable.

Pairwise comparison has been used to estimate object ranks, because relative evaluation is easier than absolute evaluation in general. Saaty (1977) proposed the scaling method to estimate absolute priorities from relative priorities, which has been applied to various decision processes, including aesthetic score regression (Lee & Kim, 2019). Also, some learning to rank (LTR) algorithms are based on pairwise comparison (Liu, 2009; Cohen et al., 1998; Burges et al., 2005; Tsai et al., 2007).

Order learning attempts to combine (possibly inconsistent) pairwise ordering results to determine the rank of each object. Thus, it is closely related to the LTR algorithm of Cohen et al. (1998), which learns a pairwise preference function and obtains a total order of a set to maximize agreements among preference judgments of pairs of elements. Also, order learning is related to rank aggregation (Dwork et al., 2001), in which partially ordered sets are combined into a linearly ordered set to achieve the maximum consensus among those partial sets. Rank aggregation has been studied in various fields (Brüggemann et al., 2004). Since optimal aggregation is NP-hard, Dwork et al. (2001) proposed an approximate algorithm, called Markov chain ordering. There are many other approximate schemes, such as local Kemenization, Borda count, and scaled footrule aggregation.
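As a concrete instance of the approximate aggregation schemes mentioned above, the following minimal sketch implements one common variant of the Borda count. The function name `borda_count`, the scoring convention (an item at position p in a list of length n receives n - 1 - p points), and the alphabetical tie-breaking are assumptions made for illustration; other variants exist, particularly for partial lists.

```python
from collections import defaultdict

def borda_count(rankings):
    """Aggregate several rankings (each listed best first) into one order.

    Each ranking of length n awards n - 1 - position points to an item;
    an item missing from a ranking receives no points from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    # Sort by descending score; break ties by item for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

For example, aggregating the three rankings `["a", "b", "c"]`, `["b", "a", "c"]`, and `["a", "c", "b"]` gives scores of 5, 3, and 1 for `a`, `b`, and `c`, so the aggregated order is `["a", "b", "c"]`.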

2.2. CLUSTERING

Data clustering is the fundamental problem of partitioning data into disjoint groups, such that elements in the same group are similar to one another but elements from different groups are dissimilar. Although various clustering algorithms have been proposed (Hartigan & Wong, 1979; Ester et al., 1996; Kohonen, 1990; Dhillon & Modha, 2001; Reynolds, 2009), conventional algorithms often yield poor performance on high-dimensional data due to the curse of dimensionality and the ineffectiveness of similarity metrics. Dimensionality reduction and feature transform methods have been studied to map raw data into a new feature space, in which they are more easily separated. Linear transforms, such as PCA (Wold et al., 1987), and non-linear transforms, including kernel methods (Hofmann et al., 2008) and spectral clustering (Ng et al., 2002), have been proposed.
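To make the partitioning objective concrete for data on a unit sphere, the following minimal sketch clusters normalized vectors by cosine similarity, in the style of spherical k-means. It is a conventional baseline, not the proposed DRC algorithm, and it contains no repulsive term. The function name, the random initialization, and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np

def spherical_kmeans(x, k, n_iter=50, seed=0):
    """Cluster the rows of x on the unit sphere using cosine similarity."""
    rng = np.random.default_rng(seed)
    # Project every sample onto the unit sphere.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    # Initialize centroids with k distinct samples (fancy indexing copies them).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to the centroid with the highest cosine similarity.
        labels = np.argmax(x @ centroids.T, axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                mean = members.sum(axis=0)
                norm = np.linalg.norm(mean)
                if norm > 0:
                    # Re-normalize so the centroid stays on the sphere.
                    centroids[j] = mean / norm
    labels = np.argmax(x @ centroids.T, axis=1)
    return labels, centroids
```

An empty cluster simply keeps its previous centroid in this sketch; a practical implementation would typically re-initialize it.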



Figure 1: A clustering example of facial photos, which are ordered according to ages. Without any supervision, the proposed algorithm can obtain meaningful clusters using identity features.

