COMPACT BILINEAR POOLING VIA GENERAL BILINEAR PROJECTION

Abstract

Many factorized bilinear pooling (FBiP) algorithms employ a Hadamard product-based bilinear projection to learn appropriate projecting directions that reduce the dimension of bilinear features. However, in this paper, we reveal that the Hadamard product-based bilinear projection causes FBiP to miss many feasible projecting directions, which significantly harms both the compactness and the effectiveness of the output compact bilinear features. To address this issue, we propose a general matrix-based bilinear projection built on rank-k matrix base decomposition, of which the Hadamard product-based bilinear projection and the projection Y = U^T XV are special cases. Our proposed projection can therefore be used to improve algorithms based on either type of bilinear projection. Using the proposed bilinear projection, we design a novel low-rank factorized bilinear pooling (named RK-FBP) that considers the feasible projecting directions missed by the Hadamard product-based bilinear projection, and thus generates better compact bilinear features. To leverage high-order information in local features, we nest several RK-FBP modules together to formulate a multi-linear pooling that outputs compact multi-linear features. Finally, we conduct experiments on several fine-grained image tasks to evaluate our models; the results show that our models achieve new state-of-the-art classification accuracy with the lowest feature dimension.

1. INTRODUCTION

Bilinear pooling (BiP) (Lin, 2015) and its variants (Li et al., 2017b; Lin & Maji, 2017; Wang et al., 2017) employ the Kronecker product to yield expressive representations by mining the rich statistical information in a set of local features, and have attracted wide attention in many applications, such as fine-grained image classification and visual question answering. Despite their excellent performance, bilinear features suffer from two shortcomings: (1) the ability of BiP to boost the discriminant information between different classes also magnifies the intra-class variance of the representations, so BiP easily encounters the burstiness problem (Gao et al., 2020; Zheng et al., 2019) and suffers a performance deficit; (2) the Kronecker product exploited by BiP usually makes the bilinear features exceptionally high-dimensional, leading to overfitting in the succeeding tasks and a heavy computational load due to increased memory storage. How to effectively remedy these shortcomings of BiP is therefore an important issue.

Several approaches have been proposed (Gao et al., 2016; Fukui et al., 2016; Yu et al., 2021; Li et al., 2017b; Kim et al., 2016) to address these shortcomings. Among them, the factorized bilinear pooling (FBiP) methods (Li et al., 2017b; Kim et al., 2016; Amin et al., 2020; Yu et al., 2018; Gao et al., 2020) have been promising. In essence, FBiP performs dimension reduction on bilinear features: it finds a linear projection that maps bilinear features into a low-dimensional space while preserving their discriminant information among classes with the fewest dimensions, and then employs L2-normalization to project those low-dimensional features onto a hyper-sphere. (Bilinear pooling is equivalent to the non-linear projection determined by the polynomial kernel function k(x, y) = ⟨x, y⟩², as shown by Gao et al. (2016), which makes bilinear features likely to be linearly discriminable.)
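The two shortcomings above, and the kernel equivalence in parentheses, can be illustrated with a minimal numpy sketch: bilinear pooling averages the outer product of each local feature with itself, the inner product of two such pooled vectors equals the squared inner product of the originals, and the output dimension grows quadratically. All shapes here (a 7x7 map of 512-dimensional features) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_pool(X):
    """Bilinear pooling: average the outer (Kronecker) product of each
    local feature with itself over all locations, then flatten.
    X has shape (num_locations, d); the output has dimension d*d."""
    B = np.einsum('ni,nj->ij', X, X) / X.shape[0]
    return B.reshape(-1)

# Kernel view: for single vectors, <vec(x x^T), vec(y y^T)> = <x, y>^2,
# i.e. bilinear features realize the polynomial kernel k(x, y) = <x, y>^2.
x = rng.standard_normal(8)
y = rng.standard_normal(8)
bx = bilinear_pool(x[None, :])
by = bilinear_pool(y[None, :])
assert np.isclose(bx @ by, (x @ y) ** 2)

# Dimensionality blow-up: d-dimensional local features yield d^2
# bilinear dimensions, e.g. 512 -> 262144.
X = rng.standard_normal((49, 512))   # e.g. a 7x7 CNN feature map
print(bilinear_pool(X).shape)        # (262144,)
```

The quadratic output size printed at the end is exactly the high-dimensionality problem that FBiP methods attack.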
Thus, the information reflecting the large intra-class variances is discarded, because it does not help distinguish different classes. In this way, the burstiness and high-dimension problems are solved simultaneously (Gao et al., 2020; Wei et al., 2018). This procedure is depicted in Figure 1. Sub-figure (a) shows a set of samples with a large variance. Sub-figure (c) is
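A minimal sketch of the Hadamard product-based FBiP pipeline described above (in the spirit of Kim et al., 2016, though not the paper's exact formulation): two learned projections, an elementwise product, sum-pooling over locations, and L2-normalization onto the hyper-sphere. The signed square-root step and all dimensions are common conventions assumed here, not specifics from this paper, and the random U, V stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard_fbp(X, U, V, eps=1e-12):
    """Hadamard product-based factorized bilinear pooling (sketch):
    project local features with U and V, take the elementwise
    (Hadamard) product, sum over locations, apply the signed
    square-root, then L2-normalize onto the unit hyper-sphere."""
    z = np.sum((X @ U) * (X @ V), axis=0)   # (k,) compact bilinear feature
    z = np.sign(z) * np.sqrt(np.abs(z))     # signed square-root (assumed)
    return z / (np.linalg.norm(z) + eps)    # L2-normalization

d, k = 512, 64                     # compact dim k << d*d of full BiP
X = rng.standard_normal((49, d))   # e.g. a 7x7 CNN feature map
U = rng.standard_normal((d, k)) / np.sqrt(d)
V = rng.standard_normal((d, k)) / np.sqrt(d)
z = hadamard_fbp(X, U, V)
print(z.shape)                     # (64,), versus 262144 for full BiP
```

Each output coordinate z_j = sum_n (u_j^T x_n)(v_j^T x_n) is a rank-one bilinear form u_j v_j^T applied to the local features; the paper's argument is that restricting every projecting direction to such rank-one forms is what makes this scheme miss feasible directions.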

