NEURAL ARCHITECTURE SEARCH OF SPD MANIFOLD NETWORKS

Abstract

In this paper, we propose a new neural architecture search (NAS) problem for Symmetric Positive Definite (SPD) manifold networks. Unlike the conventional NAS problem, our problem requires searching for a unique computational cell called the SPD cell. This SPD cell serves as a basic building block of SPD neural architectures. An efficient solution to our problem is important to minimize the extraneous manual effort in SPD neural architecture design. To accomplish this goal, we first introduce a geometrically rich and diverse SPD neural architecture search space for an efficient SPD cell design. Further, we model our new NAS problem using the supernet strategy, which casts the architecture search problem as a one-shot training process of a single supernet. Based on the supernet modeling, we exploit a differentiable NAS algorithm on our relaxed continuous search space for SPD neural architecture search. Statistical evaluation of our method on drone, action, and emotion recognition tasks mostly provides better results than the state-of-the-art SPD networks and NAS algorithms. Empirical results show that our algorithm excels at discovering better SPD network designs and provides models that are more than 3 times lighter than those searched by state-of-the-art NAS algorithms.

1. INTRODUCTION

Designing a favorable neural network architecture for a given application requires a lot of time, effort, and domain expertise. To mitigate this issue, researchers have in recent years started developing algorithms to automate the design process of neural network architectures (Zoph & Le, 2016; Zoph et al., 2018; Liu et al., 2017; 2018a; Real et al., 2019; Liu et al., 2018b; Tian et al., 2020). Although these neural architecture search (NAS) algorithms have shown great potential to provide an optimal architecture for a given application, they are limited to architectures with Euclidean operations and representations. To the best of our knowledge, hardly any NAS algorithms have been proposed to deal with non-Euclidean data representations and their corresponding sets of operations. It is well known that manifold-valued data representations such as symmetric positive definite (SPD) matrices have shown remarkable success in many real-world applications such as pedestrian detection (Tuzel et al., 2006; 2008), magnetic resonance imaging analysis (Pennec et al., 2006), action recognition (Harandi et al., 2014), face recognition (Huang et al., 2014; 2015), brain-computer interfaces (Barachant et al., 2011), structure from motion (Kumar et al., 2018; Kumar, 2019), etc. Moreover, in applications like diffusion tensor imaging of the brain and drone imaging, samples are collected directly as SPD matrices. As a result, neural networks based on Euclidean data representations become inefficient for those applications. Consequently, this has led to the development of SPD neural network (SPDNet) architectures for further improvements in these areas of research (Huang & Van Gool, 2017; Brooks et al., 2019). However, these architectures are handcrafted, so the operations or the parameters defined for these networks generally change from one application to another. This motivated us to propose a new NAS problem of SPD manifold networks.
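As a concrete illustration of data that lives natively on the SPD manifold, consider the region covariance descriptors used in pedestrian detection (Tuzel et al., 2006). The following sketch is illustrative only (the feature choice and sizes are ours, not from the paper): the covariance of per-pixel feature vectors over an image region is a symmetric positive (semi-)definite matrix, so the sample itself is an SPD point rather than a Euclidean vector.

```python
import numpy as np

# Illustrative region covariance descriptor: each "pixel" carries a small
# feature vector (e.g. intensity and gradients); the covariance of these
# features over a region is symmetric positive (semi-)definite.
rng = np.random.default_rng(0)
features = rng.standard_normal((500, 5))   # 500 pixels, 5 features each

C = np.cov(features, rowvar=False)         # 5x5 covariance descriptor
C += 1e-6 * np.eye(5)                      # regularize to ensure strict SPD

# Verify the SPD properties: symmetry and strictly positive eigenvalues.
assert np.allclose(C, C.T)
assert np.all(np.linalg.eigvalsh(C) > 0)
```

Because such descriptors satisfy these manifold constraints by construction, network layers that process them must preserve symmetry and positive definiteness, which is precisely what SPDNet-style operations are designed to do.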
A solution to this problem can reduce unwanted effort in SPDNet design. Compared to the traditional NAS problem, our NAS problem requires a new definition of the computational cell and a proposal for a diverse SPD candidate operation set. In particular, we model the basic architecture cell with a specific directed acyclic graph (DAG), where each node is a latent SPD representation and each edge corresponds to an SPD candidate operation. Here, the intermediate transformations between nodes respect the geometry of the SPD manifold. For solving the suggested NAS problem, we exploit a supernet search strategy, which models the architecture search problem as a one-shot training process of a supernet that comprises a mixture of SPD neural architectures. The supernet modeling enables us to perform a differentiable architecture search on a continuous relaxation of the SPD neural architecture search space, which can therefore be solved using a gradient descent approach. Our evaluation validates that the proposed method can build a reliable SPD network from scratch. We show the results of our method on benchmark datasets, which are clearly better than those of the handcrafted SPDNets. Our work makes the following contributions:

• We introduce a NAS problem of SPD manifold networks that opens up a new direction of research in automated machine learning and SPD manifold learning. Based on supernet modeling, we propose a novel differentiable NAS algorithm for SPD neural architecture search. Concretely, we exploit a sparsemax-based Fréchet mixture of SPD operations to introduce the sparsity that is essential for an effective differentiable search, and a bi-level optimization with a manifold-based update and a convexity-based update to jointly optimize the architecture parameters and the network kernel weights.
• Besides well-studied operations from existing SPDNets (Huang & Van Gool, 2017; Brooks et al., 2019; Chakraborty et al., 2020), we follow Liu et al. (2018b) to further introduce some new SPD layers, i.e., skip connection, none operation, max pooling, and average pooling. This additional set of SPD operations makes the search space more diverse, allowing the neural architecture search algorithm to obtain more generalized SPD neural network architectures.
• Evaluation on three benchmark datasets shows that our searched SPD neural architectures can outperform the existing handcrafted SPDNets (Huang & Van Gool, 2017; Brooks et al., 2019; Chakraborty et al., 2020) and the state-of-the-art NAS methods (Liu et al., 2018b; Chu et al., 2020). Notably, our searched architecture is more than 3 times lighter than those searched by the traditional NAS algorithms.

2. BACKGROUND

In recent years, plenty of research work has been published in the area of NAS (Gong et al., 2019; Liu et al., 2019; Nayman et al., 2019; Guo et al., 2020). This is largely due to the success of deep learning in several applications, which has eventually led to the automation of neural architecture design. Also, improvements in the processing capabilities of machines have encouraged researchers to work on this computationally expensive yet important problem. The computational cost of some well-known NAS algorithms is in the thousands of GPU days, which has resulted in the development of several computationally efficient methods (Zoph et al., 2018; Real et al., 2019; Liu et al., 2018a; 2017; Baker et al., 2017; Brock et al., 2017; Bender, 2019; Elsken et al., 2017; Cai et al., 2018; Pham et al., 2018; Negrinho & Gordon, 2017; Kandasamy et al., 2018; Chu et al., 2020). In this work, we propose a new NAS problem of SPD networks. We solve this problem using a supernet modeling methodology with a one-shot differentiable training process of an overparameterized supernet. Our modeling is driven by recent progress in the supernet methodology, which has shown greater potential than other NAS methodologies in terms of search efficiency. Since our work is directed towards solving a new NAS problem, we confine our discussion to the works that have greatly influenced our method, i.e., one-shot NAS methods and SPD networks. To the best of our knowledge, there are mainly two types of one-shot NAS methods based on the architecture modeling (Elsken et al., 2018): (a) parameterized architectures (Liu et al., 2018b; Zheng et al., 2019; Wu et al., 2019; Chu et al., 2020), and (b) sampled architectures (Deb et al., 2002; Chu et al., 2019). In this paper, we adhere to the parametric modeling due to its promising results on conventional neural architectures.

A majority of the previous work on NAS with continuous search spaces fine-tunes the explicit features of specific architectures (Saxena & Verbeek, 2016; Veniat & Denoyer, 2018; Ahmed & Torresani, 2017; Shin et al., 2018). On the contrary, Liu et al. (2018b); Liang et al. (2019); Zhou et al. (2019); Zhang et al. (2020); Wu et al. (2020); Chu et al. (2020) provide architectural diversity for NAS with highly competitive performance. The other part of our work focuses on SPD network architectures. There exist algorithms to develop handcrafted SPDNets (Huang & Van Gool, 2017; Brooks et al., 2019; Chakraborty et al., 2020). To automate the process of SPD network design, in this work, we choose the most promising approaches from these fields (NAS (Liu et al., 2018b), SPD networks (Huang & Van Gool, 2017)) and propose a NAS algorithm for SPD inputs. Next, we summarize the essential notions of the Riemannian geometry of SPD manifolds, followed by an introduction to some basic SPDNet operations and layers. As some of the introduced operations and layers have been well studied in the existing literature, we apply them directly to define the search space of our SPD neural architectures.
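The sparsemax-based Fréchet mixture of SPD operations can be sketched numerically. The sketch below is illustrative and makes simplifying assumptions: the helper names (`sparsemax`, `frechet_mixture`, `spd_logm`, `spd_expm`) are ours, and the weighted Fréchet mean is computed under the log-Euclidean metric, which has a closed form (the affine-invariant mean generally requires an iterative Karcher flow). Sparsemax (Martins & Astudillo, 2016) is the Euclidean projection onto the probability simplex, so, unlike softmax, it can assign exactly zero weight to candidate operations.

```python
import numpy as np

def sparsemax(z):
    """Project z onto the probability simplex (Martins & Astudillo, 2016).
    Unlike softmax, the result can contain exact zeros (sparsity)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted)
    support = z_sorted + (1.0 - cssv) / k > 0
    k_max = k[support][-1]
    tau = (cssv[k_max - 1] - 1.0) / k_max
    return np.maximum(z - tau, 0.0)

def spd_logm(X):
    # Matrix logarithm of an SPD matrix via eigendecomposition.
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def spd_expm(X):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    w, V = np.linalg.eigh(X)
    return (V * np.exp(w)) @ V.T

def frechet_mixture(spd_mats, weights):
    """Weighted Frechet mean under the log-Euclidean metric:
    expm(sum_i w_i logm(X_i)). The mixture stays on the SPD manifold,
    unlike a plain arithmetic average of operation outputs."""
    log_mix = sum(w * spd_logm(X) for w, X in zip(weights, spd_mats) if w > 0)
    return spd_expm(log_mix)

# Hypothetical usage: three candidate SPD operation outputs on one edge.
rng = np.random.default_rng(1)
def random_spd(d):
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

candidates = [random_spd(4) for _ in range(3)]
alpha = np.array([0.2, 1.5, 0.9])    # architecture parameters for this edge
w = sparsemax(alpha)                 # weakest candidate pruned to exactly 0
mixed = frechet_mixture(candidates, w)   # still symmetric positive definite
```

The sparsity from sparsemax is what makes the relaxed search effective here: operations with zero weight drop out of the Fréchet mean entirely, so the continuous supernet stays close to the discrete architectures it represents.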


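For reference, the basic SPDNet layers mentioned above, i.e., BiMap, ReEig, and LogEig (Huang & Van Gool, 2017), admit a compact sketch. The function names, shapes, and numpy forward pass below are ours and illustrative only, not the authors' implementation.

```python
import numpy as np

def _eig_op(X, fn):
    # Apply a scalar function to the eigenvalues of a symmetric matrix.
    w, V = np.linalg.eigh(X)
    return (V * fn(w)) @ V.T

def bimap(X, W):
    """BiMap layer: X -> W X W^T. With a row-semi-orthogonal W of shape
    (d_out, d_in), d_out <= d_in, an SPD input stays SPD while being
    reduced in dimension."""
    return W @ X @ W.T

def reeig(X, eps=1e-4):
    """ReEig layer: clamp small eigenvalues, a manifold analogue of ReLU."""
    return _eig_op(X, lambda w: np.maximum(w, eps))

def logeig(X):
    """LogEig layer: matrix logarithm, flattening the SPD manifold so a
    standard Euclidean classifier can be attached on top."""
    return _eig_op(X, np.log)

# Hypothetical forward pass on an 8x8 SPD input.
rng = np.random.default_rng(2)
A = rng.standard_normal((8, 8))
X = A @ A.T + 8 * np.eye(8)                  # SPD input

Q, _ = np.linalg.qr(rng.standard_normal((8, 4)))
W = Q.T                                      # 4x8 semi-orthogonal weight
Y = logeig(reeig(bimap(X, W)))               # 4x4 symmetric output
```

In SPDNet training, `W` is constrained to a Stiefel manifold, which is why the bi-level optimization discussed in this paper requires manifold-based updates for the network kernel weights.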