A DESIGN SPACE STUDY FOR LISTA AND BEYOND

Abstract

In recent years, great success has been witnessed in building problem-specific deep networks by unrolling iterative algorithms, for solving inverse problems and beyond. Unrolling is believed to combine the model-based prior with the learning capacity of deep learning. This paper revisits the role of unrolling as a design approach for deep networks: to what extent is the resulting special architecture superior, and can we find better ones? Using LISTA for sparse recovery as a representative example, we conduct the first thorough design space study for unrolled models. Among all possible variations, we focus on extensively varying the connectivity patterns and neuron types, leading to a gigantic design space arising from LISTA. To efficiently explore this space and identify top performers, we leverage the emerging tool of neural architecture search (NAS). We carefully examine the searched top architectures in a number of settings, and are able to discover networks that are consistently better than LISTA. We further present more visualization and analysis to "open the black box", and find that the searched top architectures demonstrate highly consistent and potentially transferable patterns. We hope our study can spark more reflections and explorations on how to better mingle model-based optimization priors with data-driven learning.

1. INTRODUCTION

The signal processing and optimization realm has an everlasting research enthusiasm for addressing ill-conditioned inverse problems, which are often regularized by handcrafted model-based priors such as sparse coding, low-rank matrix fitting, and conditional random fields. Since closed-form solutions are typically unavailable for these model-based optimizations, many analytical iterative solvers have risen to popularity. More recently, deep learning based approaches provide an interesting alternative for inverse problems. A learning-based inverse problem solver attempts to approximate the inverse mapping directly by optimizing network parameters, fitting a "black box" regression from observed measurements to underlying signals using synthetic or real-world sample pairs. Being model-based and model-free respectively, analytical iterative solvers and learning-based regression form two extremes across the spectrum of inverse problem solutions. A promising direction arising in between them is called algorithm unrolling (Monga et al., 2019). Starting from an analytical iterative solver designed for model-based optimization, its unrolled network architecture can be generated by cascading the iteration steps for a finite number of times, or equivalently, by running the iterative algorithm with early stopping. The original algorithm parameters also turn into network parameters. Those parameters are then trained end to end using standard deep network training, rather than being derived analytically or selected by cross-validation. Unrolling was first proposed to yield faster trainable regressors approximating iterative sparse solvers (Gregor & LeCun, 2010), for settings where one needs to solve sparse inverse problems on similar data repeatedly.
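To make the unrolling idea concrete, below is a minimal NumPy sketch of the classical ISTA iteration for the LASSO problem min_x 0.5‖Ax − b‖² + λ‖x‖₁, the iterative sparse solver that unrolling was first applied to. The problem sizes, random seed, and regularization weight are illustrative assumptions, not values from this paper:

```python
import numpy as np

def soft_threshold(x, theta):
    """Proximal operator of the l1 norm: shrink each entry toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def ista(A, b, lam, n_iters=2000):
    """Classical ISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.

    The step size 1/L and the threshold lam/L are fixed analytically here;
    unrolling turns them (and the matrices they multiply) into learnable,
    per-iteration network parameters trained end to end.
    """
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x

# Tiny synthetic sparse-recovery instance (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100)) / np.sqrt(50)  # roughly unit-norm columns
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
b = A @ x_true
x_hat = ista(A, b, lam=0.05)
```

Each loop iteration corresponds to one layer of the unrolled network; truncating the loop at a finite `n_iters` is exactly the "early stopping" view of unrolling described above.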
Later on, the unrolled architectures were believed to incorporate model-based priors while enjoying the learning capacity of deep networks empowered by training data, and therefore became a rising direction for designing principled, physics-informed deep architectures. The growing popularity of unrolling lies in its demonstrated effectiveness in developing compact, data-efficient, interpretable and high-performance architectures, when the underlying optimization model is assumed available. Such approaches have witnessed prevailing success in applications such as compressive sensing (Zhang & Ghanem, 2018), computational imaging (Mardani et al., 2018), wireless communication (Cowen et al., 2019; Balatsoukas-Stimming & Studer, 2019), computer vision (Zheng et al., 2015; Peng et al., 2018), and other algorithms such as ADMM (Xie et al., 2019). The empirical success of unrolling has sparked much curiosity about its deeper understanding. A series of efforts (Moreau & Bruna, 2017; Giryes et al., 2018; Chen et al., 2018; Liu et al., 2019; Ablin et al., 2019; Takabe & Wadayama, 2020) explored the theoretical underpinnings of unrolling as a specially adapted iterative optimizer for minimizing a specific objective function, and proved favorable convergence rates over classical iterative solvers when the unrolled architectures are trained to (over)fit particular data. Orthogonally, this paper reflects on unrolling as a design approach for deep networks. The core question we ask is: for solving model-based inverse problems, what is the role of unrolling in designing deep architectures? What can we learn from unrolling, and how can we go beyond it?

1.1. RELATED WORKS: PRACTICES AND THEORIES OF UNROLLING

Gregor & LeCun (2010) pioneered a learning-based model for solving sparse coding by unrolling the iterative shrinkage-thresholding algorithm (ISTA) (Blumensath & Davies, 2008) as a recurrent neural network (RNN). The unrolled network, called Learned ISTA (LISTA), treated the ISTA algorithm parameters as learnable and varied them by iteration; these were then fine-tuned to obtain optimal performance on the data within a small number of iterations. Numerous works (Sprechmann et al., 2015; Wang et al., 2016a; Zhang & Ghanem, 2018; Zhou et al., 2018) followed this idea to unroll various iterative algorithms for sparse, low-rank, or other regularized models. A safeguarding mechanism was also introduced to guide the learned updates and ensure convergence, even when the test problem shifts away from the training distribution (Heaton et al., 2020).

On a separate note, many empirical works (Wang et al., 2016a;b; Gong et al., 2020) advocated that the unrolled architecture, when used as a building block of an end-to-end deep model, implicitly enforces a structural prior (resulting from the original optimization objective) on the model training (Dittmer et al., 2019). That could be viewed as a special example of "architecture as prior" (Ulyanov et al., 2018). A recent survey (Monga et al., 2019) presents a comprehensive discussion. Specifically, the authors suggested that since iterative algorithms are grounded in domain-specific formulations, they embed a reasonably accurate characterization of the target function. The unrolled networks, by expanding the learnable capacity of iterative algorithms, become "tunable" to approximate the target function more accurately. Meanwhile, compared to generic networks, they span a relatively small subset of the function space and can therefore be trained more data-efficiently.

1.2. MOTIVATIONS AND CONTRIBUTIONS

This paper aims to quantitatively assess "how good the unrolled architectures actually are", using LISTA for sparse recovery as a representative example. We present the first design space ablation study on LISTA (we define a design space as a family of models derived from the same set of architecture-varying rules): starting from the original unrolled architecture, we extensively vary the connectivity patterns and neuron types. We seek and assess good architectures in a number of challenging settings, and hope to expose successful design patterns from those top performers. As we enable layer-wise varying skip connections and neuron types, the LISTA-oriented design space is dauntingly large (see Sections 2.1 and 2.2 for explanations). As its manual exploration is infeasible, we introduce the tool of neural architecture search (NAS) into the unrolling field. NAS
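To make the LISTA recurrence described above concrete, here is a minimal NumPy sketch of a K-layer unrolled network x^{k+1} = soft(W_k b + S_k x^k, θ_k). The class name and the ISTA-based initialization are illustrative assumptions; in actual LISTA the per-layer parameters (W_k, S_k, θ_k) would be trained end to end by backpropagation:

```python
import numpy as np

def soft_threshold(x, theta):
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

class UnrolledLISTA:
    """K unrolled layers of x^{k+1} = soft(W_k b + S_k x^k, theta_k).

    Parameters are stored per layer and initialized from ISTA here; making
    them learnable and training them on data is what turns this truncated
    solver into the LISTA network.
    """
    def __init__(self, A, lam, n_layers=16):
        L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient
        n = A.shape[1]
        self.W = [A.T / L for _ in range(n_layers)]
        self.S = [np.eye(n) - A.T @ A / L for _ in range(n_layers)]
        self.theta = [lam / L for _ in range(n_layers)]

    def forward(self, b):
        x = np.zeros(self.S[0].shape[0])
        for W, S, theta in zip(self.W, self.S, self.theta):
            x = soft_threshold(W @ b + S @ x, theta)
        return x

# Sanity check: with ISTA initialization, the unrolled network reproduces
# exactly 16 iterations of ISTA (sizes and seed are arbitrary).
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 40))
b = rng.standard_normal(20)
net = UnrolledLISTA(A, lam=0.1, n_layers=16)
x_net = net.forward(b)

L = np.linalg.norm(A, 2) ** 2
x_ref = np.zeros(40)
for _ in range(16):
    x_ref = soft_threshold(x_ref - A.T @ (A @ x_ref - b) / L, 0.1 / L)
```

This per-layer parameterization is also the natural starting point for the design space studied in this paper: each layer's connectivity and neuron type can then be varied independently.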

