COLLABORATIVE SYMMETRICITY EXPLOITATION FOR OFFLINE LEARNING OF HARDWARE DESIGN SOLVER

Anonymous

Abstract

This paper proposes collaborative symmetricity exploitation (CSE), a novel symmetric learning scheme for contextual policies on offline black-box placement problems. Leveraging symmetricity increases data efficiency by reducing the solution space, and improves generalization capability by capturing the invariant structure that persists across changing contexts. To this end, we design a learning scheme that reduces the order bias (e.g., a neural network recognizing {1, 2, 3} and {2, 1, 3} as different placement designs) inherited from the sequential decision-making scheme of a neural policy, by imposing the action-permutation (AP) symmetricity of placement problems (i.e., permuted action sequences represent the same placement as the original sequence). We first define the order bias and prove that AP-symmetricity is imposed when the order bias of the neural policy becomes zero. We then design two collaborative losses for learning a neural policy with reduced order bias: an expert-exploitation loss and a self-exploitation loss. The expert-exploitation loss clones the behavior of expert solutions while accounting for order bias. The self-exploitation loss is a special form of the order bias that measures AP-symmetricity on self-generated solutions. We apply CSE to the decoupling capacitor placement problem (DPP) benchmark, a significant offline black-box placement problem in the hardware domain that requires a contextual policy. Experiments show that CSE outperforms state-of-the-art solvers on the DPP benchmark.

1. INTRODUCTION

As CMOS technology shrinks and data rates increase, the design complexity of very large-scale integration (VLSI) circuits has grown. Human experts are no longer able to design hardware without the help of electronic design automation (EDA) tools, and EDA tools in turn suffer from long simulation times and insufficient computing power, making machine learning (ML) applications to hardware design inevitable. Many studies have already shown that deep reinforcement learning (DRL), a representative ML method for sequential decision making, is promising for various tasks in modern chip design: chip placement (Mirhoseini et al., 2021; Agnesina et al., 2020), routing (Liao et al., 2019; 2020), circuit design (Zhao & Zhang, 2020), logic synthesis (Hosny et al., 2020; Haaswijk et al., 2018), and bi-level hardware optimization (Cheng & Yan, 2021). However, most previous DRL-based hardware design methods do not take the following into consideration. (a) Online simulators for hardware are usually time-intensive and inaccurate; thus, learning from existing offline data produced by experts is more reliable. Since only a limited amount of offline hardware data exists, a data-efficient learning scheme is necessary. (b) Hardware design is composed of electrically coupled multi-level tasks whose task conditions are determined by the design of higher-level tasks; thus, a solver (i.e., a contextual policy conditioned on higher-level tasks) with high generalization capability to adapt to varying task conditions is necessary.

In this paper, we leverage the solution symmetricity of placement problems for data efficiency and generalization capability. Conventional sequential decision-making schemes for placement problems (Park et al., 2020; Mirhoseini et al., 2021; Cheng & Yan, 2021) auto-regressively generate solutions without considering solution symmetricity and thus exhibit order bias: the neural network identifies action-permutation (AP) symmetric solutions (i.e., identical placement designs), for instance {1, 2, 3} and {2, 1, 3}, as different solutions. Our proposed method overcomes the order-bias limitation of previous sequential decision-making schemes with a novel regularization
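The order bias described above can be made concrete with a small sketch (not the paper's implementation; the per-step log-probability values are made up for illustration): two action sequences that select the same set of ports are AP-symmetric and describe one placement design, yet an autoregressive policy scores each ordering as a distinct trajectory.

```python
def as_placement(action_sequence):
    """Order-free view: a placement design is just the set of chosen ports."""
    return frozenset(action_sequence)

def trajectory_log_prob(action_sequence, log_probs):
    """Order-sensitive view: an autoregressive policy sums per-step
    log-probabilities, so each ordering is a separate trajectory.
    `log_probs[t][a]` is a hypothetical log-prob of action `a` at step `t`."""
    return sum(log_probs[t][a] for t, a in enumerate(action_sequence))

seq_a = (1, 2, 3)
seq_b = (2, 1, 3)

# Same placement design (AP-symmetric)...
assert as_placement(seq_a) == as_placement(seq_b)

# ...but generally different trajectory scores under an autoregressive
# policy (order bias), unless the policy is trained to be AP-symmetric.
log_probs = {0: {1: -0.1, 2: -2.0}, 1: {1: -1.5, 2: -0.3}, 2: {3: -0.2}}
print(trajectory_log_prob(seq_a, log_probs))  # higher score for one ordering
print(trajectory_log_prob(seq_b, log_probs))  # lower score for the other
```

A policy with zero order bias would assign equal probability mass to every ordering of the same placement, which is the condition the proposed losses push toward.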

