COLLABORATIVE SYMMETRICITY EXPLOITATION FOR OFFLINE LEARNING OF A HARDWARE DESIGN SOLVER

Anonymous

Abstract

This paper proposes collaborative symmetricity exploitation (CSE), a novel symmetric learning scheme for contextual policies on offline black-box placement problems. Leveraging symmetricity increases data efficiency by reducing the solution space and improves generalization capability by capturing the invariant nature that persists regardless of the changing context. To this end, we design a learning scheme that reduces the order bias (e.g., a neural network recognizing {1, 2, 3} and {2, 1, 3} as different placement designs) inherited from the sequential decision-making scheme of a neural policy by imposing the action-permutation (AP)-symmetricity (i.e., permuted sequences are symmetric placements of the original sequence) of placement problems. We first define the order bias and prove that AP-symmetricity is imposed when the order bias of the neural policy becomes zero. We then design two collaborative losses for learning a neural policy with reduced order bias: expert exploitation and self-exploitation. The expert exploitation loss clones the behavior of expert solutions while accounting for order bias. The self-exploitation loss is a special form of order bias that measures AP-symmetricity on self-generated solutions. CSE is applied to the decoupling capacitor placement problem (DPP) benchmark, a significant offline black-box placement design problem in the hardware domain that requires a contextual policy. Experiments show that CSE outperforms the state-of-the-art solver on the DPP benchmark.

1. INTRODUCTION

With CMOS technology shrinking and data rates increasing, the design complexity of very large-scale integration (VLSI) circuits has grown. Human experts are no longer able to design hardware without the help of electronic design automation (EDA) tools, and EDA tools themselves now suffer from long simulation times and insufficient computing power, making machine learning (ML) applications to hardware design inevitable. Many studies have already shown that deep reinforcement learning (DRL), a representative ML method for sequential decision making, is promising in various tasks of modern chip design: chip placement (Mirhoseini et al., 2021; Agnesina et al., 2020), routing (Liao et al., 2019; 2020), circuit design (Zhao & Zhang, 2020), logic synthesis (Hosny et al., 2020; Haaswijk et al., 2018), and bi-level hardware optimization (Cheng & Yan, 2021). However, most previous DRL-based hardware design methods do not take the following into consideration. (a) Online simulators for hardware are usually time-intensive and inaccurate; thus, learning from existing offline expert data is more reliable. Since offline hardware data is limited, a data-efficient learning scheme is necessary. (b) Hardware design is composed of electrically coupled multi-level tasks whose task conditions are determined by the design of higher-level tasks; thus, a solver (i.e., a contextualized policy conditioned on higher-level tasks) with high generalization capability to adapt to varying task conditions is necessary.

In this paper, we leverage the solution symmetricity of placement problems for data efficiency and generalization capability. Conventional sequential decision-making schemes for placement problems (Park et al., 2020; Mirhoseini et al., 2021; Cheng & Yan, 2021) auto-regressively generate solutions without considering solution symmetricity and thus exhibit order bias: the neural network identifies action-permutation (AP)-symmetric solutions (i.e., identical placement designs), for instance {1, 2, 3} and {2, 1, 3}, as different solutions. Our proposed method overcomes this limitation of previous sequential decision-making schemes with a novel regularization technique. Tackling the order bias (i.e., inducing AP-symmetricity) improves both the data efficiency of training and the generalization capability of the trained policy, for two reasons. First, data efficiency in training improves because learning the AP-symmetricity reduces the exploration space (see Fig. 1); the neural network automatically learns not only from the explored trajectories but also from their symmetric solution trajectories, without additional exploration or simulation. Second, generalization capability under task variation improves because AP-symmetricity is a task-agnostic property of placement problems.

To this end, we devise the collaborative symmetricity exploitation (CSE) framework, a simple but effective method to induce AP-symmetricity with two collaborative learning schemes: expert exploitation and self-exploitation. Expert exploitation augments the offline expert data (sequential data) with a random permutation operator and uses it for imitation learning. Self-exploitation generates pseudo-labeled solutions from the current training policy, transforms each pseudo-labeled solution with a random permutation operator, and forces the solver to assign identical probabilities to the original pseudo-labeled solution and the transformed solution. To verify the effectiveness of CSE, we apply it to the decoupling capacitor (decap) placement problem (DPP), one of the significant hardware design benchmarks. The objective of DPP is to place a given number of decaps on a power distribution network (PDN) under two varying conditions, keep-out regions and the probing port location, which are determined by higher-level problems such as chip placement and routing.
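The two schemes above can be sketched in a few lines of plain Python. This is an illustrative toy, not the released implementation; `ap_augment`, `order_bias`, and the toy `set_invariant_log_prob` policy are hypothetical names introduced here for exposition.

```python
import random

def ap_augment(expert_solution, rng):
    """Expert exploitation (sketch): since AP-symmetric action sequences
    encode the same placement design, a random permutation of an expert
    sequence is a free extra imitation-learning label."""
    permuted = list(expert_solution)
    rng.shuffle(permuted)
    return permuted

def order_bias(log_prob, solution, n_samples, rng):
    """Self-exploitation signal (sketch): the mean absolute gap between
    the policy log-probability of a self-generated solution and that of
    its random permutations. A zero gap over all permutations means the
    policy treats AP-symmetric sequences identically (no order bias)."""
    base = log_prob(solution)
    gaps = [abs(log_prob(ap_augment(solution, rng)) - base)
            for _ in range(n_samples)]
    return sum(gaps) / n_samples

# A toy policy whose log-probability depends only on the *set* of placed
# locations is fully AP-symmetric, so its measured order bias is zero.
def set_invariant_log_prob(seq):
    return -0.1 * sum(seq)

rng = random.Random(0)
print(order_bias(set_invariant_log_prob, [1, 2, 3], 8, rng))  # 0.0
```

An auto-regressive policy whose log-probability depends on the visit order would yield a strictly positive value here; driving that value to zero is the role of the self-exploitation loss.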
The goal of CSE is to train a solver (i.e., a contextualized policy) with high generalization capability to any given task condition.

Contribution 1: A novel symmetric learning scheme for contextualized policies. Several works (Cohen & Welling, 2016; Thomas et al., 2018; Fuchs et al., 2020; Satorras et al., 2021) learn various symmetricities of input data in the domain space for regression and classification tasks. However, symmetricity in the solution space is less studied, as learning the symmetricities in the solution space of a sequential policy (generative decisions) is challenging. Bengio et al. (2021) tackled the solution symmetricity of sequential policies by turning the Markov decision process (MDP) tree model into a directed acyclic graph (DAG)-based flow model. However, they target single-task optimization where the optimal solution set is unchanged. In contrast, our CSE is an effective solution-symmetric learning scheme for contextualized policies capable of adapting to newly given task conditions.

Contribution 2: DPP benchmark release. DPP is a widely studied task in the hardware domain, yet neither its simulation models nor the source code of existing methods has been publicly released. DPP can also be seen as a contextual offline black-box optimization benchmark with extended properties compared to design-bench (Trabucco et al., 2022), a representative non-contextual offline black-box optimization benchmark. By releasing the DPP benchmark with open-source simulation models and our reproduced baselines (DRL-based methods, meta-heuristic methods, behavior cloning-based methods, and our state-of-the-art CSE method), we expect significant industrial impact on both the hardware and ML communities.

2. DECAP PLACEMENT PROBLEM (DPP) FORMULATION

This paper seeks to solve the decoupling capacitor placement problem (DPP), one of the essential hardware design problems. A decoupling capacitor (decap) is a hardware component that reduces power noise along the power distribution network (PDN) of hardware devices and improves power integrity (PI). With transistor scaling and a continuously decreasing supply voltage margin (Hwang et al., 2021), power noise has become a major technical bottleneck in high-speed computing systems. Generally, the more decaps are placed, the more reliable the power supply becomes. However, adding more decaps requires more space and is costly. Thus, finding an optimal placement of decaps is essential for hardware performance as well as for cost and space savings.
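A toy instance makes the search-space structure concrete (the grid size, keep-out set, and decap count below are made-up illustrative values, not taken from the released benchmark): a sequential policy explores ordered decap sequences, while the design itself is a set, so imposing AP-symmetricity collapses every group of n! permuted sequences into one design.

```python
import math

# Toy DPP instance: place n_decap decaps on a W x H PDN grid, avoiding
# keep-out cells and the probing port (all values are illustrative).
W, H, n_decap = 4, 4, 3
keep_out = {(0, 0), (1, 1)}
probing_port = (3, 3)

# Feasible decap locations under the given task condition.
candidates = [(x, y) for x in range(W) for y in range(H)
              if (x, y) not in keep_out and (x, y) != probing_port]

ordered = math.perm(len(candidates), n_decap)    # ordered decap sequences
unordered = math.comb(len(candidates), n_decap)  # distinct placement designs

# Each design corresponds to n_decap! AP-symmetric sequences, so learning
# the symmetricity shrinks the effective solution space by that factor.
assert ordered == unordered * math.factorial(n_decap)
print(ordered, unordered)  # 1716 286
```

Even in this tiny instance the sequence space is 6x larger than the design space; for realistic decap counts the n! gap is what the data-efficiency argument of CSE rests on.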



Figure 1: Heterogeneous trajectories produced by a conventional sequential decision-making method from an AP-symmetric solution group.

