WINNING THE L2RPN CHALLENGE: POWER GRID MANAGEMENT VIA SEMI-MARKOV AFTERSTATE ACTOR-CRITIC

Abstract

Safe and reliable electricity transmission in power grids is crucial for modern society. There has thus been growing interest in the automatic management of power grids, exemplified by the Learning to Run a Power Network (L2RPN) Challenge, which models the problem as a reinforcement learning (RL) task. However, managing a real-world-scale power grid is highly challenging, mostly due to the massive scale of its state and action spaces. In this paper, we present an off-policy actor-critic approach that effectively tackles the unique challenges of power grid management with RL by combining a hierarchical policy with an afterstate representation. Our agent ranked first in the latest challenge (L2RPN WCCI 2020), avoiding disastrous situations while maintaining the highest level of operational efficiency in every test scenario. This paper provides a formal description of the algorithmic aspects of our approach, as well as further experimental studies on diverse power grids.

1. INTRODUCTION

The power grid, an interconnected network for delivering electricity from producers to consumers, is an essential component of modern society. For safe and reliable transmission of electricity, it is constantly monitored and managed by human experts in the control room, and there has been growing interest in automating this control and management. As we transition to sustainable power sources such as solar, wind, and hydro (Rolnick et al., 2019), power grid management is becoming a complex task beyond human expertise, calling for data-driven optimization. Yet automatic control of a large-scale power grid is challenging, since it requires complex yet reliable decision-making. While most approaches have focused on controlling the generation or the load of electricity (Venkat et al., 2008; Zhao et al., 2014; Huang et al., 2020), managing the grid through topology control (changing the connectivity of power lines and the bus assignments in substations) would be the ultimate goal. Reconfiguring the grid topology reroutes the flow of electricity, which enables efficient transmission from producers to consumers and thus prevents surplus production. There are preliminary studies of grid topology control in the power systems literature (Fisher et al., 2008; Khodaei & Shahidehpour, 2010), but due to the problem's large, combinatorial, and non-linear nature, these methods do not provide a practical solution that can be deployed in the real world. On the other hand, deep reinforcement learning (RL) has shown significant progress in complex sequential decision-making tasks, such as Go (Silver et al., 2016) and arcade video games (Mnih et al., 2015), learning purely from data. RL is thus perceived as a promising candidate for addressing the challenges of power grid management (Ernst et al., 2004; Dimeas & Hatziargyriou, 2010; Duan et al., 2020; Zhang et al., 2020; Hua et al., 2019).
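To make the combinatorial nature of topology control concrete, the following back-of-envelope sketch (not part of the authors' method; the grid sizes are hypothetical) counts bus-assignment configurations under the common L2RPN-style assumption that each element connected to a substation (a line end, generator, or load) can be placed on one of two busbars:

```python
def topologies_per_substation(num_elements: int) -> int:
    """Distinct two-busbar assignments at one substation.

    Each of the num_elements connected elements picks one of two
    busbars; swapping the roles of the two busbars yields the same
    physical topology, so the count is halved to 2^(k-1).
    """
    return 2 ** (num_elements - 1) if num_elements > 0 else 1


def total_topologies(elements_per_substation: list[int]) -> int:
    """Naive upper bound: every combination of per-substation splits."""
    total = 1
    for k in elements_per_substation:
        total *= topologies_per_substation(k)
    return total


# Hypothetical toy grid: 5 substations with 3-6 connected elements each.
grid = [3, 4, 4, 5, 6]
print(total_topologies(grid))  # 2^(2+3+3+4+5) = 2^17 = 131072
```

Even this toy grid admits over a hundred thousand joint topologies, and real grids have dozens of substations with more connected elements, which is why exhaustive optimization over topologies quickly becomes intractable.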
In this regard, we present Semi-Markov

