WINNING THE L2RPN CHALLENGE: POWER GRID MANAGEMENT VIA SEMI-MARKOV AFTERSTATE ACTOR-CRITIC

Abstract

Safe and reliable electricity transmission in power grids is crucial for modern society. It is thus quite natural that there has been growing interest in the automatic management of power grids, exemplified by the Learning to Run a Power Network (L2RPN) Challenge, which models the problem as a reinforcement learning (RL) task. However, managing a real-world-scale power grid is highly challenging, mostly due to the massive scale of its state and action spaces. In this paper, we present an off-policy actor-critic approach that effectively tackles the unique challenges in power grid management by RL, adopting a hierarchical policy together with the afterstate representation. Our agent ranked first in the latest challenge (L2RPN WCCI 2020), avoiding disastrous situations while maintaining the highest level of operational efficiency in every test scenario. This paper provides a formal description of the algorithmic aspects of our approach, as well as further experimental studies on diverse power grids.

1. INTRODUCTION

The power grid, an interconnected network for delivering electricity from producers to consumers, has become an essential component of modern society. For safe and reliable transmission of electricity, it is constantly monitored and managed by human experts in the control room. Therefore, there has been growing interest in automatically controlling and managing the power grid. As we transition to sustainable power sources such as solar, wind, and hydro (Rolnick et al., 2019), power grid management is becoming a complex task beyond human expertise, calling for data-driven optimization. Yet, automatic control of a large-scale power grid is challenging, since it requires complex yet reliable decision-making. While most approaches have focused on controlling the generation or the load of electricity (Venkat et al., 2008; Zhao et al., 2014; Huang et al., 2020), managing the power grid through topology control (changing the connection of power lines and bus assignments in substations) would be the ultimate goal. Reconfiguring the topology of the power grid reroutes the flow of electricity, enabling efficient transmission from producers to consumers and thus preventing surplus production. There are preliminary studies of grid topology control in the power systems literature (Fisher et al., 2008; Khodaei & Shahidehpour, 2010), but due to its large, combinatorial, and non-linear nature, these methods do not provide a practical solution for real-world deployment. On the other hand, deep reinforcement learning (RL) has shown significant progress in complex sequential decision-making tasks, such as Go (Silver et al., 2016) and arcade video games (Mnih et al., 2015), purely from data. RL is also perceived as a promising candidate to address the challenges of power grid management (Ernst et al., 2004; Dimeas & Hatziargyriou, 2010; Duan et al., 2020; Zhang et al., 2020; Hua et al., 2019).
Figure 1: An example of a power grid with 4 substations, 2 generators, 2 loads, and 5 lines. Starting from the left, a bus assignment action a_t reconfigures the grid, and then the next state s_{t+1} is determined by an exogenous event e_{t+1}, such as a change of power demand in the loads. The diagonal line was experiencing overflow, but the action a_t is shown to revert the overflow. The power loss is also reduced from 15 to 13.

In this regard, we present Semi-Markov Afterstate Actor-Critic (SMAAC), an RL algorithm that effectively tackles the challenges in power grid management. One of the main challenges in RL for real-world-scale power grid management lies in its massive state and action space. We address this problem by adopting a goal-conditioned hierarchical policy with the afterstate representation. First, we represent state-action pairs as afterstates (Sutton & Barto, 2018), the state after the agent has made its decision but before the environment has responded, to efficiently cover the large state-action space. The afterstate representation can be much more succinct than the state-action pair representation when multiple state-action pairs lead to an identical afterstate. For example, in the case of controlling the topology of the power grid, a pair of a current topology and a topology-modification action can be represented as the reconfigured topology, since the topology is deterministically reconfigured by the action. The next state is then determined by random external factors, such as changes of power demand in the loads. Second, we extend this idea to a hierarchical framework, where the high-level policy produces a desirable topology for the current situation, and the low-level policy figures out an appropriate sequence of primitive topology changes to reach it. Combined, our hierarchical policy architecture with afterstates facilitates effective exploration for good topologies during training.
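The afterstate idea can be illustrated with a minimal sketch. All names and data structures below are hypothetical simplifications (the actual grid state is far richer): the topology is a map from substations to bus assignments, an action deterministically rewrites one entry, and the stochastic part of the transition (the exogenous event e_{t+1}) is modeled as a random load-demand change.

```python
import random

def afterstate(topology, action):
    """Deterministic part of the transition: apply a bus reassignment.

    `topology` maps substation names to bus assignments; `action` is a
    (substation, new_bus) pair. The afterstate is fully determined by
    the state and the action, with no randomness involved.
    """
    new_topology = dict(topology)
    substation, bus = action
    new_topology[substation] = bus
    return new_topology

def next_state(topology, action, rng=None):
    """Full transition: afterstate plus a random exogenous event."""
    rng = rng or random.Random(0)
    after = afterstate(topology, action)
    load_demand = rng.uniform(0.8, 1.2)  # exogenous event e_{t+1}
    return after, load_demand

# Two different (state, action) pairs can map to the same afterstate,
# which is why the afterstate representation can be more compact than
# enumerating state-action pairs.
s1 = {"sub_0": 1, "sub_1": 1}
s2 = {"sub_0": 2, "sub_1": 1}
assert afterstate(s1, ("sub_0", 2)) == afterstate(s2, ("sub_0", 2))
```

The key property exploited here is that the environment's randomness enters only after the deterministic reconfiguration, so a value function can be learned over afterstates (reconfigured topologies) rather than over the much larger set of state-action pairs.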
Our algorithm ranked first in the latest international competition on training RL agents to manage power grids, Learning to Run a Power Network (L2RPN) WCCI 2020. In this paper, we further evaluate our approach using Grid2Op, the open-source power grid simulation platform used in the competition, by training and testing the agent on power grids of 3 different sizes. We show that the agent significantly outperforms all of the baselines on all grids except the small grid, where the task was easy for all algorithms.

2.1. GRID2OP ENVIRONMENT

We briefly overview Grid2Op, the open-source simulation platform for power grid operation used in the L2RPN WCCI 2020 challenge. Grid2Op models realistic concepts found in real-world operations and is used to test advanced control algorithms under real-world power system operational constraints and distributions (Kelly et al., 2020). The power grid is essentially a graph composed of nodes corresponding to substations that are connected to loads, generators, and power lines. Generators produce electricity, loads consume electricity, and power lines transmit electricity between substations. A substation can be regarded as a router in the network, determining where to transmit electricity. Grid2Op considers 2 conductors per substation, known as the double busbar system. This means that the elements connected to a substation, i.e. loads, generators, and power lines, can each be assigned to one of the two busbars, and power travels only between elements on the same busbar. Thus, each substation can be regarded as being split into two nodes.
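The double busbar system can be sketched as follows. This is an illustrative model, not the actual Grid2Op API: each element connected to a substation (generator, load, or line end) is assigned to bus 1 or bus 2, and grouping elements by bus shows how one substation effectively splits into up to two electrical nodes.

```python
from collections import defaultdict

def split_substation(assignment):
    """Group a substation's connected elements by busbar.

    `assignment` maps element names (loads, generators, line ends) to
    a busbar id, which must be 1 or 2 in a double busbar system.
    Power is exchanged only among elements grouped on the same bus.
    """
    buses = defaultdict(list)
    for element, bus in assignment.items():
        if bus not in (1, 2):
            raise ValueError("double busbar system: bus must be 1 or 2")
        buses[bus].append(element)
    return dict(buses)

# A substation with one generator, one load, and two line ends.
# gen_0, load_0, and line_3 share bus 1 and exchange power; line_7
# sits alone on bus 2, so the substation acts as two separate nodes.
assignment = {"gen_0": 1, "load_0": 1, "line_3": 1, "line_7": 2}
nodes = split_substation(assignment)
```

Changing a single entry of `assignment` and regrouping corresponds to the kind of primitive topology change (bus reassignment) that the agent uses to reconfigure the grid.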

