HLOENV: A GRAPH REWRITE ENVIRONMENT FOR DEEP LEARNING COMPILER OPTIMIZATION RESEARCH

Abstract

We introduce HloEnv, an environment based on Accelerated Linear Algebra (XLA) for deep learning (DL) compiler optimization research. HloEnv transforms all graph rewrites into a common representation, providing a flexible interface to control and modify existing graph optimization passes. In this representation, an XLA pass is converted into a set of sequential rewrite decisions, which control whether and when the rewrites are applied. Along with HloEnv, we present a dataset with broad coverage of computation graphs drawn from modern real-world machine learning models. We select the two XLA passes with the largest impact on the runtime of the compiled program, and explore the potential for further improvement over XLA in this decision space. We show that simple decision-making heuristics can achieve performance on par with or better than XLA, and that search algorithms boost performance further. We intend for HloEnv and our dataset to be an open-source, community-driven effort that helps spur advances in DL compiler optimization research.

1. INTRODUCTION

Deep Learning (DL) models have been getting significantly larger and more computationally expensive (Thompson et al., 2020). As a result, computational efficiency is increasingly important for the economic and technical viability of a DL project, as well as for its environmental sustainability. DL compiler optimization is important for achieving this efficiency. A DL compiler parses user-defined DL model code (usually written in Python) into a high-level directed acyclic graph (DAG), which can then be optimized to run efficiently on DL hardware through a sequence of sub-graph rewrite passes. Current production-ready DL compilers are still heavily hand-engineered, and require deep domain knowledge to produce well-optimized results.

Great efforts have been made to reduce this reliance on human engineers. TASO (Jia et al., 2019c) is the most representative work on search-based DL compiler optimization. It automatically generates graph rewrites and searches a larger space for better optimization solutions. However, it considers only 12 DL operators, so it does not generalize well to newly emerging DL models. Recent works on learning-based DL compiler optimization, such as REGAL (Paliwal et al., 2020) and GO (Zhou et al., 2020a), model only a limited set of passes, each with a different representation. To the best of our knowledge, no existing work generalizes to all optimization passes with a common representation.

In short, research on DL compiler optimization currently faces the following challenges. First, because existing optimizations lack a unified implementation, there is no systematic interface with wide coverage of optimization types. Second, most existing works focus on specific sets of passes. Third, current DL compiler optimization benchmarks use either closed-source datasets or small datasets with a limited set of DL models.
The community has not yet concentrated its efforts on building a publicly accessible dataset of real-world DL computation graphs.

We propose the following to address these challenges. First, we develop HloEnv, an environment that lets an optimization agent interoperate with XLA (Leary & Wang, 2017), a production-quality cross-framework DL compiler. This environment provides a common representation for any type of graph rewrite. Second, we present a dataset with broad coverage of High-Level Operations (HLO) graphs drawn from real-world JAX-implemented machine learning code, extracted from a variety of open-source repositories on GitHub (Table A.2) and spanning a variety of domains. This provides a more representative set of workloads for DL compiler optimization research. Third, based on a thorough analysis of XLA optimization passes, we identify the two XLA passes with the most significant impact on the runtime of the compiled program. We explore using simple heuristics and search-based algorithms to further optimize these passes. The design of HloEnv points to a potential future where DL compiler engineers only need to develop and maintain a simple set of rewrite rules, leaving the complicated heuristics to machine learning-generated optimization strategies that generalize to both new DL models and new DL hardware.

2. SYSTEM DESIGN OF HLOENV

2.1. XLA PRELIMINARIES

XLA compiles computation graphs in High-Level Operations (HLO) IR format into machine instructions for different backends. As part of this compilation process, XLA runs a series of passes to modify the HLO graph. The passes perform rewrites (using pattern matching and replacement) on the HLO graph to optimize the performance or ensure the correctness of the graph. These passes can be composed in a pipeline and recursively grouped in a parent pipeline. These passes/pipelines are run sequentially in a fixed order and can be run either once or repeatedly in a loop until the pass no longer changes the HLO graph. HloEnv aims to provide a flexible interface that allows for easy control of the XLA optimization passes and pipelines. Each pass and pipeline in HloEnv can be individually set to dry-mode to allow us to intercept and control the rewrites they perform.
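The pass/pipeline structure described above can be illustrated with a minimal Python sketch. It is a toy model, not XLA's or HloEnv's implementation: the "graph" is assumed to be a simple chain of op names, and the passes `remove_double_negate` and `fuse_mul_add`, as well as the helper `make_pipeline`, are hypothetical names introduced only for illustration. The sketch shows passes run in a fixed order, a pipeline that is itself a pass (so pipelines nest), and a pipeline that repeats until the graph stops changing.

```python
from typing import Callable, List, Tuple

Graph = Tuple[str, ...]  # toy stand-in for an HLO graph: a chain of op names
Pass = Callable[[Graph], Tuple[Graph, bool]]  # returns (new graph, changed?)


def remove_double_negate(graph: Graph) -> Tuple[Graph, bool]:
    """Hypothetical pass: delete the first `negate, negate` pair (one match per run)."""
    for i in range(len(graph) - 1):
        if graph[i] == "negate" and graph[i + 1] == "negate":
            return graph[:i] + graph[i + 2:], True
    return graph, False


def fuse_mul_add(graph: Graph) -> Tuple[Graph, bool]:
    """Hypothetical pass: replace the first `multiply, add` pair with one `fma` op."""
    for i in range(len(graph) - 1):
        if graph[i] == "multiply" and graph[i + 1] == "add":
            return graph[:i] + ("fma",) + graph[i + 2:], True
    return graph, False


def make_pipeline(passes: List[Pass], until_fixed_point: bool = False) -> Pass:
    """Compose passes in a fixed order. A pipeline is itself a pass, so it nests."""
    def run(graph: Graph) -> Tuple[Graph, bool]:
        changed_ever = False
        while True:
            changed_this_sweep = False
            for p in passes:  # fixed, sequential order
                graph, changed = p(graph)
                changed_this_sweep |= changed
            changed_ever |= changed_this_sweep
            # Run once, or keep sweeping until a sweep changes nothing.
            if not (until_fixed_point and changed_this_sweep):
                return graph, changed_ever
    return run


# Inner pipeline loops to a fixed point; the parent pipeline runs once.
simplify = make_pipeline([remove_double_negate], until_fixed_point=True)
optimize = make_pipeline([simplify, fuse_mul_add])

graph = ("negate", "negate", "negate", "negate", "multiply", "add")
optimized, changed = optimize(graph)
print(optimized, changed)
```

Returning a `changed` flag from every pass is what allows the loop-until-unchanged behavior to be expressed generically, independent of what any individual pass rewrites.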

2.2. OVERVIEW OF HLOENV

As shown in Fig. 1, HloEnv's Python interface parses an HLO text file into an HLO graph and loads it into the frontend optimization environment. A user-specified set of XLA passes/pipelines is then applied to the HLO graph. If dry-mode is turned off, HloEnv executes the original pass/pipeline directly; if dry-mode is turned on, it captures the rewrites without actually applying them to the source graph. An augmented graph that contains both the source graph and all the rewrite opportunities is then generated for the user. Using the augmented graph as input, the user can develop various decision-making agents to decide which rewrites to apply. This process can be repeated until the output HLO graph stays unchanged (converges) or until a decision is made to end the optimization for that pass/pipeline. The user can then use XLA's backend APIs to generate the final device code.

From the decision-making and control point of view, our system defines a Markov Decision Process (MDP) M = (S, A, P, R). S is the state space, which in our case is the set of augmented graphs. Given a state, the agent chooses an action from the action space A, which decides which rewrites to apply. P is the transition function of HloEnv, i.e., how the graph changes when the chosen rewrites are applied. R is the reward generated by the decision, which in our case is the improvement in runtime between the old and new graphs.



Figure 1: The HloEnv interaction loop.
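The interaction loop and its MDP reading can be sketched in a few lines of Python. This is a toy model under stated assumptions and does not use HloEnv's actual API: the class `ToyRewriteEnv`, the rule table `RULES`, the per-op `OP_COST` table, and the agent `accept_all` are all hypothetical, the graph is assumed to be a chain of op names, and a summed op cost stands in for measured runtime. The sketch shows a dry-mode observation (source graph plus unapplied rewrite candidates), an agent choosing which candidates to apply, and a reward equal to the cost improvement.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Graph = Tuple[str, ...]  # toy stand-in for an HLO graph: a chain of op names

# Hypothetical rewrite rules (two-op pattern -> replacement) and per-op costs.
RULES: Dict[Graph, Graph] = {
    ("multiply", "add"): ("fma",),
    ("exp", "log"): (),
}
OP_COST = {"multiply": 2.0, "add": 2.0, "fma": 3.0, "exp": 4.0, "log": 4.0}


@dataclass(frozen=True)
class Rewrite:
    index: int          # position of the matched pattern in the graph
    replacement: Graph  # ops that would replace the two matched ops


def cost(graph: Graph) -> float:
    """Toy proxy for measured runtime; unknown ops cost 1.0."""
    return sum(OP_COST.get(op, 1.0) for op in graph)


class ToyRewriteEnv:
    """Dry-mode loop: propose rewrites, let an agent choose, then apply."""

    def __init__(self, graph: Graph) -> None:
        self.graph = graph

    def observe(self) -> Tuple[Graph, List[Rewrite]]:
        """State: the source graph plus every rewrite opportunity, none applied."""
        candidates = [
            Rewrite(i, RULES[self.graph[i:i + 2]])
            for i in range(len(self.graph) - 1)
            if self.graph[i:i + 2] in RULES
        ]
        return self.graph, candidates

    def step(self, chosen: List[Rewrite]) -> float:
        """Transition: apply the chosen rewrites. Reward: cost(old) - cost(new).

        Assumes the chosen matches do not overlap, which holds for RULES above.
        """
        old_cost = cost(self.graph)
        ops = list(self.graph)
        # Apply right to left so the indices of earlier matches stay valid.
        for rw in sorted(chosen, key=lambda r: r.index, reverse=True):
            ops[rw.index:rw.index + 2] = rw.replacement
        self.graph = tuple(ops)
        return old_cost - cost(self.graph)


def accept_all(graph: Graph, candidates: List[Rewrite]) -> List[Rewrite]:
    """Simplest possible agent: take every proposed rewrite."""
    return candidates


env = ToyRewriteEnv(("exp", "log", "multiply", "add", "tanh"))
total_reward = 0.0
while True:
    graph, candidates = env.observe()
    action = accept_all(graph, candidates)
    if not action:  # converged, or the agent decided to stop
        break
    total_reward += env.step(action)

print(env.graph, total_reward)
```

Swapping `accept_all` for a heuristic or search-based policy changes only the action selection; the observe/step loop stays the same, which is the separation between rewrite rules and decision-making that the environment is meant to expose.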

