HLOENV: A GRAPH REWRITE ENVIRONMENT FOR DEEP LEARNING COMPILER OPTIMIZATION RESEARCH

Abstract

We introduce HloEnv, an environment based on Accelerated Linear Algebra (XLA) for deep learning (DL) compiler optimization research. HloEnv transforms all graph rewrites into a common representation, providing a flexible interface to control and modify existing graph optimization passes. In this representation, an XLA pass is converted into a set of sequential rewrite decisions, which control when and if the rewrites are applied. Along with HloEnv, we present a dataset with broad coverage of computation graphs drawn from modern real-world machine learning models. We select two XLA passes with the largest impact on the runtime of the compiled program, and explore the potential for further improvement over XLA in this decision space. We show that using simple heuristics for decision-making can achieve on-par or better performance than XLA. Using search algorithms further boosts performance. We intend for HloEnv and our dataset to be an open-source, community-driven effort that helps spur advances in DL compiler optimization research.

1. INTRODUCTION

Deep Learning (DL) models have been getting significantly larger and more computationally expensive (Thompson et al., 2020). As a result, computational efficiency is now increasingly important for the economic and technical viability, as well as the environmental sustainability, of a DL project. DL compiler optimization is key to achieving this efficiency. A DL compiler parses user-defined DL model code (usually written in Python) into a high-level directed acyclic graph (DAG) that can then be optimized to run efficiently on DL hardware through a sequence of sub-graph rewrite passes. Current production-ready DL compilers are still heavily hand-engineered and require deep domain knowledge to produce well-optimized results. Great efforts have been made to alleviate the reliance on human engineers. TASO (Jia et al., 2019c) is the most representative work on search-based DL compiler optimization: it automatically generates graph rewrites and searches for better optimization solutions in a larger search space. However, the set of DL operators it considers contains only 12 operators, which limits its generalization to newly emerged DL models. Recent works on learning-based DL compiler optimization, such as REGAL (Paliwal et al., 2020) and GO (Zhou et al., 2020a), each model a limited set of passes with a different representation. To the best of our knowledge, no prior work generalizes to all optimization passes with a common representation. In short, research on DL compiler optimization still faces the following challenges. First, due to their non-unified implementations, there is no systematic interface with wide coverage of optimization types. Second, most existing works focus on specific sets of passes. Third, current DL compiler optimization benchmarks use either closed-source datasets or small datasets with a limited set of DL models.
The community has not yet centered its efforts on building a publicly accessible dataset of real-world DL computation graphs. We propose the following to address these challenges. First, we develop HloEnv, an environment that lets an optimization agent interoperate with XLA (Leary & Wang, 2017), a production-quality cross-framework DL compiler. This environment provides a common representation for any type of graph rewrite. Second, we present a dataset with broad coverage of High-Level Operations (HLO) graphs drawn from real-world JAX-implemented machine learning code, extracted from a variety of open-source repositories on GitHub (Table A.2), spanning a wide spectrum of domains. This
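The pass-as-decision-sequence framing described above can be illustrated with a minimal sketch. Note that the class and function names below (`Rewrite`, `run_pass`, the toy rewrites, and the list-of-ops graph representation) are illustrative assumptions for exposition, not the actual HloEnv API: each pass proposes a sequence of candidate rewrites, and a decision policy controls when and if each rewrite is applied.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rewrite:
    """One candidate sub-graph rewrite proposed by a pass (illustrative, not HloEnv's API)."""
    name: str
    applies: Callable[[List[str]], bool]   # precondition on the graph
    apply: Callable[[List[str]], List[str]]  # the rewrite itself

def run_pass(graph: List[str],
             rewrites: List[Rewrite],
             decide: Callable[[Rewrite, List[str]], bool]) -> List[str]:
    """Run a pass as a sequence of accept/reject decisions: each applicable
    rewrite is applied only if the decision policy accepts it."""
    for rw in rewrites:
        if rw.applies(graph) and decide(rw, graph):
            graph = rw.apply(graph)
    return graph

# Two toy rewrites on a graph represented as a flat list of op names.
fuse = Rewrite(
    "fuse-mul-add",
    lambda g: "mul" in g and "add" in g,
    lambda g: [op for op in g if op not in ("mul", "add")] + ["fma"],
)
drop_noop = Rewrite(
    "drop-noop",
    lambda g: "noop" in g,
    lambda g: [op for op in g if op != "noop"],
)

graph = ["mul", "add", "noop", "relu"]
# Accepting every applicable rewrite recovers the default always-apply behaviour.
print(run_pass(graph, [fuse, drop_noop], lambda rw, g: True))
# A decision policy may instead reject individual rewrites, opening a search space
# over which rewrites to apply.
print(run_pass(graph, [fuse, drop_noop], lambda rw, g: rw.name != "drop-noop"))
```

Exposing each rewrite as a separate decision is what turns a hand-engineered pass into a search or learning problem: a heuristic, search algorithm, or learned agent supplies the `decide` function instead of the compiler's hard-coded "always apply" rule.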

