STRUCTURAL CODE REPRESENTATION LEARNING FOR AUTO-VECTORIZATION

Abstract

The single instruction multiple data (SIMD) capability in modern processors is critical to improving the performance of compute-intensive programs. SIMD allows architectures to exploit the natural data parallelism that exists in a wide range of real applications (e.g., games, signal processing, etc.) by executing a single instruction on multiple data items simultaneously. Modern compilers use vectorization techniques to exploit this SIMD capability by detecting data parallelism in scalar source code and transforming groups of scalar instructions into vector-based instructions. In this work, we focus on one of the most common vectorization techniques, loop-based vectorization, which targets loops and optimizes their performance by grouping multiple occurrences of the same operation across loop iterations into single SIMD instructions. This is achieved by setting two key parameters: (1) the vectorization factor (VF) and (2) the interleaving factor (IF). Unfortunately, vectorizing loop computations effectively remains a challenging problem for both programmers and compilers due to the large search space. Manual vectorization of each loop places a heavy burden on the programmer, is error-prone, and requires expert knowledge of both the software and the architecture. Alternatively, current compilers use fixed cost models based on expert heuristics to make automatic vectorization decisions. However, these models often ignore the data dependencies, as well as the underlying computation graph. In this paper, we propose a data-driven graph-based learning framework for automatic vectorization, called autograph, which takes an input program, extracts the loops, and then learns a structured representation to automatically predict the correct VF/IF factors.
Our proposed framework utilizes deep reinforcement learning to learn an optimal policy (mapping observations to actions) from an intelligent agent in a SIMD environment, and automatically injects the predicted vectorization pragmas into the input program. We conducted an extensive evaluation on multiple benchmark datasets and comparisons with state-of-the-art baselines. Our results show that autograph achieves an average 2.47x performance improvement on Polybench compared to neurovectorizer and 3.61x compared to the baseline.

1. INTRODUCTION

Single instruction multiple data (SIMD) mechanisms have been widely incorporated into modern processors, from gaming machines and massively parallel supercomputers to general-purpose processors (Nuzman et al., 2006; Bachega et al., 2004; Peleg & Weiser, 1996). These mechanisms allow architectures to exploit the natural parallelism that exists in real-world applications (e.g., games, signal processing, etc.) by simultaneously executing the same instruction on multiple elements of the input data. Modern compilers use vectorization techniques to exploit the SIMD capability of these architectures. Vectorization techniques allow the compiler to reveal the data parallelism in the scalar source code and convert the code from a scalar implementation to the corresponding functionally-correct vector implementation. This allows portions of the code to run on the processor's high-throughput SIMD units, without any additional effort from the programmer (Porpodas et al., 2018). With the SIMD architecture, such operations can run in fewer cycles while using less energy to boost performance in applications with vector computations. Vectorization can be classified into two major methods: (i) the loop vectorizer, which operates on loops, and (ii) the superword-level parallelism (SLP) vectorizer (Porpodas, 2017; Mendis et al.,

