AUTOGT: AUTOMATED GRAPH TRANSFORMER ARCHITECTURE SEARCH

Abstract

Although Transformer architectures have been successfully applied to graph data with the advent of the Graph Transformer, current Graph Transformer designs still rely heavily on human labor and expert knowledge to decide on proper neural architectures and suitable graph encoding strategies at each Transformer layer. In the literature, some works automate the design of Transformers for non-graph data such as text and images, but they do not consider graph encoding strategies and therefore fail to handle non-Euclidean graph data. In this paper, we study the problem of automated graph Transformers for the first time. Solving this problem poses two challenges: i) how to design a unified search space for graph Transformers, and ii) how to handle the coupling between Transformer architectures and the graph encodings at each Transformer layer. To address these challenges, we propose Automated Graph Transformer (AutoGT), a neural architecture search framework that automatically discovers optimal graph Transformer architectures by jointly optimizing the Transformer architecture and the graph encoding strategies. Specifically, we first propose a unified graph Transformer formulation that can represent most state-of-the-art graph Transformer architectures. Based on this unified formulation, we further design a graph Transformer search space that includes both candidate architectures and various graph encodings. To handle the coupling, we propose a novel encoding-aware performance estimation strategy that gradually trains and splits the supernets according to the correlations between graph encodings and architectures. This strategy provides more consistent and fine-grained performance predictions when evaluating jointly optimized graph encodings and architectures.
Extensive experiments and ablation studies show that our proposed AutoGT achieves significant improvements over state-of-the-art hand-crafted baselines on all datasets, demonstrating its effectiveness and wide applicability.

1. INTRODUCTION

Recently, designing Transformers for graph data has attracted intensive research interest (Dwivedi & Bresson, 2020; Ying et al., 2021). As powerful architectures for extracting meaningful information from relational data, graph Transformers have been successfully applied in natural language processing (Zhang & Zhang, 2020; Cai & Lam, 2020; Wang et al., 2023), social networks (Hu et al., 2020b), chemistry (Chen et al., 2019; Rong et al., 2020), recommendation (Xia et al., 2021), etc. However, developing a state-of-the-art graph Transformer for downstream tasks remains challenging because it relies heavily on tedious, trial-and-error hand-crafted design, including determining the best Transformer architecture and choosing proper graph encoding strategies. In addition, inefficient hand-crafted design inevitably introduces human bias, which leads to sub-optimal graph Transformers. In the literature, there have been works on automatically searching for Transformer architectures, designed specifically for data in Natural Language Processing (Xu et al., 2021) and Computer Vision (Chen et al., 2021b). These works focus only on non-graph data and do not consider graph encoding strategies, which have been shown to be essential for capturing graph information (Min et al., 2022a); they therefore fail to handle graph data with non-Euclidean properties. In this paper, we study the problem of automated graph Transformers for the first time. Achieving this is nontrivial: previous work (Min et al., 2022a) has demonstrated that a good graph Transformer architecture is expected not only to select proper neural architectures for every layer but also to utilize appropriate encoding strategies capable of capturing various meaningful graph structure information to boost performance.
Therefore, automated graph Transformers face two critical challenges:

• How to design a unified search space appropriate for graph Transformers? A good graph Transformer must handle non-Euclidean graph data, which requires the search space to explicitly consider node relations and to incorporate both the architectures and the encoding strategies simultaneously.

• How to conduct an encoding-aware architecture search that tackles the coupling between Transformer architectures and graph encodings? A simple solution might resort to a one-shot formulation, which enables efficient search in the vanilla Transformer operation space whose operations can change functionality during supernet training. However, graph encoding strategies differ from vanilla Transformer operations in that they carry specific meanings related to structural information, so training an encoding-aware supernet specifically designed for graphs is challenging.

To address these challenges, we propose Automated Graph Transformer (AutoGT), a novel neural architecture search method for graph Transformers. In particular, we propose a unified graph Transformer formulation that covers most state-of-the-art graph Transformer architectures in our search space. Besides the general Transformer search space, with hidden dimension, feed-forward dimension, number of attention heads, attention head dimension, and number of layers, our unified search space introduces two new kinds of augmentation strategies to capture graph information: node attribute augmentation and attention map augmentation. To handle the coupling, we further propose a novel encoding-aware performance estimation strategy tailored for graphs. Because the encoding strategy and the architecture are strongly coupled when generating results, AutoGT splits the supernet based on the most important encoding strategy during evaluation.
As such, we propose to gradually train and split the supernets according to the most strongly coupled augmentation, the attention map augmentation, using different supernets to evaluate different architectures in our unified search space. This provides more consistent and fine-grained performance predictions when evaluating jointly optimized architectures and encodings. In summary, we make the following contributions:

• We propose Automated Graph Transformer (AutoGT), a novel neural architecture search framework for graph Transformers that can automatically discover the optimal graph Transformer architectures for various downstream tasks. To the best of our knowledge, AutoGT is the first automated graph Transformer framework.

• We design a unified search space containing both the Transformer architectures and the essential graph encoding strategies, covering most state-of-the-art graph Transformers, which enables globally optimal choices for exploiting structural information and retrieving node information.

• We propose an encoding-aware performance estimation strategy tailored for graphs that provides more accurate and consistent performance predictions without incurring heavier computation costs. The encoding strategy and the Transformer architecture are jointly optimized to discover the best graph Transformers.

• Extensive experiments show that our proposed AutoGT significantly outperforms state-of-the-art baselines on graph classification tasks over several datasets of different scales.
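To make the joint search concrete, the sketch below samples a candidate from a unified search space that pairs Transformer architecture choices with graph encoding choices, then groups candidates by their attention map augmentation, mirroring the idea of splitting the supernet along the most strongly coupled encoding dimension. The specific candidate values (dimensions, encoding names) are illustrative assumptions, not the paper's actual configuration.

```python
import random

# Hypothetical unified search space: architecture knobs plus the two kinds of
# graph augmentation. All candidate values here are assumptions for illustration.
SEARCH_SPACE = {
    # Transformer architecture choices
    "hidden_dim": [64, 128, 256],
    "ffn_dim": [128, 256, 512],
    "num_heads": [2, 4, 8],
    "head_dim": [8, 16, 32],
    "num_layers": [4, 6, 8],
    # Graph encoding strategies
    "node_encoding": ["none", "degree", "eigenvector"],   # node attribute augmentation
    "attention_encoding": ["none", "spatial", "edge"],    # attention map augmentation
}

def sample_candidate(rng=random):
    """Sample one jointly specified (architecture, encoding) candidate."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

def split_by_attention_encoding(candidates):
    """Group candidates by their attention map augmentation, so each group can
    be evaluated by its own (sub-)supernet, as in encoding-aware estimation."""
    groups = {}
    for cand in candidates:
        groups.setdefault(cand["attention_encoding"], []).append(cand)
    return groups

if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [sample_candidate(rng) for _ in range(20)]
    groups = split_by_attention_encoding(candidates)
    for encoding, members in groups.items():
        print(f"attention_encoding={encoding}: {len(members)} candidates")
```

In an actual one-shot NAS pipeline, each group would be evaluated with weights from the supernet (or sub-supernet) trained for that encoding, rather than sharing a single supernet across all encodings.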

2. RELATED WORK

The Graph Transformer. Graph Transformers are a category of neural networks that enable the Transformer to handle graph data (Min et al., 2022a). Several works (Dwivedi & Bresson, 2020; Ying et al., 



Our code is publicly available at https://github.com/SandMartex/AutoGT

