AUTOGT: AUTOMATED GRAPH TRANSFORMER ARCHITECTURE SEARCH

Abstract

Although Transformer architectures have been successfully applied to graph data with the advent of the Graph Transformer, the current design of Graph Transformers still relies heavily on human labor and expert knowledge to decide on proper neural architectures and suitable graph encoding strategies at each Transformer layer. In the literature, there have been some works on the automated design of Transformers, but they focus on non-graph data such as texts and images and do not consider graph encoding strategies, and thus fail to handle non-Euclidean graph data. In this paper, we study the problem of automated graph Transformer design for the first time. Solving this problem poses the following challenges: i) how to design a unified search space for graph Transformers, and ii) how to deal with the coupling between Transformer architectures and the graph encodings of each Transformer layer. To address these challenges, we propose Automated Graph Transformer (AutoGT), a neural architecture search framework that automatically discovers optimal graph Transformer architectures by jointly optimizing the Transformer architecture and the graph encoding strategies. Specifically, we first propose a unified graph Transformer formulation that can represent most state-of-the-art graph Transformer architectures. Based upon this unified formulation, we further design a graph Transformer search space that includes both candidate architectures and various graph encodings. To handle the coupling, we propose a novel encoding-aware performance estimation strategy that gradually trains and splits the supernets according to the correlations between graph encodings and architectures. The proposed strategy provides more consistent and fine-grained performance predictions when evaluating jointly optimized graph encodings and architectures. Extensive experiments and ablation studies show that the proposed AutoGT achieves significant improvements over state-of-the-art hand-crafted baselines on all datasets, demonstrating its effectiveness and wide applicability.
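To make the joint search space described above concrete, the following is a minimal, hypothetical sketch of how per-layer architecture choices and graph encoding strategies might be represented and sampled together. All names (ARCH_CHOICES, ENCODING_CHOICES, sample_candidate) and the specific candidate values are illustrative assumptions, not AutoGT's actual API or search space.

```python
# Illustrative sketch (not AutoGT's implementation): a joint search space
# over per-layer Transformer architecture choices and graph encodings.
import random

# Hypothetical candidate architecture hyperparameters for each layer.
ARCH_CHOICES = {
    "num_heads": [4, 8, 16],
    "ffn_hidden_dim": [256, 512, 1024],
}

# Hypothetical candidate graph encoding strategies applied at each layer.
ENCODING_CHOICES = {
    "node_encoding": ["degree", "eigenvector", "none"],
    "attention_bias": ["shortest_path", "edge_feature", "none"],
}

def sample_candidate(num_layers: int) -> list[dict]:
    """Sample one jointly specified (architecture, encoding) candidate."""
    candidate = []
    for _ in range(num_layers):
        layer = {k: random.choice(v) for k, v in ARCH_CHOICES.items()}
        layer.update({k: random.choice(v) for k, v in ENCODING_CHOICES.items()})
        candidate.append(layer)
    return candidate

if __name__ == "__main__":
    random.seed(0)
    # Each sampled candidate couples architectural choices with graph
    # encodings, which is why the two must be evaluated jointly rather
    # than searched independently.
    print(sample_candidate(num_layers=3))
```

Because every candidate pairs architectural choices with encoding choices, a performance estimator that ignores the encodings would rank architectures inconsistently; this coupling motivates the encoding-aware supernet splitting strategy summarized in the abstract.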

1. INTRODUCTION

Recently, designing Transformers for graph data has attracted intensive research interest (Dwivedi & Bresson, 2020; Ying et al., 2021). As a powerful architecture for extracting meaningful information from relational data, graph Transformers have been successfully applied in natural language processing (Zhang & Zhang, 2020; Cai & Lam, 2020; Wang et al., 2023), social networks (Hu et al., 2020b), chemistry (Chen et al., 2019; Rong et al., 2020), recommendation (Xia et al., 2021), etc. However, developing a state-of-the-art graph Transformer for downstream tasks remains challenging because it heavily relies on tedious trial-and-error hand-crafted human design, including determining the best Transformer architecture and choosing the proper graph encoding strategies. In addition, such inefficient hand-crafted design inevitably introduces human bias, which leads to sub-optimal graph Transformers. In the literature, there have been works on automatically searching for Transformer architectures, which are designed specifically for data

