GN-TRANSFORMER: FUSING AST AND SOURCE CODE INFORMATION IN GRAPH NETWORKS

Abstract

Unlike natural language, source code understanding is influenced by grammatical relations between tokens regardless of their identifier names. Graph representations of source code, such as the Abstract Syntax Tree (AST) and Control Flow Graph (CFG), can capture grammatical relationships between tokens that are not obvious from the source code text. Most existing methods perform late fusion and underperform when supplementing the source code text with a graph representation. We propose a novel method called GN-Transformer to fuse representations learned from the graph and text modalities under the Graph Networks (GN) framework with an attention mechanism. Our method learns embeddings on a constructed graph called the Syntax-Code Graph (SCG). We perform experiments on the structure of the SCG, along with an ablation study on the model design and hyperparameters, to conclude that the performance advantage comes from the fusion method and not from the specific details of the model. The proposed method achieves state-of-the-art performance on two code summarization datasets and across three metrics.

1. INTRODUCTION

Code summarization is the task of generating a readable summary that describes the functionality of a code snippet. The task requires a high-level comprehension of the snippet, making it an effective benchmark for evaluating whether a deep learning model can capture the complex relations and structures inside code. Because programming languages are context-free formal languages, an unambiguous representation, the Abstract Syntax Tree (AST), can be derived from any source code snippet. A parse-tree-based representation of code is precise and noise-free: an AST accurately describes the structure of a snippet and the relationships between tokens, which provides valuable supplementary information for code understanding. Graph representations of source code have been the focus of multiple code summarization methods that extract AST features; however, the cross-modal interaction (Veličković, 2019) is very limited, since the AST and code features are independently extracted by separate models and then simply concatenated or summed. In this paper we propose a novel architecture, GN-Transformer, shown in Figure 2, to fuse graph information with an equivalent sequence representation. In summary:

• We extend Graph Networks (GN) (Battaglia et al., 2018) to a novel GN-Transformer architecture consisting of a sequence of GN encoder blocks followed by a vanilla Transformer decoder.
• We propose a novel method for early fusion of the AST representation and that of a code snippet sequence, called the Syntax-Code Graph (SCG).
• We evaluate our approach on the task of code summarization and outperform the previous state of the art on two datasets and across three metrics.

We evaluated our model on the Java and Python datasets used by Ahmad et al. (2020) and compared our results to theirs. Two qualitative results are presented in Figure 1.
We make available our code, trained models and pre-processed datasets in our supplementary package, and we will open-source it after the review process concludes.
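To make the notion of grammatical structure concrete, the following sketch uses Python's standard `ast` module (the variable names are ours, not from our implementation) to parse a small snippet and list the parent-child relations in its AST; these relations encode structural information, such as which expression forms the return value, that is not positional in the raw token sequence:

```python
import ast

# Parse a small snippet into its AST. The tree makes grammatical
# relations explicit that the flat token stream does not.
snippet = "def add(a, b):\n    return a + b"
tree = ast.parse(snippet)

# Collect (parent type, child type) edges: the kind of structural
# information an AST contributes beyond the token sequence.
edges = [(type(parent).__name__, type(child).__name__)
         for parent in ast.walk(tree)
         for child in ast.iter_child_nodes(parent)]
print(edges)
```

The edge list contains pairs such as `('Return', 'BinOp')`, showing that the addition expression is grammatically the function's return value regardless of the identifier names involved.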

2. FUSING GRAPH AND SEQUENCE INFORMATION

Previous methods consider sequences and graphs as two modalities that are processed independently. For sequences, recurrent architectures such as RNNs, LSTMs, and GRUs are commonly used; CNNs have also been applied to sliding windows over sequences (Kim, 2014), and Transformers (Vaswani et al., 2017) have become a popular choice in recent years. For graph data, spectrum-based methods like GCNs (Bruna et al., 2014) capture graph structure through the graph spectrum, while non-spectrum methods like GraphSAGE (Hamilton et al., 2017) aggregate information from neighboring nodes using different aggregators, and GAT (Veličković et al., 2018) introduced an attention mechanism to aggregate neighboring information. Early fusion of multiple modalities is a challenging task; as a result, late fusion methods are typically used when considering multi-modal information in code summarization. However, cross-modal interactions are less efficient in late fusion than in early fusion (Veličković, 2019). In Section 2.1 we discuss early fusion approaches for a code sequence and an AST. In Section 2.2 we discuss representing a sequence as a graph.
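As a rough illustration of the two aggregation styles mentioned above, the sketch below shows a mean aggregator in the spirit of GraphSAGE and an attention-weighted aggregator in the spirit of GAT. The function names and the dot-product scoring are our simplifications for exposition, not the exact formulations from those papers:

```python
import math

def mean_aggregate(node_feats, adjacency, node):
    """Mean aggregator (GraphSAGE-style sketch): average the
    feature vectors of a node's neighbors."""
    neighbors = adjacency[node]
    dim = len(node_feats[node])
    agg = [0.0] * dim
    for n in neighbors:
        for i in range(dim):
            agg[i] += node_feats[n][i]
    return [x / len(neighbors) for x in agg]

def attention_aggregate(node_feats, adjacency, node):
    """Attention aggregator (GAT-style sketch): weight each neighbor
    by a softmax over dot-product scores with the centre node."""
    neighbors = adjacency[node]
    h = node_feats[node]
    scores = [sum(a * b for a, b in zip(h, node_feats[n]))
              for n in neighbors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [0.0] * len(h)
    for w, n in zip(weights, neighbors):
        for i in range(len(h)):
            out[i] += w * node_feats[n][i]
    return out
```

The mean aggregator treats all neighbors uniformly, whereas the attention aggregator lets the centre node weight neighbors by relevance, which is the property GAT brought to neighborhood aggregation.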

2.1. EARLY FUSION OF SEQUENCE AND GRAPH

For early fusion of a sequence and a graph, it is common to represent them under a single unified representation and feed it to a deep learning model. Random walks (Perozzi et al., 2014) and
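One way such a unified representation can be obtained is by linearising a graph into node sequences that sequence models can consume. The following is a minimal DeepWalk-style sketch; the function and its parameters are illustrative and omit the original algorithm's full pipeline (multiple walks per node, subsequent skip-gram training):

```python
import random

def random_walk(adjacency, start, length, seed=0):
    """Linearise a graph into a node sequence by repeatedly stepping
    to a uniformly sampled neighbor (DeepWalk-style sketch)."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk
```

Each walk is an ordinary token sequence, so graph structure can be consumed by the same models used for text.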



Figure 1: Examples of generated summaries on the Java (red blocks) and Python (blue blocks) test sets. Transformer is a vanilla Transformer; Transformer (full) implements the relative positional encoding and copy mechanism from Ahmad et al. (2020).

Figure 2: Overall structure of our model. The encoder consists of multiple GN-Transformer blocks. We denote '+' as a residual connection followed by a normalization layer. In 'Node embeddings of graph batch', each black bar represents the node embeddings of one graph in the input batch. Blue dots represent token nodes; grey dots denote padding. Node embeddings in the grey box are fetched as input to the decoder, and AST node embeddings (red dots) are discarded.
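The fetch-and-discard step described in the caption amounts to a masked selection over the encoder's node embeddings. The sketch below illustrates the idea; `is_token_node` is a hypothetical per-node boolean mask, not an identifier from our implementation:

```python
def fetch_token_embeddings(node_embeddings, is_token_node):
    """Keep only token-node embeddings for the decoder input;
    AST-node embeddings (mask False) are discarded.
    `is_token_node` is a hypothetical boolean mask per node."""
    return [emb for emb, keep in zip(node_embeddings, is_token_node)
            if keep]
```

In a tensor implementation this would be a gather or boolean-mask operation applied per graph in the batch before padding to a common length.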

