COLDEXPAND: SEMI-SUPERVISED GRAPH LEARNING IN COLD START

Abstract

Most real-world graphs are dynamic and eventually face the cold start problem. A fundamental question is how new cold nodes acquire the initial information they need to be incorporated into the existing graph. Here we posit the cold start problem as a fundamental issue in graph learning and propose a new learning setting, "Expanded Semi-supervised Learning." In expanded semi-supervised learning, we extend the original semi-supervised learning setting to new cold nodes that are disconnected from the graph. To this end, we propose the ColdExpand model, which classifies cold nodes based on link prediction with multiple goals. We show experimentally that, by adding an additional goal to an existing link prediction method, our method outperforms the baseline in both expanded semi-supervised link prediction (by at most 24%) and node classification (by at most 15%). To the best of our knowledge, this is the first study to address the expansion of semi-supervised learning to unseen nodes.

1. INTRODUCTION

Graph-based semi-supervised learning has attracted much attention thanks to its applicability to real-world problems. For example, a social network is graph-structured data in which people are the nodes and relationships between people (being friends, sharing posts, etc.) are the edges. With this structural information, we can infer unknown attributes of a person (node) from the information of the people they are connected to (i.e., semi-supervised node classification). In retail applications, customers and products can be viewed as heterogeneous nodes, and an edge between a customer and a product can represent a purchase. Such a graph captures the spending habits of each customer, and we can recommend a product to a user by inferring the likelihood of a connection between the user and the product (i.e., semi-supervised link prediction). Recent progress on Graph Neural Networks (GNN) (Bruna et al., 2013; Kipf & Welling, 2016a; Gilmer et al., 2017; Veličković et al., 2018; Jia et al., 2019) allows us to effectively exploit the expressive power of graph-structured data and to solve various graph-related tasks. Early GNN methods tackled the semi-supervised node classification task, in which all nodes in the graph must be labeled when only a small subset of nodes is labeled, and achieved satisfactory performance (Zhou et al., 2004). Link prediction is another graph-related task, one that has received comparatively less attention than other tasks in the GNN literature. In the link prediction task, the goal is to estimate the likelihood of a connection between two nodes given node feature data and topological structure data. Link prediction can be used in recommendation tasks (Chen et al., 2005; Sarwar et al., 2001; Lika et al., 2014; Li & Chen, 2013; Berg et al., 2017) or graph completion tasks (Kazemi & Poole, 2018; Zhang & Chen, 2018).
Most of the work on semi-supervised graph learning and link prediction assumes a static graph, that is, the structural information is at least "partially" observable in terms of nodes. In the real world, however, new users or items can be added (as nodes) without any topological information (Gope & Jain, 2017). This is referred to as the cold start problem: a new node is presented to an existing graph without a single edge. In contrast to the warm start case, in which at least some topological information is provided, the cold start problem is an extreme case in which there is no topological information to refer to. In this setting, previous semi-supervised learning algorithms cannot propagate information to the cold nodes. Even though the cold start problem is an extreme setting, it is inevitable and occurs often in real-world data.
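
To make the last point concrete, the following is a minimal sketch of why propagation-based methods leave cold nodes uninformed. It is not the ColdExpand model; it uses a generic closed-form label-spreading scheme on a toy graph, and the function name `spread_labels`, the damping value `alpha`, the toy adjacency matrix, and the choice to give zero-degree nodes a zero normalization weight are all illustrative assumptions.

```python
import numpy as np

def spread_labels(adj, labels, alpha=0.9):
    """Closed-form label spreading: F = (1 - alpha) * (I - alpha * S)^-1 * Y,
    where S is the symmetrically normalized adjacency matrix."""
    deg = adj.sum(axis=1)
    # Assumption: isolated nodes (degree 0) get weight 0 to avoid dividing by zero.
    inv_sqrt = np.zeros_like(deg)
    warm = deg > 0
    inv_sqrt[warm] = deg[warm] ** -0.5
    s = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    n = adj.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * s, (1 - alpha) * labels)

# Toy graph: nodes 0-3 form the path 0-1-2-3; node 4 is a cold node with no edges.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

# Only the two end nodes of the path are labeled (one-hot rows); the rest are unknown.
labels = np.zeros((5, 2))
labels[0, 0] = 1.0  # node 0 belongs to class 0
labels[3, 1] = 1.0  # node 3 belongs to class 1

scores = spread_labels(adj, labels)
print(scores.round(3))
```

In this sketch the warm unlabeled nodes 1 and 2 receive nonzero class scores from their labeled neighbors, whereas the row for the cold node 4 stays all zeros, since no edge carries information to it.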

