EMP: EFFECTIVE MULTIDIMENSIONAL PERSISTENCE FOR GRAPH REPRESENTATION LEARNING

Abstract

Topological data analysis (TDA) has become increasingly popular in a broad range of machine learning tasks, from anomaly detection and manifold learning to graph classification. Persistent homology (PH), the key approach in TDA, provides a unique topological fingerprint of the data by assessing the evolution of various hidden patterns in the data as we vary a scale parameter. Current PH tools are limited to analyzing the data by filtering with a single parameter, whereas in many applications several relevant parameters are equally important for obtaining finer information about the data. In this paper, we overcome this problem by introducing the Effective Multidimensional Persistence (EMP) framework, which enables investigating the data by varying multiple scale parameters simultaneously. The EMP framework provides a highly expressive summary of the data by integrating multiple descriptor functions into the process. EMP naturally adapts to many known single-parameter PH summaries and converts them into multidimensional summaries, for example, EMP Landscapes, EMP Silhouettes, EMP Images, and EMP Surfaces. These summaries deliver a multidimensional fingerprint of the data as matrices and arrays, which are suitable for various machine learning models. We apply the EMP framework to graph classification tasks and observe that EMP boosts the performance of various single PH descriptors and outperforms most state-of-the-art methods on benchmark datasets. We further derive theoretical guarantees for the proposed EMP summaries and prove their stability properties.

1. INTRODUCTION

In the past decade, topological data analysis (TDA) has proved to be a powerful machinery for discovering hidden patterns in various forms of data that are otherwise inaccessible to more traditional methods. In particular, for graph machine learning tasks where many traditional methods fail, TDA and, specifically, the tools of persistent homology (PH) have demonstrated a high potential to detect local and global patterns and to produce a unique topological fingerprint to be used in various machine learning tasks. This makes PH particularly attractive for capturing characteristics of complex data that may play a key role in learning task performance. In turn, multiparameter persistence, or multipersistence (MP), is a novel idea to further advance the PH machinery by analysing the data in a much finer way, simultaneously along multiple dimensions. However, because of technical problems in commutative algebra arising from its multidimensional structure, MP has not yet been defined in general settings (see Section 2.1). In this paper, we develop an alternative approach that utilizes the multipersistence idea efficiently for various types of data, with a main focus on graph representation. In particular, we bypass the technical issues with MP by obtaining very practical summaries through a structured use of the slicing idea. In turn, we obtain suitable multidimensional topological fingerprints of the data as matrices and arrays, in which ML models can easily detect the hidden patterns in the complex data. Our contributions can be summarized as follows:
• We develop a new computationally efficient and highly expressive EMP framework which provides multidimensional topological fingerprints of the data. EMP expands many popular summaries of single persistence to multiple dimensions by adapting an effective slicing

