STRUCTURAL PRIVACY IN GRAPHS

Abstract

Graph Neural Networks (GNNs) have gained popularity for addressing tasks over graph-structured data, which naturally represents many real-world systems. The privacy of the participants in these systems is at risk if GNNs are not carefully designed. Existing work on privacy-preserving GNNs primarily ensures the privacy of node features and labels. To guarantee complete privacy of graph data, its structure must be privatized as well. We propose SPGraph, a method that privatizes the graph structure by adding noise to each node's neighborhood data. Our method addresses two challenges in introducing structural privacy in graphs. First, applying randomization to the set of actual neighbors reduces node degrees, which is undesirable; to overcome this, we introduce a λ-selector that samples nodes to be added to the neighbor set. Second, the neighborhood must be denoised so that the added noise does not significantly degrade accuracy; to this end, we use the p-hop neighborhood to compensate for the loss of actual neighbors during randomization. We retain node feature and label privacy as implemented in previous methods for privacy in GNNs. We conduct extensive experiments on real-world datasets to show the impact of perturbation on the graph structure.

1. INTRODUCTION

Real-world systems such as social networks, citation networks, and molecular networks are commonly modeled as graphs. A graph richly represents such a system because it captures both the entities in the system and the relationships between them. Graph Neural Networks (GNNs) are widely used to address graph-based tasks such as node classification, graph classification, and link prediction. The primary goal of a GNN is to aggregate structural and feature information efficiently while addressing the tasks related to the systems represented as graphs.

Problem and Motivation. The problem we address in this work is ensuring users' data privacy in critical systems that can be represented as graphs. For graphs, the term data privacy covers privacy of both the graph structure and the features of the nodes. We study privacy in GNNs for node-level tasks. Most previous work on privacy in GNNs has ensured the privacy of the features and labels of each node, assuming that the server knows the connectivity information; hence, in prior work, data privacy means the privacy of node features and node labels only. In this work, we consider the privacy of structural information, typically given by the edges of the graph, along with the feature and label information of the nodes.

Challenges. To ensure complete privacy, that is, the privacy of nodes, edges, and labels in a graph, the following challenges need to be addressed:
1. Edge privacy, i.e., the privacy of the graph structure, must be ensured to prevent connectivity information from being compromised. Existing methods such as Sajadmanesh & Gatica-Perez (2021) perturb only the node features and labels; since the graph structure remains publicly available, it must be treated as private information as well.
2. To preserve structural privacy, there must be a mechanism to perturb the edges of the graph. Deciding how much noise to add to the edge data, and by what mechanism, is one challenge. We also need a mechanism to correct for the added noise so as to preserve prediction accuracy.
3. Determining the amounts of structure perturbation and of node and label perturbation so as to strike the right balance between privacy and utility in the graph data.
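To make the first two challenges concrete, the following is a minimal Python sketch of neighborhood perturbation in the spirit described above: each true neighbor is retained with a probability derived from a randomized-response mechanism, and a λ-selector samples non-neighbors back in to offset the resulting degree loss. The function name and the parameters `eps` (privacy budget) and `lam` (replacement rate) are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random

def perturb_neighbors(neighbors, all_nodes, eps, lam, seed=None):
    """Sketch of structural perturbation (not the paper's exact method).

    neighbors: the node's true neighbor set
    all_nodes: all node ids in the graph
    eps:       assumed privacy budget for randomized response
    lam:       assumed lambda-selector rate (replacements per dropped edge)
    """
    rng = random.Random(seed)
    # Randomized response: keep each true neighbor with
    # probability p = e^eps / (e^eps + 1), so larger eps => less noise.
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    kept = {v for v in neighbors if rng.random() < p}
    dropped = len(neighbors) - len(kept)
    # Lambda-selector: sample replacement nodes from the non-neighbors
    # to compensate for the degree reduction caused by dropped edges.
    candidates = [v for v in all_nodes if v not in neighbors]
    k = min(int(round(lam * dropped)), len(candidates))
    added = rng.sample(candidates, k) if k > 0 else []
    return sorted(kept | set(added))
```

With `lam = 0` the mechanism only removes edges (illustrating the degree-loss problem), while `lam > 0` trades some extra noise for a more realistic degree distribution, which is the balance the third challenge refers to.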

