STRUCTURAL PRIVACY IN GRAPHS

Abstract

Graph Neural Networks (GNNs) have gained popularity for addressing tasks over graph-structured data, which naturally represents many real-world systems. The privacy of the participants in these systems is at risk if GNNs are not carefully designed. Existing work on privacy-preserving GNNs primarily ensures the privacy of node features and labels. To ensure complete privacy of graph data, its structure must be privatized as well. We propose SPGraph, a method that privatizes the graph structure by adding noise to the neighborhood data of each node. Our method addresses two challenges in introducing structural privacy in graphs. First, applying randomization to the set of actual neighbors reduces node degrees, which is undesirable; to overcome this, we introduce the λ-selector, which samples nodes to be added to the set of neighbors. The second challenge is to denoise the neighborhood so that the added noise does not significantly impact accuracy; here, we use the p-hop neighborhood to compensate for the actual neighbors lost in randomization. We retain node and label privacy as implemented in previous methods for privacy in GNNs. We conduct extensive experiments on real-world datasets to show the impact of perturbation on the graph structure.
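The structure-perturbation idea summarized above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact SPGraph mechanism: the keep probability, the sampling rate `lam`, and the function names are all illustrative assumptions.

```python
import math
import random

def perturb_neighbors(neighbors, all_nodes, eps, lam, rng=None):
    """Illustrative neighbor-list perturbation (not the exact SPGraph
    mechanism): randomized response drops each true neighbor with some
    probability, and a lambda-selector-style step samples extra nodes
    to compensate for the lost degree."""
    rng = rng or random.Random(0)
    # Keep each true neighbor with probability p = e^eps / (1 + e^eps),
    # as in standard randomized response.
    p = math.exp(eps) / (1.0 + math.exp(eps))
    kept = {v for v in neighbors if rng.random() < p}
    # Lambda-selector sketch: add each non-neighbor with probability lam,
    # partially restoring the degree lost to randomization.
    added = {v for v in all_nodes if v not in neighbors and rng.random() < lam}
    return kept | added
```

With a large privacy budget `eps` the true neighborhood survives almost intact, while a small `eps` trades accuracy for stronger deniability about which edges are real.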

1. INTRODUCTION

Real-world systems such as social networks, citation networks, and molecular networks are popularly modeled as graphs. A graph richly represents such a system because it captures both the entities in the system and the relationships between them. Graph Neural Networks (GNNs) are widely used to tackle tasks over such graphs, including node classification, graph classification, and link prediction. The primary goal of a GNN is to aggregate structural and feature information efficiently while addressing tasks related to the systems represented as graphs.

Problem and Motivation. The problem we address in this work is ensuring users' data privacy in critical systems that can be represented as graphs. For graphs, the term data privacy covers privacy related to the structure of the graph as well as the features of its nodes. We focus on privacy in GNNs for node-level tasks. Most previous work on privacy in GNNs has ensured the privacy of the features and labels of each node. These works assume that the server knows the connectivity information; hence, in previous work, data privacy means the privacy of node features and node labels. In this work, we consider the privacy of structural information, typically given by the edges of the graph, along with the feature and label information of the nodes.

Challenges. To ensure complete privacy, that is, the privacy of nodes, edges, and labels in a graph, the following challenges need to be addressed: 1. Edge privacy, i.e., the privacy of the graph structure, must be protected to prevent structural information from being compromised. Existing methods Sajadmanesh & Gatica-Perez (2021) perturb only the node features and labels; however, as the graph structure remains publicly available, it must be treated as private information as well. 2.
To preserve structural privacy, there must be a mechanism to perturb the edges of the graph. Deciding the amount of noise to add to the edge data, and the mechanism for adding it, is one of the challenges. We also need a mechanism to correct the added noise so as to preserve the accuracy of predictions. 3. Determining the amounts of structure perturbation and of node and label perturbation that strike the right balance between privacy and utility in the graph data.

Figure 1: A network of users who want to keep sensitive data such as age and salary private. This information is privatized by perturbing the node feature and label information; the change in labels is shown by a change in node color. We address the problem of structure perturbation in such a network before all the perturbed information is passed to the server, which further processes the data by applying a GNN to answer questions on the graph.

Contributions. We propose completely locally private graph neural networks. Complete local privacy is obtained by privatizing the edge information in addition to the node and label information. Our method SPGRAPH privatizes the structure of the nodes in the graph data. We provide experimental evidence of the model's performance by varying the parameters that control the noise added to the edges, where the edges are perturbed according to the structure privatization approach.

Paper Organization. This paper is organized as follows. The introduction, motivation, challenges, and contributions of our work are described in section 1. In section 2, we discuss related work. Section 3 gives the preliminaries, with the problem definition and background. Our proposed method is described in section 4. The experimental setup and results are discussed in section 5, and section 6 concludes the paper.

2. RELATED WORK

A federated framework for privacy-preserving GNNs for recommendation systems is presented in Wu et al. (2021a).
The GNNs are trained locally at the users' end, and the users upload their local gradients to a server for aggregation. To enhance user privacy by protecting user-item interactions, local differential privacy techniques are applied to the locally computed gradients. Zhou et al. (2020) is another work involving federated graph neural networks for privacy-preserving classification tasks; the features and the edges are split among the users, while all users have access to the same set of nodes.
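Applying local differential privacy to a locally computed gradient, as described above, typically amounts to clipping followed by calibrated noise. The sketch below shows one common recipe; the clipping bound `clip` and the Laplace scale `2*clip/eps` are illustrative assumptions, not necessarily the choices made in the cited papers.

```python
import numpy as np

def ldp_gradient(grad, clip, eps, rng=None):
    """Sketch of local differential privacy on a gradient vector:
    clip each coordinate to [-clip, clip], then add Laplace noise
    with scale 2*clip/eps (a common LDP recipe; parameter names and
    values here are illustrative)."""
    rng = rng or np.random.default_rng(0)
    # Bound each coordinate's sensitivity before adding noise.
    g = np.clip(np.asarray(grad, dtype=float), -clip, clip)
    noise = rng.laplace(loc=0.0, scale=2.0 * clip / eps, size=g.shape)
    return g + noise
```

The server only ever sees the noisy vector, so individual user-item interactions cannot be read off a single uploaded gradient.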



The privacy of the data involved can be compromised by various attacks on Graph Neural Networks. He et al. (2021) introduced seven link-stealing attacks on graphs, in which an adversary with black-box access to a GNN can infer whether any two nodes used in training have a link between them. Wu et al. (2021b) introduced a similar attack, called LinkTeller, that concerns the privacy of the edges in the graph. In their setting, the node features and the adjacency information are held by different parties: the party with the adjacency matrix trains the GNN upon receiving the node features from the other party and provides an inference API in return. The party holding the node features supplies test node features and can also query the API for predictions on test nodes. The LinkTeller attack infers which links are present between nodes based on these queries. Other works, Olatunji et al. (2021), Zügner et al. (2020), and Duddu et al. (2020), discuss further possible attacks on GNNs, such as membership inference attacks, graph reconstruction attacks, and attribute inference attacks.
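The common intuition behind such link-stealing attacks is that connected nodes tend to receive similar prediction posteriors from a GNN. A minimal sketch of that intuition follows; the cosine metric and the threshold value are illustrative choices, not the exact attack from any of the cited papers.

```python
import numpy as np

def link_score(post_u, post_v):
    """Cosine similarity between two nodes' prediction posteriors;
    a high score is taken as evidence of an edge between them
    (illustration of the attack intuition only)."""
    u = np.asarray(post_u, dtype=float)
    v = np.asarray(post_v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def infer_link(post_u, post_v, threshold=0.9):
    # Hypothetical decision rule: declare a link when the posteriors
    # obtained by querying the model are sufficiently similar.
    return link_score(post_u, post_v) >= threshold
```

An adversary who can only query predictions can therefore still recover structural information, which motivates treating the edges themselves as private data.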

