FINDING PATIENT ZERO: LEARNING CONTAGION SOURCE WITH GRAPH NEURAL NETWORKS

Abstract

Locating the source of an epidemic, or patient zero (P0), can provide critical insights into the infection's transmission course and allow efficient resource allocation. Existing methods use graph-theoretic centrality measures and expensive message-passing algorithms, requiring knowledge of the underlying dynamics and its parameters. In this paper, we revisit this problem using graph neural networks (GNNs) to learn P0. We establish a theoretical limit for the identification of P0 in a class of epidemic models. We evaluate our method under different epidemic models on both synthetic networks and a real-world contact network, considering a disease with the history and characteristics of COVID-19. We observe that GNNs can identify P0 close to the theoretical bound on accuracy, without explicit input of the dynamics or its parameters. In addition, the GNN is over 100 times faster than classic methods for inference on arbitrary graph topologies. Our theoretical bound also shows that the epidemic is like a ticking clock, emphasizing the importance of early contact tracing. We find a maximum time after which accurate recovery of the source becomes difficult, regardless of the algorithm used.

1. INTRODUCTION

The ability to quickly identify the origin of an outbreak, or "finding patient zero", is critically important in the effort to contain an emerging epidemic. Identifying early transmission chains and reconstructing the possible paths of diffusion of the virus can be the difference between stopping an outbreak in its infancy and letting an epidemic unfold and affect a large share of a population. Hence, solving this problem would be instrumental in informing and guiding contact tracing efforts carried out by public health authorities, allowing for optimal resource allocation that can maximize the probability of an early containment of the outbreak.

Disease spreading is modeled as a contagion process on a network (Stroock & Varadhan, 2007; Pastor-Satorras et al., 2015) of human-to-human interactions, where infected individuals transmit the virus by infecting, with a certain probability, their direct contacts. In general, contagion processes can capture a wide range of phenomena, from rumor propagation on social media to virus spreading over cyber-physical networks (Centola & Macy, 2007; Baronchelli, 2018; Wang et al., 2013; Mishra & Keshri, 2013). Therefore, learning the source of a contagion process would also have a broader impact on various domains, from detecting sources of fake news to defending against malware attacks.

Learning the index case, or patient zero (P0), is a difficult problem. In this paper, we model disease spreading as a contagion process (chains of transmissions) over a graph. The evolution of an outbreak is noisy and highly dependent on the graph structure and disease dynamics. In addition, in real-world epidemics, there is often a delay between the start of the outbreak and the start of epidemic surveillance and contact tracing. Hence, we might only observe the state of the graph at some intermediate times, without access to the complete chains of transmission. Furthermore, because the process is stochastic, the same source node might lead to different epidemic spreading trajectories. Finally, learning P0 from noisy observations of graph snapshots is computationally intractable, and the complexity grows exponentially with the size of the graph (Shah & Zaman, 2011).
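The setting described above can be illustrated with a minimal sketch. It assumes a discrete-time SIR process, which is chosen here only for illustration. The ring graph, the function names `ring_graph` and `simulate_sir`, and the parameters `beta` (per-contact infection probability) and `gamma` (recovery probability) are hypothetical and not taken from the paper. The function returns only the node states at one intermediate time, which is the kind of snapshot a source-identification method would observe.

```python
import random


def ring_graph(n):
    """Hypothetical toy contact network: node i is linked to its two ring neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def simulate_sir(adj, source, beta, gamma, steps, seed):
    """Illustrative discrete-time SIR contagion on a graph (adjacency dict).

    Assumed dynamics, not the paper's: at each step every infected node
    infects each susceptible neighbor with probability `beta`, then
    recovers with probability `gamma`.
    Returns a dict mapping each node to 'S', 'I', or 'R' after `steps` steps.
    """
    rng = random.Random(seed)
    state = {node: "S" for node in adj}
    state[source] = "I"
    for _ in range(steps):
        new_state = dict(state)
        for node in adj:
            if state[node] != "I":
                continue
            # Transmission attempts along each edge to a susceptible contact.
            for nbr in adj[node]:
                if state[nbr] == "S" and rng.random() < beta:
                    new_state[nbr] = "I"
            # Recovery of the currently infected node.
            if rng.random() < gamma:
                new_state[node] = "R"
        state = new_state
    return state


if __name__ == "__main__":
    adj = ring_graph(20)
    # Same source, different random seeds: the observed snapshots may differ.
    for seed in (0, 1, 2):
        snapshot = simulate_sir(adj, source=0, beta=0.4, gamma=0.2, steps=5, seed=seed)
        print(seed, "".join(snapshot[i] for i in range(20)))
```

Running the script prints one snapshot per seed. Because transmission and recovery are random, runs that start from the same source will generally produce different snapshots, and the snapshot alone does not record who infected whom.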



Most work on learning the dynamics of a contagion process (Rodriguez et al., 2011; Mei & Eisner, 2017; Li et al., 2018a) has focused on inferring the forward dynamics of the diffusion. In epidemiology, for example, Pastor-Satorras & Vespignani (2001) have studied learning the temporal dynamics of diseases spreading on mobility networks. The problem of learning the reverse dynamics

