PCPS: PATIENT CARDIAC PROTOTYPES

Abstract

Many clinical deep learning algorithms are population-based and difficult to interpret. Such properties limit their clinical utility as population-based findings may not generalize to individual patients and physicians are reluctant to incorporate opaque models into their clinical workflow. To overcome these obstacles, we propose to learn patient-specific embeddings, termed patient cardiac prototypes (PCPs), that efficiently summarize the cardiac state of the patient. To do so, we attract representations of multiple cardiac signals from the same patient to the corresponding PCP via supervised contrastive learning. We show that the utility of PCPs is multifold. First, they allow for the discovery of similar patients both within and across datasets. Second, such similarity can be leveraged in conjunction with a hypernetwork to generate patient-specific parameters, and in turn, patient-specific diagnoses. Third, we find that PCPs act as a compact substitute for the original dataset, allowing for dataset distillation.

1. INTRODUCTION

Modern medical research is arguably anchored around the "gold standard" of evidence provided by randomized controlled trials (RCTs) (Cartwright, 2007). However, RCT-derived conclusions are population-based and fail to capture nuances at the individual patient level (Akobeng, 2005). This is primarily due to the complex mosaic that characterizes a patient, from demographics to physiological state and treatment outcomes. Similarly, despite the success of deep learning algorithms in automating clinical diagnoses (Galloway et al., 2019; Attia et al., 2019a; b; Ko et al., 2020), network-generated predictions remain population-based and difficult to interpret. Such properties are a consequence of a network's failure to incorporate patient-specific structure during training or inference. As a result, physicians are reluctant to integrate such systems into their clinical workflow. In contrast to such reluctance, personalized medicine, the ability to deliver the right treatment to the right patient at the right time, is increasingly viewed as a critical component of medical diagnosis (Hamburg & Collins, 2010). The medical diagnosis of cardiac signals such as the electrocardiogram (ECG) is of utmost importance in a clinical setting (Strouse et al., 1939). For example, such signals, which convey information about potential abnormalities in a patient's heart (also known as cardiac arrhythmias), are used to guide medical treatment both within and beyond the cardiovascular department (Carter, 1950). In this paper, we conceptually borrow insight from the field of personalized medicine in order to learn patient representations that allow for a high level of network interpretability. Such representations have several potential clinical applications. First, they allow clinicians to quantify the similarity of patients. By doing so, network-generated predictions for a pair of patients can be traced back to this similarity, and in turn, to their corresponding ECG recordings.
Allowing for this inspection of ECG recordings aligns well with the existing clinical workflow. An additional application of patient similarity is the exploration of previously unidentified patient relationships, which may lead to the discovery of novel patient sub-cohorts. Such discoveries can lend insight into particular diseases and appropriate medical treatments. In contrast to existing patient representation learning methods (Zhu et al., 2016; Suo et al., 2017), we concurrently optimize for a predictive task (cardiac arrhythmia classification), leverage patient similarity, and design a system specifically for 12-lead ECG signals.

Contributions. Our contributions are the following:

1. Patient cardiac prototypes (PCPs): we learn representations that efficiently summarize the cardiac state of a patient in an end-to-end manner via contrastive learning.
2. Patient similarity quantification: we show that, by measuring the Euclidean distance between PCPs and representations, we can identify similar patients across different datasets.
3. Dataset distillation: we show that PCPs can be used to train a network, in lieu of the original dataset, and maintain strong generalization performance.
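
The similarity quantification in contribution 2 can be illustrated with a minimal pure-Python sketch. It assumes the PCPs have already been learned and are stored in a dictionary keyed by hypothetical patient IDs, and that a query representation h has already been produced by the feature extractor. The function name `rank_similar_patients` and all numeric values are illustrative only; a smaller Euclidean distance is read as greater similarity.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_similar_patients(
    h: Sequence[float],
    pcps: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (patient_id, distance) pairs whose PCPs are closest to h.

    Similarity is measured by Euclidean distance: smaller means more similar.
    """
    distances = [(pid, math.dist(h, v)) for pid, v in pcps.items()]
    distances.sort(key=lambda item: item[1])
    return distances[:top_k]


# Hypothetical 3-dimensional PCPs (E = 3) for three training patients.
pcps = {
    "patient_A": [1.0, 0.0, 0.0],
    "patient_B": [0.0, 1.0, 0.0],
    "patient_C": [0.9, 0.1, 0.0],
}
# Hypothetical representation of a recording from a new patient.
h_query = [1.0, 0.05, 0.0]
print(rank_similar_patients(h_query, pcps, top_k=2))
```

In a cross-dataset setting, `h_query` would come from a recording in a different dataset, and the returned IDs point back to training patients whose ECG recordings a clinician could then inspect.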

2. RELATED WORK

Contrastive learning is a self-supervised method that encourages representations of instances with commonalities to be similar to one another. This is performed for each instance and its perturbed counterpart (Oord et al., 2018; Chen et al., 2020a; b; Grill et al., 2020) and for different visual modalities (views) of the same instance (Tian et al., 2019). Such approaches are overly reliant on the choice of perturbations and necessitate a large number of comparisons. Instead, Caron et al. (2020) propose to learn cluster prototypes. Most similar to our work are those of Cheng et al. (2020) and CLOCS (Kiyasseh et al., 2020), which both show the benefit of encouraging patient-specific representations to be similar to one another. Although DROPS (Anonymous, 2021) leverages contrastive learning, it does so at the patient-attribute level. In contrast to existing methods, we learn patient-specific representations, PCPs, in an end-to-end manner.

Meta-learning designs learning paradigms that allow for the fast adaptation of networks. Prototypical Networks (Snell et al., 2017) average representations to obtain class-specific prototypes. During inference, the similarity of representations to these prototypes determines the classification. Relation Networks (Sung et al., 2018) build on this idea by learning the similarity of representations to prototypes through a parametric function. Gidaris & Komodakis (2018) and Qiao et al. (2018) exploit hypernetworks (Ha et al., 2016) and propose to generate the parameters of the final linear layer of a network for few-shot learning on visual tasks. In contrast, during inference only, we compute the cosine similarity between representations and PCPs and use the latter as the input to a hypernetwork.

Patient similarity aims at discovering relationships between patient data (Sharafoddini et al., 2017). To quantify these relationships, Pai & Bader (2018) and Pai et al. (2019) propose Patient Similarity Networks for cancer survival classification. Exploiting electronic health record data, Zhu et al. (2016) use Word2Vec to learn patient representations, and Suo et al. (2017) propose to exploit patient similarity to guide the re-training of models, an approach which is computationally expensive. Instead, our work naturally learns PCPs as efficient descriptors of the cardiac state of a patient.

3.1. LEARNING PATIENT CARDIAC PROTOTYPES VIA CONTRASTIVE LEARNING

We assume the presence of a dataset, $D = \{(x_i, y_i)\}_{i=1}^{N}$, comprising $N$ ECG recordings, $x$, and cardiac arrhythmia labels, $y$, for a total of $P_{tot}$ patients. Typically, multiple recordings are associated with a single patient, $p$. This could be due to multiple recordings within the same hospital visit or multiple visits to a hospital. On average, each patient is therefore associated with $N/P_{tot}$ recordings. We learn a feature extractor $f_\theta: x \in \mathbb{R}^D \rightarrow h \in \mathbb{R}^E$, parameterized by $\theta$, that maps a $D$-dimensional recording, $x$, to an $E$-dimensional representation, $h$. In the quest to learn patient-specific representations, we associate each patient, $p$, out of a total of $P$ patients in the training set with a unique and learnable embedding, $v \in \mathbb{R}^E$, in a set of embeddings, $V$, where $|V| = P \ll N$. Such embeddings are designed to be efficient descriptors of the cardiac state of a patient, and we thus refer to them as patient cardiac prototypes or PCPs. We propose to learn PCPs in an end-to-end manner via contrastive learning. More specifically, given an instance, $x_i$, that belongs to a particular patient, $k$, we encourage its representation, $h_i = f_\theta(x_i)$, to be similar to the same patient's PCP, $v_k$, and dissimilar to the remaining PCPs, $v_j$, $j \neq k$. We quantify this similarity, $s(h_i, v_k)$, by using the cosine similarity with a temperature parameter, $\tau$. The intuition is that each PCP, in being attracted to a diverse set of representations that belong to the same patient, should become invariant to insidious intra-patient differences. For a mini-batch of size $B$, the contrastive loss is as follows:

$$
\mathcal{L}_{contrastive} = -\sum_{i=1}^{B} \log \left[ \frac{e^{s(h_i, v_k)}}{\sum_{j=1}^{P} e^{s(h_i, v_j)}} \right]
$$

$$
s(h_i, v_k) = \frac{f_\theta(x_i) \cdot v_k}{\lVert f_\theta(x_i) \rVert \, \lVert v_k \rVert} \cdot \frac{1}{\tau}
$$

where $k$ denotes the patient to whom $x_i$ belongs.
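
The contrastive loss described in Section 3.1 can be sketched in pure Python, assuming the standard softmax (InfoNCE-style) form: each representation's temperature-scaled cosine similarities to all P PCPs are treated as logits, and the loss is the negative log-probability assigned to the instance's own PCP, summed over the mini-batch. The sketch only evaluates the loss; in training, the PCPs and θ would be learnable parameters updated by gradient descent in a deep learning framework, which is omitted here. Function names and the default temperature are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pcp_contrastive_loss(
    reps: List[Sequence[float]],
    patient_ids: List[int],
    pcps: List[Sequence[float]],
    tau: float = 0.1,
) -> float:
    """Contrastive loss between representations and PCPs, summed over the mini-batch.

    reps[i]: representation h_i = f_theta(x_i) of the i-th instance
    patient_ids[i]: index k of the patient (and hence PCP) that x_i belongs to
    pcps[j]: the j-th patient cardiac prototype v_j (P in total)
    tau: temperature; 0.1 is an illustrative default, not a value from the text
    """
    total = 0.0
    for h, k in zip(reps, patient_ids):
        # Temperature-scaled cosine similarity s(h_i, v_j) to every PCP.
        logits = [cosine_similarity(h, v) / tau for v in pcps]
        # Log of the softmax denominator, computed stably via log-sum-exp.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of matching h_i to its own patient's PCP v_k.
        total += log_denom - logits[k]
    return total


# Two hypothetical 2-D PCPs and a mini-batch of B = 2 recordings from patient 0.
pcps = [[1.0, 0.0], [0.0, 1.0]]
aligned = pcp_contrastive_loss([[0.9, 0.1], [1.0, 0.2]], [0, 0], pcps)
misaligned = pcp_contrastive_loss([[0.1, 0.9], [0.2, 1.0]], [0, 0], pcps)
print(aligned < misaligned)
```

Subtracting the maximum logit before exponentiating leaves the loss unchanged but avoids overflow when τ is small and the scaled similarities become large.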

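The related work states that, during inference only, the cosine similarity between a representation and the PCPs is computed and PCP information is used as the input to a hypernetwork, which yields patient-specific parameters and, in turn, patient-specific diagnoses. The sketch below makes two assumptions not given in the text: the single most similar PCP is selected, and the hypernetwork is one linear map whose output is reshaped into the weights of a final linear classification layer, in the spirit of Gidaris & Komodakis (2018) and Qiao et al. (2018). All weights are hand-picked for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def hypernetwork(pcp: Sequence[float], hyper_weights: Matrix, num_classes: int, dim: int) -> Matrix:
    """Hypothetical linear hypernetwork: maps a PCP to a (num_classes x dim) weight matrix."""
    flat = matvec(hyper_weights, pcp)  # length num_classes * dim
    return [flat[c * dim:(c + 1) * dim] for c in range(num_classes)]


def patient_specific_logits(h: Sequence[float], pcps: List[Sequence[float]],
                            hyper_weights: Matrix, num_classes: int) -> List[float]:
    # 1. Cosine similarity between the representation and every PCP (inference only).
    sims = [cosine_similarity(h, v) for v in pcps]
    # 2. Assumption: keep only the most similar PCP.
    best = max(range(len(pcps)), key=lambda j: sims[j])
    # 3. Generate patient-specific parameters for the final linear layer.
    w_final = hypernetwork(pcps[best], hyper_weights, num_classes, len(h))
    # 4. Apply the generated layer to the representation to obtain class logits.
    return matvec(w_final, h)


# Hypothetical setup: E = 2, two classes, two PCPs, hand-picked (not learned) weights.
pcps = [[1.0, 0.0], [0.0, 1.0]]
hyper_weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
print(patient_specific_logits([0.9, 0.1], pcps, hyper_weights, num_classes=2))
print(patient_specific_logits([0.1, 0.9], pcps, hyper_weights, num_classes=2))
```

Here the two mirrored representations select different nearest PCPs, which generate different final-layer weights, so both inputs map to the same logits; this is the sense in which the generated parameters are patient-specific. A trained system would learn `hyper_weights` rather than fixing them by hand.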
