ATTENTION BASED JOINT LEARNING FOR SUPERVISED ELECTROCARDIOGRAM ARRHYTHMIA DIFFERENTIATION WITH UNSUPERVISED ABNORMAL BEAT SEGMENTATION

Abstract

Deep learning has shown great promise in arrhythmia classification on electrocardiograms (ECG). Existing works, when classifying an ECG segment with multiple beats, do not identify the locations of the anomalies, which reduces clinical interpretability. On the other hand, segmenting abnormal beats with deep learning usually requires annotations for a large number of regular and irregular beats, which can be laborious, sometimes even challenging, with strong inter-observer variability between experts. In this work, we propose a method capable of not only differentiating arrhythmia but also segmenting the associated abnormal beats in an ECG segment. The only annotation used in training is the type of abnormal beats; no segmentation labels are needed. Imitating a human's perception of an ECG signal, the framework consists of a segmenter and a classifier. The segmenter outputs an attention map that aims to highlight the abnormal sections of the ECG by element-wise modulation. Afterwards, the signals are sent to the classifier for arrhythmia differentiation. Though the training data is labeled only to supervise the classifier, the segmenter and the classifier are trained in an end-to-end manner, so that optimizing classification performance also adjusts how the abnormal beats are segmented. We validate our method on two datasets. We observe that involving the unsupervised segmentation in fact boosts the classification performance. Meanwhile, a grade study performed by experts suggests that the segmenter also achieves satisfactory quality in identifying abnormal beats, which significantly enhances the interpretability of the classification results.

1. INTRODUCTION

Arrhythmia in an electrocardiogram (ECG) reflects abnormal cardiac conduction and occurs randomly among normal beats. Deep learning based methods have demonstrated strong performance in classifying different types of arrhythmia. There are plenty of works on classifying a single beat, involving convolutional neural networks (CNN) (Acharya et al., 2017b; Zubair et al., 2016), long short-term memory (LSTM) (Yildirim, 2018), and generative adversarial networks (GAN) (Golany & Radinsky, 2019). For these methods to work in a clinical setting, however, a good segmenter is needed to accurately extract a single beat from an ECG segment, which may be hard when abnormal beats are present. Alternatively, other works (Acharya et al., 2017a; Hannun et al., 2019) try to directly identify the genres of arrhythmia present in an ECG segment. The limitation of these works is that they work as black boxes and fail to provide cardiologists with any clue on how the prediction is made, such as the locations of the associated abnormal beats.

In terms of ECG segmentation, there are different tasks, such as segmenting ECG records into beats or into the P wave, QRS complex, and T wave. On one hand, some existing works take advantage of signal processing techniques to locate fiducial points of the PQRST complex so that the ECG signals can be divided. For example, the Pan-Tompkins algorithm (Pan & Tompkins, 1985) uses a combination of filters, squaring, and moving window integration to detect the QRS complex. The shortcoming of these methods is that handcrafted selection of filter parameters and thresholds is needed. More importantly, they are unable to distinguish abnormal heartbeats from normal ones. To address these issues, Moskalenko et al. (2019); Oh et al. (2019) deploy CNNs for automatic beat segmentation. However, the quality of these methods highly depends on labels for the fiducial points of ECG signals, the annotation process of which can be laborious and sometimes very hard.
Moreover, due to the high morphological variation of arrhythmia, strong disagreement exists even between annotations from experienced cardiologists. As such, unsupervised learning based approaches might be a better choice. Inspired by a human's perception of ECG signals, our proposed framework first locates the abnormal beats in an ECG segment in the form of an attention map and then classifies the arrhythmia by focusing on these abnormal beats. Thus, the framework not only differentiates arrhythmia types but also identifies the locations of the associated abnormal beats for better interpretability of the result. It is worth noting that, during training, our workflow only uses annotations for the type of abnormality in each ECG segment, without abnormal beat localization information, given the difficulty and tedious effort of obtaining the latter. We validate our method on two datasets from different sources. The first contains 508 12-lead ECG records of Premature Ventricular Contraction (PVC) patients, which are categorized into different classes by the origin of the premature contraction (e.g., left ventricle (LV) or right ventricle (RV)). For the other dataset, we process signals in the MIT-BIH Arrhythmia dataset into segments of standard length. This dataset includes various types of abnormal beats, and we select 2627 segments with PVC present and 356 segments with Atrial Premature Beat (APB) present. Experiments on both datasets show quantitative evidence that introducing the segmentation of abnormal beats through an attention map, although unsupervised, can in fact benefit arrhythmia classification performance as measured by accuracy, sensitivity, specificity, and the area under the Receiver Operating Characteristic (ROC) curve. At the same time, a grade study by experts qualitatively demonstrates our method's promising capability to segment abnormal beats among normal ones, which can provide useful insight into the classification result.
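The element-wise modulation at the core of this framework can be sketched as follows. This is a minimal illustrative example, not the actual model: in the proposed method both the attention map and the class prediction come from trained neural networks (the segmenter and the classifier), and the signal lengths and values below are hypothetical.

```python
import numpy as np

def soft_attention_modulation(signal, attention):
    """Element-wise modulation: the attention map re-weights each sample
    of the ECG segment before the result is fed to the classifier, so
    attended (abnormal) regions are kept and the rest is suppressed."""
    assert signal.shape == attention.shape
    # Attention values are assumed to lie in [0, 1].
    assert np.all((attention >= 0.0) & (attention <= 1.0))
    return signal * attention

# Hypothetical single-lead ECG segment of 8 samples (a real segment
# would contain thousands of samples, possibly across 12 leads).
ecg = np.array([0.1, 0.2, 1.5, 2.0, 1.4, 0.2, 0.1, 0.1])

# Hypothetical attention map highlighting the middle (abnormal) beat;
# in the paper this map is produced by the unsupervised segmenter.
attn = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

modulated = soft_attention_modulation(ecg, attn)
# Samples outside the attended region are suppressed toward zero,
# so the downstream classifier focuses on the highlighted beat.
```

Because the modulation is differentiable, gradients from the classification loss can flow back through the attention map into the segmenter, which is what allows the end-to-end training to shape the segmentation without segmentation labels.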
Our code and dataset, the first of its kind for the challenging PVC differentiation problem, will be released to the public.

2. RELATED WORKS

Multitask learning Many works are devoted to training one deep learning model for multiple tasks rather than one specific task, such as simultaneous segmentation and classification. Yang et al. (2017) solve skin lesion segmentation and classification at the same time by utilizing similarities and differences across the tasks. In the area of ECG signals, Oh et al. (2019) modify UNet to output the localization of R peaks and an arrhythmia prediction simultaneously. What these two works have in common is that the different tasks share certain layers for feature extraction. In contrast, our segmenter and classifier are independent models with no layer sharing between them. As can be seen in Figure 1, we use attention maps as a bridge connecting the two models. Mehta et al. (2018) segment different types of tissue in breast biopsy images with a UNet and apply a discriminative map, generated by a subbranch of the UNet, to the segmentation result, which then serves as input to an MLP for diagnosis. However, their segmentation and classification tasks are not trained end-to-end. Zhou et al. (2019) propose a method for collaborative learning of disease grading and lesion segmentation. They first perform a traditional semantic segmentation task with a small portion of annotated labels, and then jointly train the segmenter and classifier for fine-tuning with an attention mechanism applied to the latent features of the classification model, which differs from our method. Another difference is that most existing multitask learning works require labels for every task, i.e., all tasks are supervised. Our method, on the other hand, only requires the labels of one task (classification), leading to a joint supervised/unsupervised scheme.
Attention mechanism After being first proposed for machine translation (Bahdanau et al., 2014), the attention mechanism became a prevalent concept in deep learning and has led to improved performance in various tasks in natural language processing and computer vision. Vaswani et al. (2017) exploit self-attention in their encoder-decoder architecture to draw dependencies between input and output sentences. Wang et al. (2017) build a very deep network with attention modules that generate attention-aware features for image classification, and Oktay et al. (2018) integrate attention gates into U-Net (Ronneberger et al., 2015) to highlight latent channels informative for the segmentation task.

