MULTI-SCALE ATTENTION FOR DIABETIC RETINOPATHY DETECTION IN RETINAL FUNDUS PHOTOGRAPHS

Abstract

The diagnosis and grading of diabetic retinopathy (DR) from retinal fundus photographs has traditionally been performed manually by physicians. However, the steady rise in the number of people with diabetes over the past few decades has created significant demand for automated eye diagnostic and grading systems. Automatic DR grading based on retinal fundus images offers excellent diagnostic and predictive value for treatment planning. For most existing automated DR grading systems, capturing significant features is exceedingly challenging because the differences between severity levels are subtle. This paper presents a deep learning-based method for automatically grading diabetic retinopathy in retinal fundus images. To increase the discriminative ability of the extracted features, we embed a multi-scale attention mechanism within a deep convolutional neural network architecture. Additionally, we propose a new loss function, termed modified grading loss, that improves the training convergence of the proposed approach by taking into account the distance between different DR grades. The proposed method is trained, validated, and tested on a publicly available diabetic retinopathy dataset. Experimental results are presented to illustrate the competitive performance of the proposed approach.

1. INTRODUCTION

Diabetic retinopathy (DR) is a disorder caused by excessive blood sugar levels that damages the back of the eye (retina). It is a long-term microvascular complication brought on by uncontrolled diabetes mellitus (DM) and is one of the most significant consequences of type 2 diabetes (T2DM) Ali et al. (2016); Naserrudin et al. (2022). Diabetic retinopathy is classified into four types: no visible diabetic retinopathy (No DR), non-proliferative diabetic retinopathy (NPDR), proliferative diabetic retinopathy (PDR), and advanced diabetic eye disease (ADED). The different DR severity levels are illustrated in Figure 1. Physicians frequently use the international clinical DR severity scale developed by the American Academy of Ophthalmology (AAO) to classify patients as having non-proliferative diabetic retinopathy (NPDR), proliferative diabetic retinopathy (PDR), or maculopathy Ngah et al. (2020). It defines NPDR as the presence of any of the following disorders with no signs of proliferative retinopathy: micro-aneurysms, intraretinal hemorrhage, venous beading, or intraretinal microvascular abnormalities (IRMAs). PDR is characterized by neovascularization, vitreous or pre-retinal hemorrhage, or both. Fundus images are accordingly labeled as having no DR, NPDR, PDR, advanced diabetic eye disease (ADED), cataract, maculopathy, or suspected glaucoma Ngah et al. (2020); Mallika et al. (2011). This classification assists in deciding when a referral is necessary, how frequently to monitor or screen patients, how to treat them, and other factors Saxena et al. (2020). Manual eye screening for DR entails identifying this eye disorder through visual examination of the fundus, either through direct inspection (in-person dilated eye examinations) or through analysis of digital color fundus photographs of the retina. According to a number of studies (see, e.g., Liesenfeld et al. (2000); Olson et al. (2003); Gangaputra et al. (2013)), fundus photography telemedicine has sensitivity and specificity equivalent to in-person screening for DR. Additionally, patients enjoy using it and find it less expensive. Manually analyzing these images is hard and time-consuming, and the problem gets worse in rural areas where access to skilled medical professionals is limited. There is therefore great demand for screening programs that are more effective, repeatable, and comprehensive; such programs would eliminate access barriers, enable early diagnosis and treatment, and improve patient outcomes.

Figure 1: Illustration of the diabetic retinopathy severity levels in retinal fundus photographs. Ophthalytics (2022)

Numerous research studies over the years have aimed to detect DR automatically, with a focus on feature extraction and binary-class prediction Baudoin et al. (1984); Frame et al. (1998); Niemeijer et al. (2009); Sinthanayothin et al. (2002); Niemeijer et al. (2005); Quellec et al. (2008); Abràmoff et al. (2009; 2013). These works show some effectiveness but also have a few shortcomings. First, since they employed hand-crafted features, the characteristics extracted from images are prone to noise, exposure, and artifacts, among other issues. Second, feature localization and segmentation cannot be successfully integrated into a complete DR detection system. Additionally, merely diagnosing the presence or absence of DR, rather than the severity of DR, may not adequately address real-world difficulties or assist physicians in their practice. Recently, deep learning techniques have found use in medical image processing, with Alyoubi et al. (2020) as an example. In recent years, many studies on automatic DR grading/detection, including Gulshan et al. (2016); Abràmoff et al. (2016); Li et al. (2019); Zhao et al. (2019); Anoop et al. (2022); Bilal et al. (2021); Al-Antary & Arafa (2021); Zhao et al. (2020), utilized deep learning-based approaches, notably convolutional neural networks (CNNs). These works make the best use of the automatic feature extraction and excellent discriminability of CNNs. In general, most of these studies utilized CNNs for either binary classification Gulshan et al. (2016); Anoop et al. (2022) or multi-class prediction Abràmoff et al. (2016); Li et al. (2019); Zhao et al. (2019); Bilal et al. (2021); Al-Antary & Arafa (2021); Zhao et al. (2020). However, these approaches ignore global information and lose crucial details Bello et al. (2019) due to the down-sampling operators in CNNs (i.e., convolution and pooling). Existing DR detection systems based on pure convolutional neural networks suffer from this loss of semantic information. This paper proposes to incorporate attention mechanisms within a deep convolutional neural architecture (ResNet) for assessing diabetic retinopathy in retinal fundus images. Motivated by the success of attention networks in machine translation Vaswani et al. (2017) and, more recently, computer vision tasks Li et al. (2018), this work embeds multi-scale attention mechanisms within the layers of a ResNet56 architecture to improve classification accuracy for DR grading in fundus images. In contrast to earlier studies, the attention mechanisms used in this study are combined with a robust CNN architecture for automated DR grading, with particular emphasis on the capacity of attention networks to automatically learn to focus on salient features at various stages of feature extraction. Furthermore, we offer a novel loss function, dubbed modified grading loss, that improves training convergence by taking into account the distance between different DR grades.
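Multi-scale attention, in the sense used here, means attention modules applied to feature maps at several scales of the backbone. The paper's exact module is not specified in this excerpt; as an illustration only, a squeeze-and-excitation-style channel gate applied to features at two scales might look like the following NumPy sketch. All names and shapes (`channel_attention`, the gating MLP weights `w1`/`w2`) are hypothetical.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel attention (illustrative sketch).

    feat: feature map of shape (C, H, W); w1, w2: weights of a small gating MLP.
    Returns the feature map with each channel rescaled by a learned gate in (0, 1).
    """
    squeezed = feat.mean(axis=(1, 2))               # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)         # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid channel gates
    return feat * gates[:, None, None]              # reweight each channel

rng = np.random.default_rng(0)
C = 8
# Two feature scales, e.g. outputs of consecutive backbone stages.
coarse = rng.standard_normal((C, 4, 4))
fine = rng.standard_normal((C, 16, 16))
w1 = rng.standard_normal((C // 2, C)) * 0.1
w2 = rng.standard_normal((C, C // 2)) * 0.1
att_coarse = channel_attention(coarse, w1, w2)
att_fine = channel_attention(fine, w1, w2)
```

Because the gates lie in (0, 1), the module suppresses uninformative channels rather than adding new activations; in a real network the same idea would be implemented as trainable layers inserted after selected residual blocks.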

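The modified grading loss is not given in closed form in this excerpt; one common way to encode "distance between grades" is to add an expected ordinal-distance penalty to the standard cross-entropy, so that confusing PDR with No DR costs more than confusing it with an adjacent severity level. The sketch below is a hypothetical illustration of that idea, not the authors' formulation; the function name and the weighting factor `lam` are ours.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def modified_grading_loss(logits, labels, lam=1.0):
    """Cross-entropy plus an expected ordinal-distance penalty (illustrative).

    logits: (N, K) scores over K ordered DR grades; labels: (N,) true grades.
    """
    probs = softmax(logits)
    n, k = probs.shape
    ce = -np.log(probs[np.arange(n), labels] + 1e-12)       # standard CE term
    dist = np.abs(np.arange(k)[None, :] - labels[:, None])  # |grade - true grade|
    expected_dist = (probs * dist).sum(axis=1)              # ordinal penalty term
    return float((ce + lam * expected_dist).mean())

# Predictions peaked at the true grade, an adjacent grade, and a distant grade.
true_label = np.array([0])
peak = lambda i: np.eye(5)[i][None, :] * 5.0
loss_correct = modified_grading_loss(peak(0), true_label)
loss_adjacent = modified_grading_loss(peak(1), true_label)
loss_distant = modified_grading_loss(peak(4), true_label)
```

Under this formulation the penalty grows with how far the predicted grade sits from the true one, which is the distance-aware behavior the paper attributes to its modified grading loss.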
