TOWARDS A RELIABLE AND ROBUST DIALOGUE SYSTEM FOR MEDICAL AUTOMATIC DIAGNOSIS

Abstract

A dialogue system for medical automatic diagnosis (DSMAD) aims to learn an agent that mimics the behavior of a human doctor, i.e., inquiring about symptoms and informing diseases. Since DSMAD can be formulated as a Markov decision process, many studies apply reinforcement learning methods to solve it. Unfortunately, existing works rely solely on diagnostic accuracy to justify the effectiveness of their DSMAD agents while ignoring the medical rationality of the inquiring process. From the perspective of medical application, it is critical to develop an agent that produces reliable and convincing diagnosis processes and is also robust in making diagnoses when facing noisy interactions with patients. To this end, we propose a novel DSMAD agent, INS-DS (Introspective Diagnosis System), comprising two separate yet cooperative modules: an inquiry module for proposing symptom inquiries and an introspective module for deciding when to inform a disease. INS-DS is inspired by the introspective decision-making process of humans: the inquiry module first proposes the most valuable symptom inquiry; the introspective module then intervenes on the potential responses to this inquiry and decides to inquire only if the diagnoses under these interventions differ. We also propose two evaluation metrics to validate the reliability and robustness of DSMAD methods. Extensive experimental results demonstrate that INS-DS achieves new state-of-the-art performance under various experimental settings and possesses the advantages of reliability and robustness compared to other methods.
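The introspective decision rule described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the interfaces (`introspective_step`, the symptom-score dictionary, and the `diagnose` callback) and the toy diagnosis rule are our own assumptions, standing in for the learned inquiry and diagnosis models.

```python
# Hypothetical sketch of the introspective decision rule: intervene on both
# possible answers to the top-scoring symptom inquiry; ask only if the
# hypothetical diagnoses differ, otherwise inform the (already determined)
# disease. All names here are illustrative, not the paper's code.

def argmax_symptom(scores, asked):
    """Pick the highest-scoring symptom that has not been inquired yet."""
    candidates = {s: v for s, v in scores.items() if s not in asked}
    return max(candidates, key=candidates.get)

def introspective_step(state, inquiry_scores, diagnose):
    """Return ("inquire", symptom) if the two hypothetical answers would
    change the diagnosis, otherwise ("inform", disease)."""
    symptom = argmax_symptom(inquiry_scores, set(state))
    # Intervention: imagine both possible patient responses.
    dx_if_yes = diagnose({**state, symptom: True})
    dx_if_no = diagnose({**state, symptom: False})
    if dx_if_yes != dx_if_no:
        return ("inquire", symptom)   # answer is informative -> keep asking
    return ("inform", dx_if_yes)      # diagnosis no longer depends on it

# Toy diagnosis rule standing in for a learned disease classifier.
def toy_diagnose(state):
    return "flu" if state.get("fever", False) else "cold"

state = {"cough": True}
scores = {"fever": 0.9, "cough": 0.8, "headache": 0.4}
action = introspective_step(state, scores, toy_diagnose)
```

Here the agent inquires about "fever" because the two hypothetical answers lead to different diagnoses; once "fever" is answered, the remaining inquiry ("headache") no longer changes the diagnosis, so the agent would inform instead.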

1. INTRODUCTION

A dialogue system for medical automatic diagnosis (DSMAD) aims to learn an agent that collects a patient's information and makes a preliminary diagnosis in an interactive manner, like a human doctor. This task has increasingly attracted the attention of researchers because of its huge industrial potential (Tang et al., 2016). Similar to other task-oriented dialogue tasks (Lipton et al., 2018; Wen et al.; Yan et al., 2017; Lowe et al., 2015), DSMAD is composed of a sequence of dialogue-based interactions between the patient and the agent, which can be formulated as a Markov decision process and solved by reinforcement learning (RL) (Mnih et al., 2015; Van Hasselt et al., 2016). Although several frameworks have been proposed (Xu et al., 2019; Wei et al., 2018; Peng et al., 2018; Tang et al., 2016), DSMAD is still far from being applicable, because these works only evaluate the agent by the accuracy of diagnosis while ignoring the importance of robustness and reliability for practical medical applications. The two major shortcomings of current DSMAD methods are summarized below.

Unreliable symptom-inquiry and disease-diagnosis. It is reasonable to measure DSMAD by diagnostic accuracy, since accuracy is the ultimate goal of the task. However, in the unilateral pursuit of high accuracy, a DSMAD agent pays less attention to the rationale of the diagnosis process, reducing the trust of users. For example, a DSMAD agent might jump to a conclusion without inquiring about any symptom; as long as the diagnosis is correct, such an agent still receives a positive reward. In this sense, the correctness of diagnoses is not sufficient to reflect the performance of DSMAD, and it might lead the agent to make hasty diagnoses without interaction. Moreover, a DSMAD agent should learn to make consistent disease diagnoses according to the symptom-disease relations in the training data, remaining insensitive to noise that occurs during training.

Sensitive to small disturbances.
Almost all of the current DSMAD methods combine the operations of symptom-inquiry and disease-diagnosis and allow models to make the sequential deci-

