KNOWLEDGE-DRIVEN ACTIVE LEARNING

Abstract

The deployment of Deep Learning (DL) models is still precluded in contexts where the amount of supervised data is limited. To address this issue, active learning strategies aim to minimize the amount of labelled data required to train a DL model. Most active learning strategies are based on uncertain sample selection, and are often restricted to samples lying close to the decision boundary. These techniques are theoretically sound, but understanding the selected samples based on their content is not straightforward, further driving non-experts to consider DL as a black box. For the first time, we propose here a different approach that takes common domain knowledge into consideration and enables non-expert users to train a model with fewer samples. In our Knowledge-driven Active Learning (KAL) framework, rule-based knowledge is converted into logic constraints, and checking their violation serves as a natural guide for sample selection. We show that even simple relationships among data and output classes offer a way to spot predictions for which the model needs supervision. The proposed approach (i) outperforms many active learning strategies in terms of average F1 score, particularly in contexts where domain knowledge is rich. Furthermore, we empirically demonstrate that (ii) KAL discovers data distributions lying far from the initial training data, unlike uncertainty-based strategies, (iii) it assures domain experts that the provided knowledge is respected by the model on test data, and (iv) it can be employed even when domain knowledge is not available, by coupling it with an XAI technique. Finally, we show that KAL is also suitable for object recognition tasks and that its computational demand is low, unlike many recent active learning strategies.

1. INTRODUCTION

Deep Learning (DL) methods have achieved impressive results over the past decade in fields ranging from computer vision to text generation (LeCun et al., 2015). However, most of these contributions relied on overly data-intensive models (e.g., Transformers), trained on huge amounts of data (Marcus, 2018). With the advent of Big Data, sample collection no longer represents an issue, but in some contexts the amount of supervised data is limited and manual labelling can be expensive (Yu et al., 2015). Therefore, a common situation is the unlabelled pool scenario (McCallumzy & Nigamy, 1998), where many data are available but only some are annotated. Historically, two strategies have been devised to tackle this situation: semi-supervised learning, which exploits the unlabelled data to enrich feature representations (Zhu & Goldberg, 2009), and active learning, which selects the smallest set of data to annotate that most improves model performance (Settles, 2009). The main assumption behind active learning strategies is that there exists a subset of samples that allows training a model with a similar accuracy as when fed with all training data. Iteratively, the strategy indicates the optimal samples to be annotated from the unlabelled pool. This is generally done by ranking the unlabelled samples w.r.t. a given measure, usually computed on the model predictions (Settles, 2009; Netzer et al., 2011; Wang & Shang, 2014) or on the input data distribution (Zhdanov, 2019; Santoro et al., 2017), and by selecting the samples associated with the highest rankings (Ren et al., 2021; Zhan et al., 2021). While theoretically sound, an understanding of the selected samples based on their content is not straightforward, in particular for non-ML experts.
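The ranking step described above can be sketched with the classic entropy-based uncertainty measure; this is a minimal illustration of one common scoring function, not the specific measure used by any of the cited works:

```python
import numpy as np

def entropy_sampling(probs: np.ndarray, k: int) -> np.ndarray:
    """Rank unlabelled samples by predictive entropy and return
    the indices of the k most uncertain ones.

    probs: (n_samples, n_classes) softmax outputs of the model.
    """
    eps = 1e-12  # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[::-1][:k]  # highest entropy first

# Toy example: three unlabelled samples, two classes.
probs = np.array([[0.95, 0.05],   # confident -> low entropy
                  [0.50, 0.50],   # maximally uncertain
                  [0.70, 0.30]])
print(entropy_sampling(probs, k=2))  # -> [1 2]
```

Samples with near-uniform predictions (those close to the decision boundary) receive the highest scores and are sent to the annotator first.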
This issue becomes particularly relevant when considering that Deep Neural Networks are already seen as black-box models (Gilpin et al., 2018; Das & Rad, 2020). On the contrary, we believe that neural models must be linked to commonsense knowledge related to a given learning problem. Therefore, in this paper, we propose for the first time to exploit this symbolic knowledge in the selection process of an active learning strategy.
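To make the idea concrete, the sketch below encodes a hypothetical domain rule ("cat implies animal") as a fuzzy-logic constraint and selects the samples whose predictions most violate it. The rule, the class names, and the product t-norm used for the violation degree are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Hypothetical multi-label setting with classes "cat" and "animal".
# The domain rule "cat -> animal" is turned into a fuzzy-logic
# constraint whose violation degree guides sample selection.
CAT, ANIMAL = 0, 1

def rule_violation(probs: np.ndarray) -> np.ndarray:
    """Degree to which 'cat -> animal' is violated per sample,
    using the product t-norm: v = p(cat) * (1 - p(animal))."""
    return probs[:, CAT] * (1.0 - probs[:, ANIMAL])

def kal_select(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k samples whose predictions most
    violate the encoded knowledge."""
    return np.argsort(rule_violation(probs))[::-1][:k]

probs = np.array([[0.9, 0.95],   # cat and animal: consistent
                  [0.9, 0.10],   # cat but not animal: violated
                  [0.1, 0.05]])  # neither: consistent
print(kal_select(probs, k=1))  # -> [1]
```

Unlike an uncertainty score, this criterion can flag a confidently wrong prediction (the second sample above), and the selected samples are interpretable to a domain expert: each one visibly breaks a known rule.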

