UNCERTAINTY ESTIMATION IN AUTOREGRESSIVE STRUCTURED PREDICTION

Abstract

Uncertainty estimation is important for ensuring the safety and robustness of AI systems. While most research in the area has focused on unstructured prediction tasks, limited work has investigated general uncertainty estimation approaches for structured prediction. This work therefore investigates uncertainty estimation for autoregressive structured prediction tasks within a single unified and interpretable probabilistic ensemble-based framework. We consider uncertainty estimation for sequence data at the token level and the complete-sequence level; interpretations for, and applications of, various measures of uncertainty; and the theoretical and practical challenges associated with obtaining them. This work also provides baselines for token-level and sequence-level error detection, and sequence-level out-of-domain input detection, on the WMT'14 English-French and WMT'17 English-German translation datasets and the LibriSpeech speech recognition dataset.

1. INTRODUCTION

Neural Networks (NNs) have become the dominant approach in numerous applications (Simonyan & Zisserman, 2015; Mikolov et al., 2013; 2010; Bahdanau et al., 2015; Vaswani et al., 2017; Hinton et al., 2012) and are being widely deployed in production. As a consequence, predictive uncertainty estimation is becoming an increasingly important research area, as it enables improved safety in automated decision making (Amodei et al., 2016). Important advancements have been the definition of baseline tasks and metrics (Hendrycks & Gimpel, 2016) and the development of ensemble approaches, such as Monte-Carlo Dropout (Gal & Ghahramani, 2016) and Deep Ensembles (Lakshminarayanan et al., 2017)^1. Ensemble-based uncertainty estimates have been successfully applied to detecting misclassifications, out-of-distribution inputs and adversarial attacks (Carlini & Wagner, 2017; Smith & Gal, 2018; Malinin & Gales, 2019) and to active learning (Kirsch et al., 2019). Crucially, they allow total uncertainty to be decomposed into data uncertainty, the intrinsic uncertainty associated with the task, and knowledge uncertainty, which is the model's uncertainty in the prediction due to a lack of understanding of the data (Malinin, 2019)^2. Estimates of knowledge uncertainty are particularly useful for detecting anomalous and unfamiliar inputs (Kirsch et al., 2019; Smith & Gal, 2018; Malinin & Gales, 2019; Malinin, 2019). Despite recent advances, most work on uncertainty estimation has focused on unstructured tasks, such as image classification. Meanwhile, uncertainty estimation within a general, unsupervised, probabilistically interpretable ensemble-based framework for structured prediction tasks, such as language modelling, machine translation (MT) and speech recognition (ASR), has received little attention.
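The ensemble-based decomposition mentioned above can be illustrated numerically for a single unstructured prediction: total uncertainty is the entropy of the ensemble-averaged predictive distribution, data uncertainty is the average entropy of the individual members, and knowledge uncertainty is their difference (the mutual information between the prediction and the model). The following is a minimal NumPy sketch, not the paper's implementation; the array `member_probs` of per-model class probabilities is a hypothetical input.

```python
import numpy as np

def entropy(p, axis=-1):
    # Shannon entropy in nats; small epsilon guards against log(0).
    return -np.sum(p * np.log(p + 1e-12), axis=axis)

def uncertainty_decomposition(member_probs):
    """member_probs: array of shape (n_models, n_classes)."""
    mean_probs = member_probs.mean(axis=0)        # ensemble predictive distribution
    total = entropy(mean_probs)                   # total uncertainty
    data = entropy(member_probs, axis=-1).mean()  # expected data uncertainty
    knowledge = total - data                      # mutual information (knowledge uncertainty)
    return total, data, knowledge

# Members that agree yield knowledge uncertainty near zero;
# members that disagree yield large knowledge uncertainty.
agree = np.array([[0.9, 0.1], [0.9, 0.1]])
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
```

Here the confidently disagreeing ensemble has high total but low data uncertainty, so the mutual-information term isolates the disagreement that signals anomalous or unfamiliar inputs.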
Previous work has examined bespoke supervised confidence estimation techniques for each task separately (Evermann & Woodland, 2000; Liao & Gales, 2007; Ragni et al., 2018; Chen et al., 2017; Koehn, 2009; Kumar & Sarawagi, 2019), constructing an "error-detection" model on top of the original ASR/NMT system. While useful, these approaches suffer from a range of limitations. Firstly, they require token-level supervision, typically obtained via minimum edit-distance alignment to a ground-truth transcription (ASR) or translation (NMT), which can itself be noisy. Secondly, such token-level supervision is generally inappropriate for translation, as it does not account for the validity of re-arrangements. Thirdly, we are unable to determine whether the error is due to knowledge or



^1 An in-depth comparison of ensemble methods was conducted in (Ashukha et al., 2020; Ovadia et al., 2019).
^2 Data and Knowledge Uncertainty are sometimes also called Aleatoric and Epistemic uncertainty.

