QAID: QUESTION ANSWERING INSPIRED FEW-SHOT INTENT DETECTION

Abstract

Intent detection with semantically similar fine-grained intents is a challenging task. To address it, we reformulate intent detection as a question-answering retrieval task by treating utterances and intent names as questions and answers. To that end, we utilize a question-answering retrieval architecture and adopt a two-stage training schema with a batch contrastive loss. In the pre-training stage, we improve query representations through self-supervised training. Then, in the fine-tuning stage, we increase contextualized token-level similarity scores between queries and answers from the same intent. Our results on three few-shot intent detection benchmarks achieve state-of-the-art performance.

1. INTRODUCTION

Intent detection (ID) is the task of classifying an incoming user query into one class from a set of mutually exclusive classes, a.k.a. intents (Wang et al., 2014; Schuurmans & Frasincar, 2019; Liu et al., 2019a). This ability is a cornerstone of task-oriented dialogue systems, as correctly identifying the user intent at the beginning of an interaction is crucial to its success. However, labeled data is required for training, and manual annotation is costly. This calls for sample-efficient methods that gain high accuracy with minimal amounts of labeled data. Recent works tackling few-shot ID have relied on large-scale pre-trained language models, such as BERT (Devlin et al., 2018). These works leverage task-adaptive training: they pre-train a model on a large open-domain dialogue corpus and fine-tune it for ID classification (Mehri et al., 2020; Wu et al., 2020a; Casanueva et al., 2020; Zhang et al., 2021a). Alternative approaches learn query representations based on query-to-query matching (henceforth, Match-QQ systems) (Zhang et al., 2020; Mass et al., 2020; Mehri et al., 2021).

The need to efficiently compare an incoming query to a large set of possible answers resides at the core of any question answering (QA) retrieval system (henceforth, Match-QA systems) (Karpukhin et al., 2020). Recently, Khattab & Zaharia (2020) introduced ColBERT, which allows faster training and inference by replacing the cross-attention mechanism used by Match-QQ systems (Zhang et al., 2020; Mass et al., 2020; Nogueira & Cho, 2019) with a fast contextualized token-level similarity mechanism dubbed late interaction.

In this work, we present a Question Answering inspired Intent Detection system, named QAID. We start by formulating the ID task as a question-answering retrieval task, treating the utterances and the intent names as queries and answers, respectively. This reformulation allows us to introduce a valuable additional signal from the intent names.
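To make the late-interaction mechanism concrete, the following is an illustrative sketch (not the paper's implementation): each query token embedding is matched against all answer token embeddings, the maximum cosine similarity per query token is taken ("MaxSim"), and the maxima are summed into a single relevance score. The function name and numpy formulation are our own assumptions for illustration.

```python
import numpy as np

def late_interaction_score(query_emb: np.ndarray, answer_emb: np.ndarray) -> float:
    """ColBERT-style late interaction (illustrative sketch).

    query_emb:  (num_query_tokens, dim) contextualized token embeddings
    answer_emb: (num_answer_tokens, dim) contextualized token embeddings
    Returns the sum over query tokens of the max cosine similarity
    each query token attains over the answer tokens.
    """
    # L2-normalize token embeddings so dot products are cosine similarities
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    a = answer_emb / np.linalg.norm(answer_emb, axis=1, keepdims=True)
    sim = q @ a.T                    # (num_query_tokens, num_answer_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim per query token, then sum
```

Because the score decomposes into independent per-token maxima over precomputable answer embeddings, it avoids the full cross-attention pass that makes pairwise-encoding systems slow at inference.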
Then, we adapt the efficient architecture of ColBERT while replacing its triplet loss function with a batch contrastive loss, which was proven to be more robust (Khosla et al., 2020) and performs well in various tasks (Gunel et al., 2021; Gao et al., 2021a), including ID classification (Zhang et al., 2021b). In contrast to ColBERT, which compares a query to a pair of positive and negative documents, we also include queries as positive examples.
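As a point of reference, a supervised batch contrastive loss in the style of Khosla et al. (2020) can be sketched as follows; each in-batch example is pulled toward all examples sharing its label and pushed from the rest. This is a generic numpy sketch, not QAID's token-level formulation, and the function name and temperature value are our own assumptions.

```python
import numpy as np

def sup_con_loss(embeddings: np.ndarray, labels, temperature: float = 0.1) -> float:
    """Supervised batch contrastive loss (Khosla et al., 2020), illustrative sketch.

    embeddings: (batch, dim) representations; labels: length-batch class ids.
    Averages, over anchors with at least one in-batch positive, the mean
    negative log-probability of their positives under a softmax over the batch.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature            # pairwise scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)         # exclude self-pairs from the softmax
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    n = len(labels)
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    per_anchor = [-log_prob[i, pos[i]].mean() for i in range(n) if pos[i].any()]
    return float(np.mean(per_anchor))
```

Unlike a triplet loss over a single (positive, negative) pair, every same-label example in the batch acts as a positive and every other example as a negative, which is the robustness property the paragraph above refers to.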



Zhang et al. (2020); Mass et al. (2020) adopt pairwise-encoding systems with cross-attention to deploy a K-Nearest-Neighbor (K-NN) (Fix & Hodges, 1989) classification schema, where training queries are fully utilized in both the training and inference stages. Nevertheless, these methods' downside is their processing time, combined with the difficulty of scaling to a large number of intents (Liu et al., 2021c).

