DIALOGRAPH: INCORPORATING INTERPRETABLE STRATEGY-GRAPH NETWORKS INTO NEGOTIATION DIALOGUES

Abstract

To successfully negotiate a deal, it is not enough to communicate fluently: pragmatic planning of persuasive negotiation strategies is essential. While modern dialogue agents excel at generating fluent sentences, they still lack pragmatic grounding and cannot reason strategically. We present DIALOGRAPH, a negotiation system that incorporates pragmatic strategies in a negotiation dialogue using graph neural networks. DIALOGRAPH explicitly incorporates dependencies between sequences of strategies to enable improved and interpretable prediction of next optimal strategies, given the dialogue context. Our graph-based method outperforms prior state-of-the-art negotiation models both in the accuracy of strategy/dialogue act prediction and in the quality of downstream dialogue response generation. We qualitatively show further benefits of learned strategy-graphs in providing explicit associations between effective negotiation strategies over the course of the dialogue, leading to interpretable and strategic dialogues. 1 

1. INTRODUCTION

Negotiation is ubiquitous in human interaction, from e-commerce to the multi-billion-dollar sales of companies. Learning how to negotiate effectively involves deep pragmatic understanding and strategic planning of the dialogue (Thompson; Bazerman et al., 2000b; Pruitt, 2013). Modern dialogue systems for collaborative tasks such as restaurant or flight reservations have made considerable progress by modeling the dialogue history and structure either explicitly, using semantic content such as slot-value pairs (Larionov et al., 2018; Young, 2006), or implicitly, with encoder-decoder architectures (Sordoni et al., 2015; Li et al., 2016). In such tasks, users communicate explicit intentions, enabling systems to map utterances onto specific intent slots (Li et al., 2020). However, such mapping is less clear in complex non-collaborative tasks like negotiation (He et al., 2018) and persuasion (Wang et al., 2019), where user intent and the most effective strategies are hidden. Hence, beyond the generated dialogue itself, the strategic choice of framing and the sequence of chosen strategies play a vital role, as depicted in Figure 1. Indeed, prior work on negotiation dialogues has primarily focused on optimizing dialogue strategies, from high-level task-specific strategies (Lewis et al., 2017), to more specific task execution planning (He et al., 2018), to fine-grained planning of linguistic outputs given strategic choices (Zhou et al., 2019). These studies have confirmed that it is crucial to control for the pragmatics of the dialogue to build effective negotiation systems. et al., 2020) to learn the associations between negotiation strategies, including conceptual and linguistic strategies and dialogue acts, and their relative importance in predicting the best sequence. We focus on buyer-seller negotiations in which two individuals negotiate the price of an item through a chat interface, and we model the seller's behavior on the CraigslistBargain dataset (He et al., 2018). 2
We demonstrate that DIALOGRAPH outperforms previous state-of-the-art methods on strategy prediction and downstream dialogue responses. This paper makes several contributions. First, we introduce a novel approach to model negotiation strategies and their dependencies as graph structures, via GNNs. Second, we incorporate these learned graphs into an end-to-end negotiation dialogue system and demonstrate that it consistently improves future-strategy prediction and downstream dialogue generation, leading to better negotiation deals (sale prices). Finally, we show how to interpret intermediate structures and learned sequences of strategies, opening up the black box of end-to-end strategic dialogue systems.

2. DIALOGRAPH

We introduce DIALOGRAPH, a modular end-to-end dialogue system that incorporates graph attention networks (GATs) with hierarchical pooling to learn pragmatic dialogue strategies jointly with the dialogue history. DIALOGRAPH is based on a hierarchical encoder-decoder model and consists of three main components: (1) a hierarchical dialogue encoder, which learns a representation for each utterance and encodes its local context; (2) a structure encoder, which encodes sequences of negotiation strategies and dialogue acts; and (3) an utterance decoder, which generates the output utterance. Formally, our dialogue input consists of a sequence of tuples, $D = [(u_1, da_1, ST_1), (u_2, da_2, ST_2), \ldots, (u_n, da_n, ST_n)]$, where $u_i$ is the utterance, $da_i$ is the coarse dialogue act, and $ST_i = \{st_{i,1}, st_{i,2}, \ldots, st_{i,k}\}$ is the set of $k$ fine-grained negotiation strategies for the utterance $u_i$. 3 The dialogue context forms the input to (1), and the previous dialogue acts and negotiation strategies form the input to (2). The overall architecture is shown in Figure 2. In what follows, we describe DIALOGRAPH in detail.
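The tuple-based input format can be made concrete with a small data structure. The sketch below is a hypothetical Python representation: the `Turn` class and `split_inputs` helper are illustrative names and are not taken from the released code, and dialogue-act and strategy labels are assumed to be plain strings. The example reuses the annotated utterance from footnote 3 and shows how the same context splits into the input for the dialogue encoder (utterances) and for the structure encoder (dialogue acts and strategy sets).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Turn:
    """One dialogue turn (u_i, da_i, ST_i); field names are hypothetical."""
    utterance: str                              # u_i
    dialogue_act: str                           # da_i, the coarse dialogue act
    strategies: FrozenSet[str] = frozenset()    # ST_i, the fine-grained strategies


Dialogue = List[Turn]


def split_inputs(
    dialogue: Dialogue,
) -> Tuple[List[str], List[Tuple[str, FrozenSet[str]]]]:
    """Separate the context into the utterance sequence (for the dialogue
    encoder) and the (dialogue act, strategy set) sequence (for the
    structure encoder)."""
    utterances = [turn.utterance for turn in dialogue]
    structure = [(turn.dialogue_act, turn.strategies) for turn in dialogue]
    return utterances, structure


# Annotated example from footnote 3 (string labels are an assumption).
example = Turn(
    utterance="Morning! My bro destroyed my old kit and I'm looking for a new pair for $10",
    dialogue_act="Introduction",
    strategies=frozenset({"Proposing price", "Being informal", "Talking about family"}),
)
utts, struct = split_inputs([example])
print(len(utts), struct[0][0])  # 1 Introduction
```

Using a frozen dataclass keeps each turn immutable, so a dialogue history can be sliced into contexts of different lengths without accidentally modifying earlier turns.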

2.1. HIERARCHICAL DIALOGUE ENCODER

A dialogue context typically consists of multiple utterances that are sequential in nature. We use hierarchical encoders to model such sequential dialogue contexts (Jiao et al., 2019). To encode the utterance $u_t$ at time $t$, we use the pooled representation from BERT (Devlin et al., 2019) as the utterance embedding $e_t$. We then pass the utterance embeddings through a GRU to obtain the encoding of the dialogue context up to time $t$, denoted $h^U_t$.
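The second level of the hierarchy, a GRU running over per-utterance embeddings, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for the BERT pooled outputs $e_t$ (768 is the hidden size of BERT-base), the GRU weights are randomly initialized rather than trained, and `GRUCell` and `encode_context` are hypothetical names. Row $t$ of the output plays the role of $h^U_t$.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell with randomly initialized (untrained) weights."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # One input and one recurrent matrix per gate: update (z), reset (r), candidate (n).
        self.W = {g: rng.uniform(-scale, scale, (hidden_dim, input_dim)) for g in "zrn"}
        self.U = {g: rng.uniform(-scale, scale, (hidden_dim, hidden_dim)) for g in "zrn"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrn"}
        self.hidden_dim = hidden_dim

    def step(self, x: np.ndarray, h: np.ndarray) -> np.ndarray:
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])
        n = np.tanh(self.W["n"] @ x + self.U["n"] @ (r * h) + self.b["n"])
        # New state interpolates between the candidate and the previous state.
        return (1.0 - z) * n + z * h


def encode_context(utterance_embeddings: np.ndarray, cell: GRUCell) -> np.ndarray:
    """Run the GRU over e_1..e_T and return h^U_1..h^U_T, one row per turn."""
    h = np.zeros(cell.hidden_dim)
    states = []
    for e_t in utterance_embeddings:
        h = cell.step(e_t, h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(1)
e = rng.normal(size=(4, 768))   # stand-in for BERT pooled vectors of 4 utterances
cell = GRUCell(input_dim=768, hidden_dim=64)
h_U = encode_context(e, cell)
print(h_U.shape)  # (4, 64)
```

Because the GRU is unidirectional, $h^U_t$ depends only on $e_1, \ldots, e_t$, so encoding a shorter prefix of the dialogue reproduces the first rows of the full encoding.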



1 Code, data and a demo system are released at https://github.com/rishabhjoshi/DialoGraph_ICLR21.
2 We focus on the seller's side following Zhou et al. (2019), who devised a set of strategies specific to maximizing the seller's success. Our proposed methodology, however, is general.
3 For example, in the utterance "Morning! My bro destroyed my old kit and I'm looking for a new pair for $10", the coarse dialogue act is Introduction, and the finer-grained negotiation strategies include Proposing price, Being informal, and Talking about family (for building rapport).



Figure 1: Both options are equally plausible and fluent, but a response with effective pragmatic strategies leads to a better deal.

To model the explicit dialogue structure, prior work incorporated Hidden Markov Models (HMMs) (Zhai & Williams, 2014; Ritter et al., 2010), Finite State Transducers (FSTs) (Zhou et al., 2020), and RNNs (He et al., 2018; Shi et al., 2019). While RNN-based models lack interpretability, HMM- and FST-based approaches may lack expressivity. In this paper, we hypothesize that Graph Neural Networks (GNNs) (Wu et al., 2020) can combine the benefits of interpretability and expressivity because of their effectiveness in encoding graph-structured data through message propagation. While being sufficiently expressive to model graph structures, GNNs also provide a natural means for interpretation via intermediate states (Xie & Lu, 2019; Pope et al., 2019).
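To make "message propagation" and "interpretation via intermediate states" concrete, the sketch below runs one single-head graph attention (GAT) layer, a common GNN variant, over a small strategy graph. Both the graph construction and the layer are illustrative assumptions, not DIALOGRAPH's actual structure encoder: `build_strategy_graph` hypothetically creates one node per (turn, strategy) occurrence and links every strategy in turn $t$ to every strategy in turn $t+1$; node features and weights are random; and a tanh output nonlinearity is used for simplicity. The attention matrix `alpha` is the kind of intermediate state that can be inspected to see which earlier strategies a node draws on.

```python
from typing import List, Set, Tuple

import numpy as np


def build_strategy_graph(
    strategy_seq: List[Set[str]],
) -> Tuple[List[Tuple[int, str]], List[Tuple[int, int]]]:
    """Hypothetical construction: one node per (turn, strategy); a directed edge
    from each strategy in turn t to each strategy in turn t + 1."""
    nodes = [(t, s) for t, strategies in enumerate(strategy_seq) for s in sorted(strategies)]
    index = {node: i for i, node in enumerate(nodes)}
    edges = []
    for t in range(len(strategy_seq) - 1):
        for s_src in strategy_seq[t]:
            for s_dst in strategy_seq[t + 1]:
                edges.append((index[(t, s_src)], index[(t + 1, s_dst)]))
    return nodes, edges


def gat_layer(X: np.ndarray, edges: List[Tuple[int, int]], W: np.ndarray,
              a: np.ndarray, slope: float = 0.2) -> Tuple[np.ndarray, np.ndarray]:
    """Single-head graph attention layer. Node i aggregates messages from itself
    and from every j with an edge j -> i. Returns (new features, attention)."""
    n = X.shape[0]
    H = X @ W                                   # linear projection of node features
    adj = np.eye(n, dtype=bool)                 # self-loops
    for src, dst in edges:
        adj[dst, src] = True                    # row = receiver, column = sender
    fp = H.shape[1]
    # e_ij = LeakyReLU(a^T [H_i || H_j]), split into receiver and sender terms.
    scores = (H @ a[:fp])[:, None] + (H @ a[fp:])[None, :]
    scores = np.where(scores > 0, scores, slope * scores)
    scores = np.where(adj, scores, -np.inf)     # mask out non-neighbours
    scores -= scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)   # softmax over each node's neighbourhood
    return np.tanh(alpha @ H), alpha


rng = np.random.default_rng(0)
seq = [{"Being informal", "Talking about family"}, {"Proposing price"},
       {"Proposing price", "Being informal"}]
nodes, edges = build_strategy_graph(seq)
X = rng.normal(size=(len(nodes), 8))            # stand-in node features
W = rng.normal(size=(8, 4))
a = rng.normal(size=(8,))
out, alpha = gat_layer(X, edges, W, a)
print(out.shape)                 # (5, 4)
print(np.nonzero(alpha[2])[0])   # [0 1 2]
```

Here node 2 ("Proposing price" in the second turn) attends only to itself and to the two strategies of the first turn, and each row of `alpha` sums to one. Reading such weights off a trained model is one way a GNN can expose which strategy transitions it relies on.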

