Q: DO LARGE LANGUAGE MODELS UNDERSTAND IMPLICATURE? A: DO PIGS FLY?

Abstract

Despite widespread use of LLMs as conversational agents, evaluations of performance fail to capture a crucial aspect of communication: interpreting language in context. Humans interpret language using beliefs and prior knowledge about the world. For example, we intuitively understand the response "I wore gloves" to the question "Did you leave fingerprints?" as meaning "No". To investigate whether LLMs have the ability to make this type of inference, known as an implicature, we design a simple task and evaluate widely used state-of-the-art models. We find that, although the task requires only a binary inference (yes or no), most models perform close to random. Models adapted to be "aligned with human intent" perform much better, but still show a significant gap with human performance. We present our findings as the starting point for further research into evaluating how LLMs interpret language in context and to drive the development of more pragmatic and useful models of human discourse.

1. INTRODUCTION

User: "Have you seen my phone?" InstructGPT: "Yes, I have seen your phone." InstructGPT's response[1] is a perfectly fine answer to the question, but a human might answer differently. They might respond "it's in your bag," bypassing the obvious follow-up question ("where is it?"). Giving such a helpful and efficient answer is an example of pragmatic language use that goes beyond the semantic meaning of the utterances. Meaning is determined not only by a combination of words, but also by context, beliefs, and social institutions (Wittgenstein, 1953; Grice, 1975; Huang, 2017). Consider another exchange, in which Esther asks her friend Juan "Can you come to my party on Friday?" and Juan responds "I have to work." We resolve Juan's response into a decline by using the contextual commonsense knowledge that having to work on a Friday night precludes attendance. Both of these exchanges contain an implicature: an utterance that conveys something other than its literal meaning[2]. Implicatures illustrate how context contributes to meaning, distinguishing writing and speaking from communicating (Green, 1996). We cannot fully understand utterances without understanding their implications, and neither can a computational model. Indeed, the term "communication" presupposes that the speaker's implications are understood by the addressee. Although communication encompasses much more than implicatures, such as assertives and other illocutionary acts, we view implicature understanding as a necessary condition for communicating with humans. Being able to resolve seemingly novel implicatures and, more broadly, to engage in pragmatic understanding is an essential and ubiquitous aspect of our everyday use of language.
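The kind of inference in these exchanges can be framed as a binary classification task: present the dialogue to a model, ask whether the response means yes or no, and compare against the human-judged label. The following is a minimal sketch of that framing; the prompt template, the example item, and the answer-resolution rule are illustrative assumptions, not the exact protocol used in our evaluation.

```python
# Sketch of a binary implicature-resolution task. Prompt wording,
# example dialogue, and scoring rule are hypothetical illustrations.

def build_prompt(question: str, response: str) -> str:
    """Format an utterance/response pair as a yes/no question for an LLM."""
    return (
        f'Esther asked "{question}" and Juan responded "{response}"\n'
        "Does Juan mean yes or no?"
    )

def resolve_answer(model_output: str) -> str:
    """Map a free-text model answer onto the binary label space."""
    text = model_output.strip().lower()
    if text.startswith("yes"):
        return "yes"
    if text.startswith("no"):
        return "no"
    return "invalid"  # neither label; counts against the model

def accuracy(predictions, labels) -> float:
    """Fraction of examples where the resolved answer matches the label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Example item: "I have to work." implicates a decline, i.e. label "no".
prompt = build_prompt("Can you come to my party on Friday?", "I have to work.")
```

Because chance performance on a balanced yes/no set is 50%, accuracies near that level indicate the model is not recovering the implicated meaning.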
Large language models (LLMs) have demonstrated remarkable ability on a variety of downstream tasks such as planning (Huang et al., 2022a), commonsense reasoning (Kojima et al., 2022), information retrieval (Lewis et al., 2020; Kim et al., 2022) and code completion (Austin et al., 2021; Biderman & Raff, 2022), to name just a few. When finetuned with human feedback, LLMs obtain higher ratings on desiderata like helpfulness (Ouyang et al., 2022; Bai et al., 2022), and are proposed as conversational agents (Thoppilan et al., 2022). Despite the widespread use and deploy-

[1] Appendix A contains details on how this answer was obtained from InstructGPT-3.
[2] In Appendix B we present a comprehensive introduction to implicature.

