MOCA: COGNITIVE SCAFFOLDING FOR LANGUAGE MODELS IN CAUSAL AND MORAL JUDGMENT TASKS

Abstract

Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support causal and moral judgments: when something bad happens, we naturally ask who did what, and why. A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the presence of norms, and whether or not the protagonist in a scenario was aware of their action's potential consequences. Here, we investigate whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with those of human participants. We find that without any annotations, LLMs and human participants are not well aligned (17%-39% agreement). However, with simple expert-written instructions, LLMs can accurately annotate which relevant factors are present in a scenario. We demonstrate how these annotations can be used to bring LLMs into closer alignment with people (36.3%-47.2% agreement). These results show how insights from cognitive science can help scaffold language models to match human intuitions more closely in challenging commonsense evaluation tasks.

1. INTRODUCTION

We live in a complex world where most things that happen are the result of a multitude of factors. How do humans handle this complexity? Cognitive scientists have proposed that we do so by organizing our understanding of the world into intuitive theories (Gerstenberg & Tenenbaum, 2017; Wellman & Gelman, 1992). Accordingly, people have intuitive theories of the physical and social world with which they reason about how objects and agents interact with one another (Battaglia et al., 2013; Ullman et al., 2017; Gerstenberg et al., 2021; Baker et al., 2017). Intuitive theories support a variety of cognitive functions, including making predictions about the future, drawing inferences about the past, and giving explanations of the present (Davis & Marcus, 2015; Lake et al., 2017). Concepts related to causality and morality form key ingredients of people's physical and social theories. For instance, people's causal knowledge helps them parse the things that happen into actions, events, and their relationships (Davis & Marcus, 2015), and their moral knowledge helps them tell good apart from bad (Alicke et al., 2015).

Over the last few years, large language models (LLMs) have become increasingly successful at emulating certain aspects of human commonsense reasoning, in tasks ranging from physical reasoning (Tsividis et al., 2021) and visual reasoning (Buch et al., 2022) to moral reasoning (Hendrycks et al., 2020) and text comprehension (Brown et al., 2020; Liu et al., 2019b; Bommasani et al., 2021). Here, we investigate to what extent pretrained LLMs align with human intuitions about the role of objects and agents in text-based scenarios, using judgments about causality and morality as two case studies. We show how insights from cognitive science can help align LLMs more closely with the intuitions of human participants.

Prior work on alignment between LLMs and human intuitions has usually collected evaluation datasets in two stages. In the first stage, participants write stories following open-ended instructions. In the second stage, another group of participants labels these participant-generated stories (e.g., Hendrycks et al., 2020). The upside of this approach is the ease of obtaining a large number of examples in a short period of time. The downside is that the crowd-sourced stories are often not carefully written, and that they lack experimental control. Here, we take a different approach. Instead of relying on crowd-sourced stories, we draw on scenarios that were carefully constructed in prior cognitive science studies.
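To make the scaffolding idea concrete, the sketch below illustrates one way the annotate-then-judge pipeline described in the abstract could be implemented: first query an LLM for factor annotations, then condition the causal or moral judgment on those annotations, and finally measure agreement with the human majority answer. This is a minimal illustration, not the paper's actual prompts or factor set; the factor names, prompt wording, and the llm callable are all assumptions made for the example.

from typing import Callable

# Illustrative factors from the causal/moral judgment literature; the
# factor set used in the paper is derived from prior cognitive science
# studies and may differ.
FACTORS = ["a norm violation", "the agent's awareness of the consequences"]

def annotate_factors(llm: Callable[[str], str], story: str) -> dict:
    """Step 1: ask the LLM to label which factors are present in the story."""
    annotations = {}
    for factor in FACTORS:
        prompt = (
            "Read the story and answer yes or no.\n"
            f"Story: {story}\n"
            f"Question: Does the story involve {factor}?\nAnswer:"
        )
        annotations[factor] = llm(prompt).strip().lower()
    return annotations

def scaffolded_judgment(llm: Callable[[str], str], story: str, question: str) -> str:
    """Step 2: condition the causal or moral judgment on the annotations."""
    annotations = annotate_factors(llm, story)
    notes = "; ".join(f"{factor}: {label}" for factor, label in annotations.items())
    prompt = (
        f"Story: {story}\n"
        f"Relevant factors: {notes}\n"
        f"Question: {question}\nAnswer yes or no:"
    )
    return llm(prompt).strip().lower()

def agreement(model_answers: list, human_majority: list) -> float:
    """Fraction of items on which the model matches the human majority answer."""
    matches = sum(m == h for m, h in zip(model_answers, human_majority))
    return matches / len(model_answers)

Any function mapping a prompt string to a completion string can serve as the llm argument, so the same sketch covers both the unannotated baseline (calling the judgment prompt without the "Relevant factors" line) and the scaffolded condition compared in the abstract.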

