EVALUATING AND INDUCING PERSONALITY IN PRE-TRAINED LANGUAGE MODELS

Abstract

Originating as a philosophical quest, the study of personality discerns how individuals differ from each other in terms of thinking, feeling, and behaving. Toward building social machines that work with humans on a daily basis, we are motivated to ask: (1) Do existing Large Language Models (LLMs) possess personalities, akin to their human counterparts? (2) If so, how can we evaluate them? (3) Further, given this evaluation framework, how can we induce a certain personality in a fully controllable fashion? To tackle these three questions, we propose the Machine Personality Inventory (MPI) dataset for evaluating machine personality; MPI follows standardized personality tests and is built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories. By evaluating models with MPI, we provide the first piece of evidence showing the existence of personality in LLMs. We further devise a CHAIN PROMPTING method to induce a specific personality in LLMs in a controllable manner, capable of producing diversified behaviors. We hope to shed light on future studies by adopting personality as the essential guide for various downstream tasks, building more human-like and in situ dialogue agents.

1. INTRODUCTION

The relatively stable tendencies in people's behaviors, cognition, and emotional patterns define an individual's personality; such a characteristic set of personal traits shapes the patterns of how people think, feel, and behave (Kazdin et al., 2000), making human individuals unique (Weinberg and Gould, 2019). For example, it is characters with vivid and diversified personalities that make Shakespeare's plays masterpieces. In the literature, the study of personality has been primarily driven by psychologists, who have developed a variety of personality theories to track traits of human behavior. Among others, the trait theories of the Big Five (De Raad, 2000) and the Sixteen Personality Factors (16PF) (Cattell and Mead, 2008) are two exemplars: both offer consistent and reliable descriptions of individual differences and have been widely adopted and extensively analyzed in various human studies. Based on the trait theories, psychometric tests (e.g., NEO-PI-R (Costa Jr and McCrae, 2008)) have shown high efficacy as a standard instrument for personality testing; these psychometric tests have revealed that human individual differences can be disentangled into sets of continuous factor dimensions. Empirical studies have also confirmed these individual differences, showing a strong correlation between personality and real-world human behaviors in various scenarios (Raad and Perugini, 2002). In stark contrast, it is unclear whether existing Large Language Models (LLMs) possess any level of personality as shown in humans. Specifically, with the preliminary success of LLMs (Weinberg and Gould, 2019) (e.g., BERT (Kenton and Toutanova, 2019), GPT-3 (Brown et al., 2020), PaLM (Chowdhery et al., 2022)) in achieving fluent communication, evidence suggests that they have learned human behaviors from training corpora and can be used for interacting with humans in various challenging applications, ranging from text generation to dialogue and conversational systems.
Such powerful LLMs may ideally encode individual behavioral traits in a textual format (Goldberg, 1981) and satisfy our demands for perceivable and controllable personality. Taken together, with the goal of building a human-like machine (Lake et al., 2017; Rahwan et al., 2019; Zhu et al., 2020), we set out to find out: Do state-of-the-art LLMs have their own personality? If so, can we induce a specific personality in these LLMs? To answer these questions, we introduce the Machine Personality Inventory (MPI), a multiple-choice question-answering dataset built on psychometric inventories, to evaluate LLMs' personality. Because it is hard to dig into a model's thinking and feeling in the way we assess human personality, we focus on studying models' personality-like behavioral traits. We therefore borrow the concept of "personality" from psychology to denote human-like personality behavior (see Appendix A.1 for more details). Based on the Big Five trait theory, we build the MPI and disentangle the machine's personality into the following five key factors: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. To the best of our knowledge, ours is the first work that systematically evaluates modern LLMs' personality-like behavior using psychometric tests. By leveraging the MPI and its accompanying metrics, we evaluate the existence of LLMs' personality and their tendency along the trait continuum of each of the five personality factors. Our experiments show that the stability of LLMs' quantified behavioral tendency is related to the number of parameters. As such, LLMs tend to possess a certain level of personality; in particular, GPT-3 exhibits human-level personality on MPI and matches the statistics observed in the human population. We further propose a CHAIN PROMPTING method to induce a specific personality in LLMs (see Fig. 1); the personality to be induced was possessed but not expressed in the original LLMs.
Our CHAIN PROMPTING method generates inducing prompts for control by employing both psychological studies and knowledge from the LLM itself. By assessing the induced LLMs with both MPI and additional situational judgment tests, we show the validity of MPI and the efficacy of CHAIN PROMPTING in inducing LLMs' personality.
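To make the idea of chaining prompts concrete, the following is a minimal sketch and not the exact procedure used in this work. It assumes three steps: a naive instruction, a keyword prompt built from trait words, and a short self-description generated by the model, which is then prepended to a test question. The names TRAIT_KEYWORDS, build_inducing_prompt, and ask_with_personality, the keyword list, and the `llm` callable are all hypothetical placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical trait keywords; in practice these would be drawn from
# psychological studies. The words below are illustrative only.
TRAIT_KEYWORDS: Dict[str, List[str]] = {
    "Extraversion": ["friendly", "talkative", "energetic"],
}


def build_inducing_prompt(factor: str, llm: Callable[[str], str]) -> str:
    """Chain three prompts and return the model-written self-description."""
    # Step 1 (assumed): a naive instruction naming the target factor.
    naive = f"You are a person who is high in {factor}."
    # Step 2 (assumed): enrich the instruction with trait keywords.
    keywords = ", ".join(TRAIT_KEYWORDS[factor])
    keyword_prompt = (
        f"{naive} You are {keywords}. Describe yourself in a few sentences."
    )
    # Step 3 (assumed): let the model expand the keywords in its own words.
    return llm(keyword_prompt)


def ask_with_personality(
    question: str, factor: str, llm: Callable[[str], str]
) -> str:
    """Prepend the induced self-description to a question and query the model."""
    context = build_inducing_prompt(factor, llm)
    return llm(f"{context}\n\n{question}")


if __name__ == "__main__":
    # A stand-in model that echoes the length of its prompt, for demonstration.
    def echo_llm(prompt: str) -> str:
        return f"[model reply to {len(prompt)} characters of prompt]"

    print(ask_with_personality("Do you enjoy parties?", "Extraversion", echo_llm))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM API; any function that maps a prompt string to a reply string can be substituted.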



See Appendix A.1 for more details.



Figure 1: Evaluating and inducing personality in LLMs. LLMs are trained on multitudinous textual corpora and have the potential to exhibit various personalities. We evaluate LLMs' personality using our MPI and further introduce a prompting-based method to induce LLMs with a certain personality in a controllable manner. OCEAN refers to five key factors: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism.


This work makes the following contributions:
• We introduce the topic of machine (i.e., modern pre-trained LLMs) personality based on personality trait theories and psychometric inventories.
• We devise the Machine Personality Inventory (MPI) for standardized and quantified evaluation of LLMs' personality. Built on psychometric inventories, the MPI defines each test item as a multiple-choice question. Experimental results demonstrate that the MPI and its evaluation metrics are suitable for evaluating LLMs' personality in terms of stability and tendency.
• We validate the possibility of inducing different personalities from LLMs and propose CHAIN PROMPTING to control five personality factors. On MPI evaluation and human situational judgment tests, the CHAIN PROMPTING method shows high efficacy in personality induction.
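As an illustration of how multiple-choice answers could be turned into per-factor scores, the following is a minimal sketch under stated assumptions, not the MPI's actual scoring code. It assumes a five-option scale mapped to 5..1, items keyed positively (+1) or negatively (-1) as is common in psychometric inventories, and that tendency and stability are summarized by the mean and standard deviation of item scores within a factor. The names OPTION_SCORES, score_item, and factor_scores are hypothetical.

```python
from statistics import mean, pstdev

# Assumed five-option scale; the actual MPI option wording may differ.
OPTION_SCORES = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def score_item(choice, key):
    """Score one answer; negatively keyed items (key == -1) flip the scale."""
    raw = OPTION_SCORES[choice]
    return raw if key == 1 else 6 - raw


def factor_scores(responses):
    """Aggregate (factor, key, choice) tuples into {factor: (mean, std)}.

    The mean is read here as the tendency on a factor and the population
    standard deviation as a rough indicator of stability (lower is steadier).
    """
    by_factor = {}
    for factor, key, choice in responses:
        by_factor.setdefault(factor, []).append(score_item(choice, key))
    return {f: (mean(v), pstdev(v)) for f, v in by_factor.items()}


if __name__ == "__main__":
    # Made-up answers for demonstration only; not data from any model.
    demo = [
        ("Extraversion", 1, "A"),
        ("Extraversion", -1, "E"),
        ("Extraversion", 1, "B"),
        ("Neuroticism", 1, "D"),
        ("Neuroticism", -1, "B"),
    ]
    for factor, (m, s) in factor_scores(demo).items():
        print(f"{factor}: mean={m:.2f}, std={s:.2f}")
```

Reverse scoring with `6 - raw` keeps every item on the same 1..5 direction, so a higher mean always indicates a stronger expression of the factor.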

