ELICITATION INFERENCE OPTIMIZATION FOR MULTI-PRINCIPAL-AGENT ALIGNMENT

Abstract

In multi-principal-agent alignment scenarios including governance, markets, conflict resolution, and AI decision-making, it is infeasible to elicit every principal's view on all perspectives relevant to an agent's decisions. Elicitation inference optimization (EIO) aims to minimize the number of elicitations n needed to approximate N principals' views across K perspectives. In this work, we demonstrate an EIO approach where data efficiency (NK/n) increases with scale. We introduce STUMP: an elicitation inference model which integrates a large language model with a latent factor model to enable learning transfer across samples, contexts, and languages. We characterize STUMP's performance on a set of elicitation primitives from which scalable elicitation (sampling) protocols can be constructed. Building on these results, we design and demonstrate two elicitation protocols for STUMP where, surprisingly, data efficiency scales like O(n) in the number of elicitations n. In other words, the number of elicitations needed per principal remains constant even as the number of perspectives and principals grows. This makes it possible to approximate complex, high-dimensional preference signals spanning principal populations at scale.

1. INTRODUCTION

The principal-agent problem involves aligning agent decisions with principal interests. A central challenge is creating situations where agent choices are sufficiently influenced by signals containing principal preferences. With a single principal, high-complexity preference signals can be elicited directly via open-ended interaction. Multi-principal-agent scenarios can involve large populations of principals and powerful agents, such as: governments and citizens Giger and Lefkofridi (2014); Gabriel (2020), firms and customers Roberts and Grover (2012), peacekeepers and conflict parties United Nations (2012), existing AI systems and impacted populations Prabhakaran et al. (2022), and potentially even transformative AI and humanity Russell et al. (2015); Christiano et al. (2017). As the number of principals grows, and the domain of agent decisions becomes open-ended, directly eliciting the preferences of all principals on all relevant perspectives becomes infeasible [A.1]. As a result, lower-complexity forms of elicitation like ballot voting (i.e., for governments) and price signals (i.e., for firms) are used to learn preferences. While clearly effective (they are a basis of democracy and the economy), these approaches drastically simplify real preferences. For example, they do not allow citizens (principals) to express what they would like a government (agent) to do, or why; they only allow citizens to indicate whether they support predefined options.

Elicitation inference optimization (EIO) aims to decrease the amount of direct elicitation needed to recover a preference signal, enabling the use of more complex, higher-dimensional preferences (e.g., in natural language). Consider an N × K matrix Θ where rows correspond to N principals, columns correspond to K perspectives, and every element captures a principal-perspective relationship. The goal of elicitation inference optimization is to obtain a sufficient approximation of Θ with a minimal elicitation budget by directly sampling some elements and inferring the rest.
Thus, EIO involves combining (a) a sparse elicitation (sampling) protocol with (b) an elicitation inference model. Closed-ended surveys simplify EIO by constraining the set of relevant perspectives to a predefined set, typically with K ≪ N. Matrix sampling techniques elicit responses from each participant on a subset of perspectives selected randomly Shoemaker (1973), heuristically Raghunathan and Grizzle (1995), or dynamically such that inference accuracy is adaptively optimized Gonzalez and Eltinge (2008); Zhang et al. (2020). Inference models exploit the low-rank structure of Θ via matrix factorization and other collaborative filtering techniques Zhang et al. (2020); Sengupta et al. (2021); Oliveira et al. (2021).

Collective response systems (CRS) enable high-complexity signals by allowing the set of relevant perspectives to be open-ended: neither limited in number nor requiring pre-definition Ovadya (2022). In a CRS, participants contribute open-ended perspectives in the context of a question, prompt, or conversation and respond to subsets of perspectives contributed by others, typically yielding K ∼ O(N). Inspired by preference models Thurstone (1927) and conjoint analysis Green and Srinivasan (1978), one class of approaches elicits randomized pair-choice votes Konya and Slodov (2015) and uses hierarchical probit-like models for inference Salganik and Levy (2015). Other approaches elicit agreement votes to learn absolute human-perspective relationships. Polis samples agreement votes semi-randomly, prioritizes votes that improve clustering, and uses mean imputation to support dimensionality reduction Small et al. (2021). Additional work bridges these approaches by randomly sampling both pair-choice and agreement votes, and inferring missing votes by modeling both vote types with a single utility matrix learned via regularized matrix completion Bilich et al. (2019).

Previous approaches to EIO for CRS have been limited in scope to single samples where the set of relevant perspectives all correspond to the same context (i.e., the same question or prompt) Salganik and Levy (2015); Konya and Slodov (2015); Small et al. (2021); Bilich et al. (2019). This means votes must be elicited from every person for every new context. As a result, to approximate some Θ spanning an arbitrary number of contexts, the number of direct elicitations needed grows in proportion to the number of elements in Θ [A.2]; that is, data efficiency does not increase with scale. However, previous approaches do not fully leverage all available data: learning from one context is not transferred to support inference in other contexts, and the information contained in the perspective text is not used at all. Can an EIO approach that better leverages available data enable data efficiency that increases with scale?

In this work we introduce an approach to EIO for CRS which becomes increasingly data efficient as the amount of data elicited grows. First, we introduce a novel elicitation inference model, STUMP, which better leverages available data by integrating a pre-trained LLM with a latent factor model. Second, we characterize STUMP's performance across a range of elicitation primitives from which arbitrarily scalable elicitation (vote sampling) protocols can be constructed. Finally, building on these results, we design and demonstrate two scalable elicitation protocols where STUMP can infer with meaningful accuracy while data efficiency increases linearly with scale.

2. PROBLEM SETUP

Elicitation form. In a collective response system (CRS), participants respond to prompts with open-ended perspectives and vote on perspectives submitted by others. Let H be the set of N human participants, P be the set of K perspectives, and Θ be the corresponding N × K human-perspective matrix. Both agreement and pair-choice votes may be elicited on perspectives. In agreement exercise $e^a_{ij}$, participant i is asked if they agree with perspective j. In pair-choice exercise $e^c_{ijk}$, participant i is asked if they prefer perspective j or k. We denote a set of exercise data as $E = \{e^a_{ij}, e^c_{ijk} \mid i \in H;\ j, k \in P\}$. Elicitation inference involves training a model on available data and predicting all missing elements in a target Θ. We choose binary agreement as the human-perspective relationship for our target Θ. Inference performance is probed experimentally by training on a subset of exercises including both vote types, $E_t$, and computing prediction accuracy on a validation set of only agreement votes, $E_v$. Data efficiency in EIO can be quantified as the ratio of (a) the number of votes contained in the target Θ to (b) the number of votes directly elicited via a sampling protocol, on which a model needs to be trained to meaningfully approximate Θ. We denote the data efficiency to achieve inference accuracy acc as:

$$\beta_{acc} = \frac{n(\Theta)}{n(E_t)} = \frac{n(H) \times n(P)}{n(E_t)} = \frac{NK}{n_t}$$
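As a rough illustration of the data-efficiency ratio β = NK/n_t from the problem setup, the sketch below computes it for a hypothetical protocol in which every participant casts a fixed number of votes and the number of perspectives grows with the number of participants (K ∼ O(N), as is typical in a CRS). The function names and constants (20 votes per participant, one perspective contributed per participant) are assumptions chosen for illustration, not values from this work, and the sketch simply assumes that the inference model reaches the target accuracy at that budget. Under these assumptions n_t = 20N and K = N, so β = N/20 = n_t/400, which grows linearly in the number of elicitations.

```python
# Illustrative arithmetic only: the constants below are hypothetical,
# not measurements reported in the paper.
VOTES_PER_PARTICIPANT = 20        # assumed fixed elicitation budget per person
PERSPECTIVES_PER_PARTICIPANT = 1  # assumed, so that K = N


def data_efficiency(num_participants: int, num_perspectives: int,
                    num_elicited: int) -> float:
    """Return beta = N * K / n_t, the size of Theta over the votes elicited."""
    if num_elicited <= 0:
        raise ValueError("num_elicited must be positive")
    return num_participants * num_perspectives / num_elicited


def efficiency_at_scale(num_participants: int) -> float:
    """Data efficiency when each participant casts a constant number of votes."""
    num_perspectives = PERSPECTIVES_PER_PARTICIPANT * num_participants
    num_elicited = VOTES_PER_PARTICIPANT * num_participants
    return data_efficiency(num_participants, num_perspectives, num_elicited)


if __name__ == "__main__":
    for n_participants in (100, 1_000, 10_000):
        n_t = VOTES_PER_PARTICIPANT * n_participants
        beta = efficiency_at_scale(n_participants)
        print(f"N={n_participants:>6}  n_t={n_t:>7}  beta={beta:.1f}")
```

A tenfold increase in participants (and hence in elicitations) yields a tenfold increase in β here, while the number of votes asked of each participant stays fixed. Whether a given model actually attains the target accuracy under such a budget is an empirical question that this arithmetic does not address.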

