PINTO: FAITHFUL LANGUAGE REASONING USING PROMPT-GENERATED RATIONALES

Abstract

Neural language models (LMs) have achieved impressive results on various language-based reasoning tasks by utilizing latent knowledge encoded in their own pretrained parameters. To make this reasoning process more explicit, recent works retrieve a rationalizing LM's internal knowledge by training or prompting it to generate free-text rationales, which can be used to guide task predictions made by either the same LM or a separate reasoning LM. However, rationalizing LMs require expensive rationale annotation and/or computation, without any assurance that their generated rationales improve LM task performance or faithfully reflect LM decision-making. In this paper, we propose PINTO, an LM pipeline that rationalizes via prompt-based learning, and learns to faithfully reason over rationales via counterfactual regularization. First, PINTO maps out a suitable reasoning process for the task input by prompting a frozen rationalizing LM to generate a free-text rationale. Second, PINTO's reasoning LM is fine-tuned to solve the task using the generated rationale as context, while regularized to output less confident predictions when the rationale is perturbed. Across four datasets, we show that PINTO significantly improves the generalization ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets. Also, we find that PINTO's rationales are more faithful to its task predictions than those generated by competitive baselines. 1 

1. INTRODUCTION

Many language-based reasoning tasks require retrieving and reasoning over knowledge beyond the task input, e.g., commonsense reasoning and closed-book QA (Fig. 1, left) (Talmor et al., 2018; Mihaylov et al., 2018). Neural language models (LMs) have achieved impressive results on such tasks by utilizing latent knowledge encoded in their pretrained parameters (Raffel et al., 2020b; Brown et al., 2020). Still, given LMs' black-box nature, it is unclear whether this knowledge is being used properly (Doshi-Velez & Kim, 2017; Lipton, 2018). Previous studies have shown that LMs often learn spurious correlations from artifacts in downstream training data, thus limiting their generalizability (Branco et al., 2021; Geirhos et al., 2020; D'Amour et al., 2020). With this in mind, a number of prior works aim to make LMs' reasoning processes more explicit by generating free-text rationales, which use LMs' internal knowledge to describe a reasoning process in natural language (Narang et al., 2020; Wei et al., 2022b; Marasović et al., 2022; Zelikman et al., 2022). In the fine-tuned self-rationalizing paradigm, a single LM is fine-tuned to jointly generate the task output and rationale (Narang et al., 2020; Marasović et al., 2022; Zelikman et al., 2022). In the prompted self-rationalizing paradigm, a single LM is instead frozen and prompted to jointly generate the task output and rationale, with the prompt consisting of a few input-output-rationale demonstrations (Wei et al., 2022b). In the pipeline-rationalizing paradigm, a fine-tuned rationalizing LM first generates the rationale, which is then used as input for a separate fine-tuned reasoning LM to generate the output (Kumar & Talukdar, 2020; Rajani et al., 2019).
Published as a conference paper at ICLR 2023

[Figure 1b schematic: 1) Prompted Self-Rationalization: Q → prompted LM (>100B) → R + A. 2) Fine-tuned Self-Rationalization: Q → fine-tuned LM (<1B) → R + A. 3) Pipeline Rationalization: Q → fine-tuned LM1 (<1B) → R; Q + R → fine-tuned LM2 (<1B) → A. 4) PINTO: Q → prompted LM (20B) → R; Q + R → fine-tuned LM (<1B) → A.]

However, when considering generalization performance, reliability, and deployment costs, these existing paradigms all have key limitations. Fine-tuned self-rationalizing LMs often perform worse than non-rationalizing LMs, since their parameters are learned using two relatively dissimilar objectives, while also requiring expensive rationale annotations (Wiegreffe et al., 2020; Narang et al., 2020). Prompted self-rationalizing LMs yield strong task performance and only need a few rationale demonstrations for the prompt, but are computationally prohibitive since they generally require very large-scale (i.e., over 100B parameters) LMs to work effectively (Wei et al., 2022a;b). Besides requiring expensive rationale annotations, pipeline-rationalizing LMs' generated rationale forms a non-differentiable bottleneck between the two modules, which complicates end-to-end training and can hurt task performance (Wiegreffe et al., 2020; Hase et al., 2020). Moreover, none of these paradigms has a mechanism for regularizing the rationale generation to faithfully reflect the reasoning process of the LM without hurting task performance.

In this paper, we propose Prompted RatIonalizing with CouNTerfactual ReasOning (PINTO), an LM pipeline that rationalizes via prompt-based learning, then reasons over the task input and rationale via counterfactual regularization. PINTO's rationalizing module is a medium-scale (i.e., 20B parameters) LM that contains vast latent knowledge obtained via pretraining (Black et al., 2022). Though prohibitive to fine-tune, it is affordable for prompt-based learning. Given the task input and a minimal input-output-rationale demonstration prompt, the rationalizing module uses its internal knowledge to map out a suitable reasoning process for the task input by generating a free-text rationale.
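As a rough illustration of the prompting step, the sketch below assembles a few-shot prompt from input-output-rationale demonstrations and ends at the slot where the frozen rationalizing LM would continue with a free-text rationale. The demonstration contents and function name here are hypothetical, not taken from the paper's released code.

```python
# Minimal sketch of few-shot rationale prompting (hypothetical demonstrations;
# the real prompt format is defined in the paper's released code).

DEMONSTRATIONS = [
    {
        "question": "Where would you find a jellyfish? (a) ocean (b) desert",
        "answer": "ocean",
        "rationale": "Jellyfish are marine animals that live in salt water.",
    },
    {
        "question": "What do plants need to make food? (a) sunlight (b) sand",
        "answer": "sunlight",
        "rationale": "Plants make food via photosynthesis, which requires sunlight.",
    },
]

def build_rationale_prompt(question: str) -> str:
    """Assemble a few-shot prompt that ends right before the rationale slot,
    so the frozen LM continues by generating the free-text rationale."""
    parts = []
    for demo in DEMONSTRATIONS:
        parts.append(
            f"Q: {demo['question']}\n"
            f"A: {demo['answer']}\n"
            f"Rationale: {demo['rationale']}\n"
        )
    parts.append(f"Q: {question}\nRationale:")
    return "\n".join(parts)
```

The generated rationale is then prepended to the task input and passed to the reasoning module, rather than being used to predict the answer directly.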
The rationalizing module is frozen during fine-tuning, which drastically reduces training costs and prevents it from exploiting spurious shortcuts in the downstream training data. PINTO's reasoning module is a small-scale (i.e., under 1B parameters) LM to which knowledge is transferred from the rationalizing module. The reasoning module is fine-tuned to solve the downstream reasoning task by using the generated rationale as context for the task input. Crucially, to help ensure that the reasoning module's behavior is dictated by the rationale (instead of by spurious shortcuts), the reasoning module is regularized to output less confident predictions when the rationale is noisily perturbed. To simulate shortcut reasoning, we consider two rationale perturbation strategies: token masking (i.e., the rationale is ignored) and token replacement (i.e., the rationale is misused). Across four question answering datasets (CSQA, StrategyQA, OpenBookQA, QASC), we show that PINTO significantly improves the reasoning LM's generalization, yielding higher performance on both in-distribution (ID) and out-of-distribution (OOD) test sets. Also, we find that rationales are utilized more faithfully by PINTO than by other methods, leading to better performance in low-resource settings. Furthermore, we show that PINTO's counterfactual regularization allows us to further improve task performance with refined rationales.
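The two perturbation strategies above can be sketched as simple token-level operations, with the counterfactual target being a low-confidence (uniform) distribution over answer choices instead of the gold label. This is a minimal sketch assuming whitespace-tokenized rationales; the function names, mask token, and probabilities are illustrative, not the paper's actual implementation.

```python
import random

MASK_TOKEN = "<mask>"

def mask_tokens(rationale_tokens, mask_prob, rng):
    """Token masking: simulate an *ignored* rationale by hiding tokens."""
    return [MASK_TOKEN if rng.random() < mask_prob else t
            for t in rationale_tokens]

def replace_tokens(rationale_tokens, vocab, replace_prob, rng):
    """Token replacement: simulate a *misused* rationale by swapping in
    random tokens drawn from a vocabulary."""
    return [rng.choice(vocab) if rng.random() < replace_prob else t
            for t in rationale_tokens]

def counterfactual_target(num_choices):
    """Under a perturbed rationale, the reasoning module is regularized
    toward a uniform (low-confidence) distribution over answer choices,
    rather than toward the gold label."""
    return [1.0 / num_choices] * num_choices
```

In training, the regularization term would compare the reasoning module's prediction on the perturbed rationale against this uniform target (e.g., via a divergence loss), penalizing confident predictions that cannot be justified by the rationale.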

2. RATIONALE-BASED LANGUAGE REASONING

In this work, we study LMs' ability to reason about language using implicit knowledge. We consider a specific type of multi-choice question answering (QA) task where the required knowledge for



Code and data used in our experiments can be found at https://github.com/wangpf3/pinto-faithful-language-reasoning.



Figure 1: Rationale-Based Language Reasoning. (a) Examples of reasoning tasks that require implicit knowledge beyond task inputs. (b) Comparison of existing paradigms for providing free-text rationales along with predictions.

