SEMI-PARAMETRIC PROMPT-GENERATION FOR MODEL EDITING

Abstract

Large language models are used with great success in a wide variety of downstream tasks. However, efficiently changing specific knowledge or beliefs of a model (a.k.a. model editing) to revise inaccurate predictions, without affecting predictions for all other cases, remains challenging. Most previous methods compute gradients to change the model; these strategies generally work, but at the cost of high computational and memory complexity. The semi-parametric strategy has recently shown its effectiveness in alleviating this complexity by introducing a memory that stores knowledge edits. However, the memory lacks a proper mechanism by which a large pre-trained language model can utilize it, limiting its generalizability to more complicated model editing scenarios. This work proposes a prompt-generation mechanism to bridge this gap. Our method encodes the edits as prefix prompts and then has the large pre-trained language model perform inference with those prompts; in other words, the model is edited by prompts without changing model parameters. Our method, SEPROG, significantly outperforms state-of-the-art methods by up to 20% on entailed-edit benchmarks and provides up to 30% better performance than gradient-based methods on non-entailed benchmarks. These advantages are achieved with much less computation and memory consumption, demonstrating prompt generation's great potential for model editing.

1. INTRODUCTION

Large pre-trained language models (Devlin et al., 2018; Lewis et al., 2019; Radford et al., 2019; Liu et al., 2019) have shown tremendous success on a wide variety of downstream tasks (Brown et al., 2020) such as language generation, fact-checking, and summarization. These successes are due to their ability to capture world-scale knowledge by pre-training on massive corpora (Petroni et al., 2019), as well as their effectiveness in fine-tuning to adapt to arbitrary downstream tasks. However, modifying the underlying beliefs of large language models with a desired degree of control is still an open problem (Hase et al., 2021). The need to evolve a model's beliefs may range from reflecting simple factual changes about the world (such as changing the capital of a country) to updating entailed relationships between knowledge entities (such as deducing properties of a new species from its biological taxonomy). The problem setting of Model Editing (Mitchell et al., 2021; Sinitsin et al., 2020) formulates the challenge well. Specifically, given a small sample of edit data (e.g., descriptions of factual changes), the goal is to make the model provide updated predictions for inputs that are semantically related to the edit data (i.e., in-scope data), while retaining the same beliefs for inputs outside the scope of the edit data (i.e., out-scope data).

Previous model editing strategies learn a gradient-based optimizer (Mitchell et al., 2021) or a model that can quickly adapt to the edits via gradient descent (Sinitsin et al., 2020; Hase et al., 2021). These methods achieve significant success with few edits, but their accuracy falls quickly with a larger number of edits. Interference between edits may be one cause, but more fundamentally, controlling beliefs through the space of model parameters introduces unmanageable complexity. One obvious consequence is poor scalability. Mitchell et al.
(2021) has shown that it is non-trivial to generate the gradients with neural networks, and impractical to compute gradients of gradients (i.e., hypergradients) for learning fast-adapting models beyond billions of parameters.

A recent work, SERAC (Mitchell et al., 2022), tackles model editing using a semi-parametric approach with an explicit memory to store the edit data. SERAC first classifies whether an input is in-scope or out-scope. If a case is out-scope (i.e., unrelated to the edit data), SERAC uses the base model for predictions. In the in-scope case, SERAC uses an additional model (named the counterfactual model in the original paper) for predictions: the counterfactual model retrieves the most similar edit in the memory, then uses the edit together with the input to generate predictions. SERAC updates only its memory when there are new edits; it therefore avoids maneuvering in the model parameter space, making the strategy scalable and agnostic to the base model. However, the two branches in SERAC make predictions independently, isolating the counterfactual model from the knowledge of the base model, which was trained on massive corpora. In other words, the design has no knowledge sharing between the branches, making entailed predictions less likely for edit-related (in-scope) inputs. This limitation is compounded by the challenge of training a generalizable classifier to distinguish in-scope from out-scope inputs, causing the strategy to stumble on harder cases (examined in later sections).

This work proposes SEPROG (Semi-parametric Prompt-Generation) to retain the good parts of the above strategies while minimizing their limitations: (1) scalable base model size; (2) scalable edit dataset size; (3) good generalizability to hard edits; and (4) low training and inference cost.
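The two-branch routing described above can be sketched as follows. This is a minimal illustration of the SERAC-style control flow, not the authors' actual interfaces: `scope_classifier`, `base_model`, `counterfactual_model`, and `similarity` are hypothetical stand-ins.

```python
def nearest_edit(x, edit_memory, similarity):
    """Retrieve the stored edit most similar to input x."""
    return max(edit_memory, key=lambda z: similarity(x, z))

def serac_predict(x, edit_memory, scope_classifier, similarity,
                  base_model, counterfactual_model):
    # Out-of-scope inputs are answered by the frozen base model.
    if not edit_memory or not scope_classifier(x, edit_memory):
        return base_model(x)
    # In-scope inputs: condition the counterfactual model on the retrieved
    # edit. Note the base model's knowledge is not consulted on this branch,
    # which is the knowledge-sharing limitation discussed above.
    z = nearest_edit(x, edit_memory, similarity)
    return counterfactual_model(x, z)
```

The sketch makes the isolation concrete: a prediction flows through exactly one of the two models, never both.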
Inspired by non-parametric neural models (Graves et al., 2014; Garnelo et al., 2018a), which store training data (or edits) rather than model parameters, our semi-parametric approach stores latent representations of the edit data while also leveraging the information in the pre-trained base model. Our approach also builds on recent advances in prompt-tuning methods (Li & Liang, 2021; Lester et al., 2021), which convert task descriptions (or the edits) to real-valued prefixes for language inputs. These two ideas lead to an end-to-end solution for changing the beliefs of a model with an encoder-decoder structure (Lewis et al., 2019; Raffel et al., 2020). Our method generates edit-dataset-specific embeddings by leveraging the encoder, then injects the embeddings as the prefix of the inputs to the decoder (Figure 1). As a result, SEPROG does not change the base model after deployment (the advantage of SERAC) while still using the base model to generate predictions on both in-scope and out-scope inputs. Inference on in-scope inputs can therefore leverage the world knowledge of the base model to achieve better generalization on hard edits (the advantage of gradient-based methods).

Our contributions are summarized as follows:

• We propose a novel model editing method that leverages both prompting and semi-parametric strategies.

• We examine 6 model editing strategies along 3 evaluation aspects: (1) small/large amounts of edit data; (2) easy/hard edits; (3) training/inference cost. This comprehensive comparison shows that a dominant strategy is yet to emerge, but SEPROG's well-balanced advantages across all 3 aspects make it the best frontier in the current solution space.
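The prefix-injection step can be sketched in a few lines. This is a minimal numpy illustration of the idea of encoding edits into prefix vectors that are prepended to the decoder's input embeddings; the mean-pooling "encoder", the single learned projection, and all shapes are illustrative assumptions, not SEPROG's actual architecture.

```python
import numpy as np

def encode_edits_as_prefix(edit_embeddings, prefix_len, proj):
    """Pool encoded edit descriptors and project them into prefix vectors."""
    pooled = edit_embeddings.mean(axis=0)   # (d,)
    prefix = proj @ pooled                  # (prefix_len * d,)
    return prefix.reshape(prefix_len, -1)   # (prefix_len, d)

def prepend_prefix(prefix, input_embeddings):
    """Inject the prefix before the token embeddings fed to the decoder."""
    return np.concatenate([prefix, input_embeddings], axis=0)

d, prefix_len, seq_len = 8, 4, 5
rng = np.random.default_rng(0)
edits = rng.normal(size=(3, d))              # 3 encoded edit descriptors
proj = rng.normal(size=(prefix_len * d, d))  # learned projection (random here)
tokens = rng.normal(size=(seq_len, d))       # decoder input embeddings

prefix = encode_edits_as_prefix(edits, prefix_len, proj)
decoder_input = prepend_prefix(prefix, tokens)
# decoder_input has shape (prefix_len + seq_len, d)
```

The key property the sketch captures is that the base model's parameters are untouched: only the input sequence seen by the decoder changes.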

2. MODEL EDITING PROBLEM

Let $M_{base}$ be a large pre-trained language model trained on a task $T$ with dataset $D_T = \{(x_i, y_i)\}_{i=1}^{|D_T|}$. The goal of model editing is to update the beliefs of $M_{base}$ with a set of $K$ edit descriptors $Z^e = \{z^e_i\}_{i=1}^{K}$. In the case of question-answering, the edit descriptor could be a question-answer pair $z^e_i = [x^e_i, y^e_i]$. The edited model will need to provide updated answers for questions that are semantically related to data in $Z^e$ while not changing its predictions for inputs unrelated to $Z^e$.
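As a concrete instance of the notation above, a question-answering edit descriptor and edit set might be represented as follows; the field names are assumptions for exposition, not a prescribed schema.

```python
# One edit descriptor z^e = [x^e, y^e] for question-answering.
edit_descriptor = {
    "x": "What is the capital of Country X?",  # input whose answer must change
    "y": "New Capital City",                   # revised answer after the edit
}

# The edit set Z^e with K = 1.
edit_set = [edit_descriptor]
```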



Formally, we are given a model $M_{base}: \mathbb{R}^n \to \mathbb{R}^m$ trained on dataset $D_T$. Let $I(z^e) = \{(x, y)\}$ be the set of inputs whose predictions are affected by the information in edit descriptor $z^e$, i.e., the label $y$ differs for datapoints in $I(z^e)$ due to the updated information from $z^e$ (note that $z^e \in I(z^e)$). Let $O(z^e) \subset D_T$ be the set of inputs whose predictions are not affected by $z^e$. We then define $I(Z^e) = \bigcup_{z^e \in Z^e} I(z^e)$ as the in-scope data and $O(Z^e) = \bigcup_{z^e \in Z^e} O(z^e)$ as the out-scope data. The goal of model editing is to derive an edited model $M_{edit}$ from $M_{base}$ such that:

1. $M_{edit}$ outputs the revised predictions for in-scope data $(x', y') \sim I(Z^e)$: $\arg\max_y p(y \mid x'; M_{edit}) = y'$.

2. The change in the output probability distribution is small for out-scope inputs; in other words, the KL-divergence between $M_{base}$'s and $M_{edit}$'s predictions is small: $\min_{M_{edit}} \mathrm{KL}\big(p(\cdot \mid x; M_{base}) \,\|\, p(\cdot \mid x; M_{edit})\big)$.
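The two objectives above can be turned into simple evaluation metrics: exact-match edit success on in-scope data, and the mean KL-divergence between base and edited predictions on out-scope inputs. In this hedged sketch, models are represented abstractly as functions from an input to a probability vector over classes, and labels are class indices; all names are illustrative.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as equal-length sequences."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def edit_success(model_edit, in_scope):
    """Fraction of in-scope pairs (x', y') with argmax_y p(y | x'; M_edit) == y'."""
    hits = 0
    for x, y in in_scope:
        probs = model_edit(x)
        if max(range(len(probs)), key=probs.__getitem__) == y:
            hits += 1
    return hits / len(in_scope)

def out_scope_drawdown(model_base, model_edit, out_scope_inputs):
    """Mean KL(p_base || p_edit) over out-scope inputs; should stay near zero."""
    return sum(kl_divergence(model_base(x), model_edit(x))
               for x in out_scope_inputs) / len(out_scope_inputs)
```

A successful edit drives `edit_success` to 1 while keeping `out_scope_drawdown` near 0, matching conditions 1 and 2 respectively.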

