Renamer: A Transformer Architecture Invariant to Variable Renaming

Abstract

Many modeling tasks involve learning functions that are invariant to certain classes of input transformations. In this work we consider one such class: semantics-preserving variable renaming. We first show that Transformer networks trained on such tasks do not always mirror the invariance of the underlying function. We then propose Renamer, a Transformer architecture that is invariant to semantics-preserving variable renaming. Learning a neural surrogate of a large-scale CPU simulator with Renamer reduces simulation error by 24.79-52.8% compared to using a vanilla Transformer. Furthermore, the invariant network is insensitive to variable renaming by construction: its output is unchanged when evaluated on a variable-renamed version of the test set. Finally, Renamer is more efficient to train, matching the performance of the vanilla Transformer with 24.75-59.06% fewer training epochs.

1. Introduction

Modeling tasks often require reasoning about invariances of the task with respect to transformations on the inputs (Snavely, 2019; Bianchi et al., 2022). A common approach to learning models that are invariant to a given transformation is to train models using data augmentations that exhibit the transformation (and corresponding invariance) under study (Shorten and Khoshgoftaar, 2019; Feng et al., 2021). However, such approaches give no formal guarantee that the resulting models are always perfectly invariant to the transformation. Further, there is evidence that baking the inductive bias of the invariance into the model leads to accuracy improvements (LeCun and Bengio, 1995; Cohen and Welling, 2016; Lee et al., 2019; Keriven and Peyré, 2019; Wang et al., 2022).

Renaming invariance. We study renaming invariance, a particular type of invariance in sequence processing tasks that arises when reasoning about formal languages, including programming languages (Chen et al., 2021; Alon et al., 2019; Renda et al., 2020), mathematics (Lample and Charton, 2020; Polu et al., 2022), and synthetic grammars of natural languages (Marzoev et al., 2020; Berant and Liang, 2014). Renaming invariance is invariance to bijective transformations of the input tokens that preserve the semantics of the input. An example of renaming invariance is in symbolic algebra, where variables can be bijectively renamed without changing the result of evaluating the expression.

Renaming sensitivity. General-purpose neural network architectures like LSTMs (Hochreiter and Schmidhuber, 1997) and Transformers (Vaswani et al., 2017) have shown impressive results on learning functions with renaming invariance (Alon et al., 2019; Renda et al., 2021). However, these networks are not themselves renaming invariant. For example, Alon et al. (2019) note sensitivity to "uninformative, obfuscated, or adversarial variable names".
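As a concrete illustration of renaming invariance (a toy sketch, not code from this paper), consider evaluating a symbolic expression: bijectively renaming its variables, and renaming the evaluation environment consistently, leaves the result unchanged. The expression, variable names, and `evaluate` helper below are all hypothetical.

```python
# Toy illustration of renaming invariance: a flat sum of variables
# evaluated under an environment. All names here are illustrative.

def evaluate(tokens, env):
    """Evaluate a flat sum of variables, e.g. ["a", "+", "b"]."""
    return sum(env[t] for t in tokens if t != "+")

# Original expression and environment: a + b + a with a=2, b=3.
tokens = ["a", "+", "b", "+", "a"]
env = {"a": 2, "b": 3}

# A bijective renaming of referents: a -> x, b -> y. Operators keep
# their view, so "+" maps to itself.
renaming = {"a": "x", "b": "y", "+": "+"}
renamed_tokens = [renaming[t] for t in tokens]
renamed_env = {renaming[k]: v for k, v in env.items()}

# The renamed expression evaluates to the same value: the task is
# renaming invariant even though the token sequence changed.
assert evaluate(tokens, env) == evaluate(renamed_tokens, renamed_env)
```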
This sensitivity presents a challenge to deploying neural networks in this context, as their predictions are not robust to semantics-preserving input transformations.

Our approach. We present an approach to enforcing renaming invariance in Transformers. The first key contribution that enables our approach is a formal definition of renaming invariance. We define renaming invariance as a property of functions that take sequences of tokens as input. We first define a view mapping, which maps an input token to its view: the semantic information about the token that is salient to the function. We then define a referent mapping, which maps an input token to its referent: the underlying entity to which it refers. A renaming invariant function is a function that generates the same output for any bijection of tokens that does not change tokens' views and bijectively renames tokens' referents. That is, salient semantic properties of tokens must not change; pairs of tokens that originally referred to the same underlying entity must both refer to the same permuted underlying entity; and pairs of tokens that originally referred to different underlying entities must still refer to different permuted underlying entities. We present two architecture changes that together enforce renaming invariance in Transformers. We refer to the resulting architecture as the Renamer.

View anonymization. The first change, view anonymization, replaces each token with a token that describes only its view. This enforces that the network is renaming invariant, because the network cannot make different predictions for inputs that differ only in their renamed referents. However, view anonymization alone reduces the representational capacity of the network, since the network can no longer distinguish tokens with different referents.

Referent binding. To recover this representational capacity, we introduce a novel modification to the attention layer which we call referent binding.
Referent binding restricts the attention in the first layer of the Transformer, allowing tokens to attend only to other tokens with the same referent. This breaks the symmetry between tokens with the same view but different referents, restoring the representational capacity of Renamer while maintaining that Renamer is renaming invariant.

Contributions. We present the following contributions:
• We introduce and formally characterize the renaming invariance problem.
• We propose the two-step process of view anonymization and referent binding to enforce renaming invariance while maintaining representational power. We implement these in the Renamer, a renaming invariant Transformer model architecture.
• We evaluate the Renamer on a renaming invariant x86 assembly processing task. Renamer reduces the error compared to a vanilla model by between 27.58% and 52.80%.
By identifying and defining renaming invariance and proposing a Transformer architecture that is invariant to variable renaming, our work takes a key step towards the goal of providing high-accuracy models with provable guarantees for tasks with input invariances.
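The two changes can be sketched together on a toy token sequence. The following is a minimal illustration, not the authors' implementation: the token views, the referent indices, and the choice to let referent-free tokens (like operators) attend to nothing in the first layer are all assumptions made for this example.

```python
import numpy as np

# Toy sketch of view anonymization and referent binding on the
# sequence "x = y + x". All views/referents below are illustrative.

tokens    = ["x", "=", "y", "+", "x"]
views     = ["VAR", "=", "VAR", "+", "VAR"]  # view anonymization: only the
                                             # token category survives
referents = [0, None, 1, None, 0]            # which entity each token names
                                             # (None: no referent)

# Referent binding: a boolean mask for the first attention layer, where
# token i may attend to token j only when both name the same referent.
n = len(tokens)
mask = np.zeros((n, n), dtype=bool)
for i in range(n):
    for j in range(n):
        mask[i, j] = referents[i] is not None and referents[i] == referents[j]

# Positions 0 and 4 (both occurrences of "x") can attend to each other,
# breaking the symmetry between the anonymized "VAR" tokens.
print(mask.astype(int))
```

Under this mask, the two `VAR` tokens that share referent 0 become distinguishable from the `VAR` token with referent 1, even though their anonymized views are identical, which is exactly the capacity that view anonymization alone gives up.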

2. Renaming Invariance in llvm-mca

This section presents a case study of renaming invariance in a sequence processing task. We first introduce x86 basic block throughput prediction and describe how it is a renaming-invariant task. We describe the views and referents present in x86 basic blocks for this task. We then describe renaming invariant permutations for this task, and show that the task's labels are invariant to these permutations, but are not invariant to other permutations. We finally demonstrate that Renamer generates accurate and renaming invariant predictions for this task, while baseline models are not renaming invariant and are therefore less accurate. 



Figure 1: Example of an x86-64 basic block with an invariant and a non-invariant renaming. Registers may be renamed, as long as each register is renamed consistently.
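The distinction Figure 1 draws can be mimicked in a few lines. This is an illustrative sketch only: the two-instruction block, the `rename` helper, and the register mappings are hypothetical, and the regex-based substitution is a simplification of real x86 syntax.

```python
import re

# A toy two-instruction basic block in AT&T syntax (illustrative only).
block = ["add %rax, %rbx", "mov %rbx, %rcx"]

def rename(block, mapping):
    """Replace each register token with its image under `mapping`."""
    return [re.sub(r"%\w+", lambda m: mapping.get(m.group(), m.group()), line)
            for line in block]

# Invariant renaming: a bijection on registers, applied consistently,
# so every occurrence of a register is renamed the same way.
bijective = {"%rax": "%rdx", "%rbx": "%rsi", "%rcx": "%rdi"}
print(rename(block, bijective))

# Non-invariant renaming: two distinct registers collapse to one name,
# which changes the data flow and hence the block's semantics.
collapsing = {"%rax": "%rdx", "%rbx": "%rdx"}
print(rename(block, collapsing))
```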

