Renamer: A Transformer Architecture Invariant to Variable Renaming

Abstract

Many modeling tasks involve learning functions that are invariant to certain classes of input transformation. In this work we consider one such class: semantics-preserving variable renaming. We first show that Transformer networks trained on such tasks do not always mirror the invariance of the underlying function. We then propose Renamer, a Transformer architecture that is invariant to semantics-preserving variable renaming. Learning a neural surrogate of a large-scale CPU simulator with Renamer reduces simulation error by 24.79-52.8% compared to a vanilla Transformer. Because the network is invariant by construction, it is insensitive to variable renaming: its output remains unchanged when evaluated on a variable-renamed version of the test set. Finally, Renamer is more efficient to train: it matches the performance of the vanilla Transformer with 24.75-59.06% fewer training epochs.

1. Introduction

Modeling tasks often require reasoning about invariances of the task with respect to transformations on the inputs (Snavely, 2019; Bianchi et al., 2022). A common approach to learning models that are invariant to a given transformation is to train them using data augmentations that exhibit the transformation (and corresponding invariance) under study (Shorten and Khoshgoftaar, 2019; Feng et al., 2021). However, such approaches give no formal guarantee that the resulting models are perfectly invariant to the transformation. Further, there is evidence that baking the inductive bias of the invariance into the model leads to accuracy improvements (LeCun and Bengio, 1995; Cohen and Welling, 2016; Lee et al., 2019; Keriven and Peyré, 2019; Wang et al., 2022).

Renaming invariance. We study renaming invariance, a particular type of invariance in sequence processing tasks which arises when reasoning about formal languages, including programming languages (Chen et al., 2021; Alon et al., 2019; Renda et al., 2020), mathematics (Lample and Charton, 2020; Polu et al., 2022), and synthetic grammars of natural languages (Marzoev et al., 2020; Berant and Liang, 2014). Renaming invariance is invariance to bijective transformations of the input tokens that preserve the semantics of the input. For example, in symbolic algebra, variables can be bijectively renamed without changing the result of evaluating the expression.

Renaming sensitivity. General-purpose neural network architectures like LSTMs (Hochreiter and Schmidhuber, 1997) and Transformers (Vaswani et al., 2017) have shown impressive results on learning functions with renaming invariance (Alon et al., 2019; Renda et al., 2021). However, these networks are not themselves renaming invariant. For example, Alon et al. (2019) note sensitivity to "uninformative, obfuscated, or adversarial variable names".
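The symbolic algebra example of renaming invariance can be made concrete with a minimal sketch (the helper names `rename` and `evaluate` are illustrative, not the paper's implementation): evaluating an expression under a bijective variable renaming, with the environment renamed consistently, yields the same result.

```python
def rename(tokens, mapping):
    """Apply a bijective variable renaming to a token sequence."""
    return [mapping.get(t, t) for t in tokens]

def evaluate(tokens, env):
    """Evaluate a token sequence as an arithmetic expression under env."""
    return eval(" ".join(tokens), {}, dict(env))

expr = ["x", "*", "y", "+", "x"]
env = {"x": 2, "y": 3}

# Bijective renaming x -> a, y -> b; the environment is renamed consistently.
mapping = {"x": "a", "y": "b"}
renamed_expr = rename(expr, mapping)
renamed_env = {mapping[k]: v for k, v in env.items()}

# Semantics are preserved: both evaluate to 8.
assert evaluate(expr, env) == evaluate(renamed_expr, renamed_env)
```

A renaming-invariant model of `evaluate` should likewise produce identical outputs on `expr` and `renamed_expr`; the paper's observation is that vanilla Transformers trained on such tasks do not.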
This sensitivity presents a challenge to deploying neural networks in this context, as their predictions are not robust to semantics-preserving input transformations.

Our approach. We present an approach to enforcing renaming invariance in Transformers. The first key contribution that enables our approach is a formal definition of renaming invariance. We define renaming invariance as a property of functions which take sequences of tokens as input. We first define a view mapping as a mapping from an input token to its view, the

