CHEMALGEBRA: ALGEBRAIC REASONING ON CHEMICAL REACTIONS

Abstract

While deep learning models show impressive performance on a variety of learning tasks, it remains unclear whether they can robustly tackle reasoning tasks. Measuring the robustness of reasoning in machine learning models is challenging: one needs a task that cannot be easily shortcut by exploiting spurious statistical correlations in the data, yet still operates on complex objects and constraints. To address this issue, we propose CHEMALGEBRA, a benchmark for measuring the reasoning capabilities of deep learning models through the prediction of stoichiometrically balanced chemical reactions. CHEMALGEBRA requires manipulating sets of complex discrete objects (molecules represented as formulas or graphs) under algebraic constraints such as the mass conservation principle. We believe that CHEMALGEBRA can serve as a useful test bed for the next generation of machine reasoning models and as a promoter of their development.

1. INTRODUCTION

Deep learning models, and Transformer architectures in particular, currently achieve the state of the art in a number of application domains such as natural language and audio processing, computer vision, and computational chemistry (Lin et al., 2021; Khan et al., 2021; Braşoveanu & Andonie, 2020). Given enough data and enough parameters to fit, these models are able to learn intricate correlations (Brown et al., 2020). This impressive performance on machine learning tasks suggests that they could be suitable candidates for machine reasoning tasks (Helwe et al., 2021). Reasoning is the ability to manipulate a knowledge representation into a form that is more suitable to solve a new problem (Bottou, 2014; Garcez et al., 2019). In particular, algebraic reasoning comprises manipulations such as abstraction, arithmetic operations, and systematic composition over complex objects. Algebraic reasoning is related to the ability of a learning system to perform systematic generalization (Marcus, 2003; Bahdanau et al., 2018; Sinha et al., 2019), i.e., to robustly make predictions beyond the data distribution it has been trained on. This is inherently more challenging than discovering correlations from data, as it requires the learning system to actually capture the true underlying mechanism of the specific task (Pearl, 2009; Marcus, 2018). Lately, much attention has been devoted to training Transformers to learn how to reason (Helwe et al., 2021; Al-Negheimish et al., 2021; Storks et al., 2019; Gontier et al., 2020). This is usually done by embedding an algebraic reasoning problem in a natural language formulation. Natural language, despite its flexibility, is imprecise and prone to shortcuts (Geirhos et al., 2020). As a result, it is often difficult to determine whether a model's performance on reasoning tasks is genuine or merely due to the exploitation of spurious statistical correlations in the data.
Several works in this direction (Agrawal et al., 2016; Jia & Liang, 2017; Helwe et al., 2021) suggest the latter is probably the case. In order to effectively assess the reasoning capabilities of Transformers, we need to carefully design tasks that i) operate on complex objects, ii) require algebraic reasoning to be carried out, and iii) cannot be shortcut by exploiting latent correlations in the data. We identify chemical reaction prediction as a suitable candidate for these desiderata. First, chemical reactions can be naturally interpreted as transformations over bags of complex objects: reactant molecules are turned into product molecules by manipulating their graph structures while abiding by certain constraints such as the law of mass conservation. Second, these transformations can be analysed as algebraic operations over (sub-)graphs (e.g., by observing bonds forming and dissolving (Bradshaw et al., 2019)), and balancing them to
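To make the mass-conservation constraint concrete, the following sketch checks that the multiset of atoms on the reactant side equals that on the product side of a reaction. It is a minimal illustration, not part of the benchmark itself; the helper names are hypothetical, and it assumes plain molecular formula strings without parentheses or charges.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a plain molecular formula, e.g. 'H2O' -> {'H': 2, 'O': 1}."""
    counts = Counter()
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(num) if num else 1
    return counts

def side_atoms(side):
    """Total atom counts for one side of a reaction, given (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coeff * n
    return total

def is_balanced(reactants, products):
    """Law of mass conservation: both sides must contain the same multiset of atoms."""
    return side_atoms(reactants) == side_atoms(products)

# 2 H2 + O2 -> 2 H2O is balanced; H2 + O2 -> H2O is not.
print(is_balanced([(2, "H2"), (1, "O2")], [(2, "H2O")]))  # True
print(is_balanced([(1, "H2"), (1, "O2")], [(1, "H2O")]))  # False
```

A model that merely pattern-matches reactant strings to product strings can violate this invariant; checking it explicitly is what makes the prediction task algebraic rather than purely statistical.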

