CHEMALGEBRA: ALGEBRAIC REASONING ON CHEMICAL REACTIONS

Abstract

While showing impressive performance on various kinds of learning tasks, it remains unclear whether deep learning models can robustly tackle reasoning tasks. Measuring the robustness of reasoning in machine learning models is challenging, as one needs to provide a task that cannot be easily shortcut by exploiting spurious statistical correlations in the data, while still operating on complex objects and constraints. To address this issue, we propose CHEMALGEBRA, a benchmark for measuring the reasoning capabilities of deep learning models through the prediction of stoichiometrically-balanced chemical reactions. CHEMALGEBRA requires manipulating sets of complex discrete objects (molecules represented as formulas or graphs) under algebraic constraints such as the mass preservation principle. We believe that CHEMALGEBRA can serve as a useful test bed for the next generation of machine reasoning models and spur their development.

1. INTRODUCTION

Deep learning models, and Transformer architectures in particular, currently achieve the state-of-the-art for a number of application domains such as natural language and audio processing, computer vision, and computational chemistry (Lin et al., 2021; Khan et al., 2021; Braşoveanu & Andonie, 2020). Given enough data and enough parameters to fit, these models are able to learn intricate correlations (Brown et al., 2020). This impressive performance on machine learning tasks suggests that they could be suitable candidates for machine reasoning tasks (Helwe et al., 2021). Reasoning is the ability to manipulate a knowledge representation into a form that is more suitable to solve a new problem (Bottou, 2014; Garcez et al., 2019). In particular, algebraic reasoning includes a set of reasoning manipulations such as abstraction, arithmetic operations, and systematic composition over complex objects. Algebraic reasoning is related to the ability of a learning system to perform systematic generalization (Marcus, 2003; Bahdanau et al., 2018; Sinha et al., 2019), i.e. to robustly make predictions beyond the data distribution it has been trained on. This is inherently more challenging than discovering correlations from data, as it requires the learning system to actually capture the true underlying mechanism for the specific task (Pearl, 2009; Marcus, 2018). Lately, much attention has been put on training Transformers to learn how to reason (Helwe et al., 2021; Al-Negheimish et al., 2021; Storks et al., 2019; Gontier et al., 2020). This is usually done by embedding an algebraic reasoning problem in a natural language formulation. Natural language, despite its flexibility, is imprecise and prone to shortcuts (Geirhos et al., 2020). As a result, it is often difficult to determine whether the models' performance on reasoning tasks is genuine or merely due to the exploitation of spurious statistical correlations in the data.
Several works in this direction (Agrawal et al., 2016; Jia & Liang, 2017; Helwe et al., 2021) suggest that the latter is probably the case. In order to effectively assess the reasoning capabilities of Transformers, we need to carefully design tasks that i) operate on complex objects, ii) require algebraic reasoning to be carried out, and iii) cannot be shortcut by exploiting latent correlations in the data. We identify chemical reaction prediction as a suitable candidate for these desiderata. First, chemical reactions can be naturally interpreted as transformations over bags of complex objects: reactant molecules are turned into product molecules by manipulating their graph structures while abiding by certain constraints such as the law of mass conservation. Second, these transformations can be analysed as algebraic operations over (sub-)graphs (e.g., by observing bonds forming and dissolving (Bradshaw et al., 2019)), and balancing them to preserve mass conservation can be formalised as solving a linear system of equations, as we will show in Section 2. Third, the language of chemical molecules and reactions is much less ambiguous than natural language, and by controlling the stoichiometric coefficients, i.e., the molecule multiplicities, at training and test time we can more precisely measure systematic generalization. Lastly, Transformers already excel at learning reaction predictions (Tetko et al., 2020; Irwin et al., 2022).¹ Therefore, we think this can be a solid test bed to measure the current gap between the learning and reasoning capabilities of modern deep learning models. The main contributions of this paper are the following:

1. We cast chemical reaction prediction as a reasoning task where the learner has to predict not only the set of products but also their correct stoichiometric coefficients under coefficient variations (Section 2).
2. We evaluate the current state-of-the-art Transformers for chemical reaction prediction, showing that they fail to robustly generalise when reasoning on simple variants of the chemical reaction dataset they have been trained on (Section 3).
3. We introduce CHEMALGEBRA as a novel challenging benchmark for machine reasoning, in which we can more precisely measure the ability of deep learning models to algebraically reason over bags of graphs in in-, cross- and out-of-distribution settings (Section 4).

2. PREDICTING CHEMICAL REACTIONS AS ALGEBRAIC REASONING

To illustrate our point, let us consider the Sabatier reaction: it yields methane (CH4) and water (H2O) out of hydrogen (H2) and carbon dioxide (CO2), in the presence of nickel (Ni) as a catalyst. In chemical formulas:

1 CO2 + 4 H2 --(Ni)--> 1 CH4 + 2 H2O (1)

where formulas encode complex graph structures in which atoms are nodes and chemical bonds are edges. A reaction prediction learning task hence consists of outputting a bag of graphs, the products (right-hand side), given the bag of graphs consisting of reactants (left-hand side) and reagents (i.e., the catalysts, written over the reaction arrow). The multiplicities of the molecules, also called their stoichiometric coefficients, express the fractional proportions in which reactants combine to yield a certain proportion of products. For example, one needs a 4:1 ratio of hydrogen molecules to carbon dioxide to produce a 1:2 ratio of methane to water. A reaction is (mass) balanced when its stoichiometric coefficients are chosen such that, for every element, the total number of atoms across the products equals that across the reactants, i.e., it satisfies the principle of mass conservation (Whitaker, 1975). Unbalanced reactions, on the other hand, would be chemically implausible. This constraint over the atoms of the molecules underpins the true chemical mechanism behind reactions: bonds between atoms break and form under certain conditions, but atoms do not change. In reasoning terms, this is a symbol-manipulating process where bags of graphs are deconstructed into other bags of graphs. A machine reasoning system that had learned this true chemical mechanism would be able to perfectly solve the chemical reaction prediction task for all balanced reactions and for all possible variations of stoichiometric coefficients. As humans, we can balance fairly complex chemical reactions quite easily.² For machines, this process can be formalised as finding a solution of a potentially underdetermined system of linear equations.
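The mass-conservation check described above can be made concrete with a short sketch: count the atoms of each element on both sides, weighted by the stoichiometric coefficients, and compare. The helper names (`atom_counts`, `is_balanced`) are our own illustration, not part of the benchmark:

```python
import re
from collections import Counter

def atom_counts(formula, multiplicity=1):
    """Count atoms in a simple formula like 'CO2' or 'H2O' (no parentheses)."""
    counts = Counter()
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += multiplicity * (int(num) if num else 1)
    return counts

def is_balanced(reactants, products):
    """reactants/products: lists of (coefficient, formula) pairs."""
    lhs, rhs = Counter(), Counter()
    for coeff, f in reactants:
        lhs += atom_counts(f, coeff)
    for coeff, f in products:
        rhs += atom_counts(f, coeff)
    return lhs == rhs

# Sabatier reaction: 1 CO2 + 4 H2 -> 1 CH4 + 2 H2O
print(is_balanced([(1, "CO2"), (4, "H2")], [(1, "CH4"), (2, "H2O")]))  # True
# Dropping two H2 molecules violates mass conservation:
print(is_balanced([(1, "CO2"), (2, "H2")], [(1, "CH4"), (2, "H2O")]))  # False
```

A model that has truly learned the reaction mechanism should never emit a product bag that fails such a check, regardless of how the reactant coefficients are varied.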
For example, we can write the Sabatier reaction as:

r1 · CO2 + r2 · H2 + r3 · Ni = p1 · CH4 + p2 · H2O + p3 · Ni (2)
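To make the linear-system view of equation (2) concrete, here is a minimal sketch that encodes one balance equation per element and searches small positive integers for the coefficients (r1, r2, r3, p1, p2, p3). The `species` table and variable names are our own illustrative choices; a real solver would compute the integer nullspace of the atom-count matrix instead of enumerating:

```python
from itertools import product as cartesian

# Atom counts per species appearing in equation (2).
species = {
    "CO2": {"C": 1, "O": 2}, "H2": {"H": 2}, "Ni": {"Ni": 1},
    "CH4": {"C": 1, "H": 4}, "H2O": {"H": 2, "O": 1},
}
reactants = ["CO2", "H2", "Ni"]
products = ["CH4", "H2O", "Ni"]
elements = ["C", "O", "H", "Ni"]

def balanced(r_coeffs, p_coeffs):
    # Mass conservation: for each element, atoms on the left equal atoms on the right.
    return all(
        sum(c * species[s].get(e, 0) for c, s in zip(r_coeffs, reactants))
        == sum(c * species[s].get(e, 0) for c, s in zip(p_coeffs, products))
        for e in elements
    )

# Enumerate small positive integer assignments and keep the balanced ones.
solutions = [
    (r, p)
    for r in cartesian(range(1, 5), repeat=3)
    for p in cartesian(range(1, 5), repeat=3)
    if balanced(r, p)
]
smallest = min(solutions, key=lambda rp: sum(rp[0]) + sum(rp[1]))
print(smallest)  # ((1, 4, 1), (1, 2, 1)), i.e. 1 CO2 + 4 H2 + 1 Ni = 1 CH4 + 2 H2O + 1 Ni
```

Note that any integer multiple of this solution is also balanced, which is exactly what makes the system underdetermined: the equations fix only the ratios among the coefficients, not their absolute values.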



¹ An extended overview of the related works in chemical reaction prediction is given in Appendix A.
² We learn to do it from very few examples, e.g. a handful of reactions taken from high-school chemistry textbooks. Without following an explicit algorithm, we can usually perform balancing in an intuitive way, leveraging our quick arithmetic skills to count the atoms of an element and match the numbers on both sides of the equation, iteratively changing the stoichiometric coefficients until all elements are balanced.

