DUAL ALGORITHMIC REASONING

Abstract

Neural Algorithmic Reasoning is an emerging area of machine learning that seeks to infuse algorithmic computation into neural networks, typically by training neural models to approximate the steps of classical algorithms. In this context, much of the current work has focused on learning reachability and shortest-path graph algorithms, showing that jointly learning similar algorithms benefits generalisation. However, when targeting more complex problems, such "similar" algorithms become more difficult to find. Here, we propose to learn algorithms by exploiting the duality of the underlying algorithmic problem. Many algorithms solve optimisation problems. We demonstrate that simultaneously learning the dual definition of these optimisation problems in algorithmic learning allows for better learning and qualitatively better solutions. Specifically, we exploit the max-flow min-cut theorem to simultaneously learn these two algorithms over synthetically generated graphs, demonstrating the effectiveness of the proposed approach. We then validate the real-world utility of our dual algorithmic reasoner by deploying it on a challenging brain vessel classification task, which likely depends on the vessels' flow properties. We demonstrate a clear performance gain when using our model in this context, and empirically show that learning the max-flow and min-cut algorithms together is critical for achieving this result.

1. INTRODUCTION

Learning to perform algorithm-like computation is a core problem in machine learning that has been widely studied from different perspectives, such as learning to reason (Khardon & Roth, 1997), program interpreters (Reed & De Freitas, 2015) and automated theorem proving (Rocktäschel & Riedel, 2017). Indeed, endowing neural networks with reasoning capabilities might drastically improve generalisation, i.e. the ability of neural networks to generalise beyond the support of the training data, which is usually a difficult challenge for current neural models (Neyshabur et al., 2017). Neural Algorithmic Reasoning (Velickovic & Blundell, 2021) is a recent response to this long-standing question, attempting to train neural networks to exhibit some degree of algorithmic reasoning by learning to execute classical algorithms. Arguably, algorithms are designed to be general: they can be executed on, and return "optimal" answers for, any input that meets a set of strict pre-conditions. Neural networks, on the other hand, are more flexible, i.e. they can adapt to virtually any input. Hence, the fundamental question is whether neural models may inherit some of these positive algorithmic properties and use them to solve potentially challenging real-world problems. Historically, learning algorithms has been tackled as a simple supervised learning problem (Graves et al., 2014; Vinyals et al., 2015), i.e. by learning an input-output mapping, or through the lens of reinforcement learning (Kool et al., 2019). More recent works, however, build upon the notion of algorithmic alignment (Xu et al., 2020), which states that there must be an "alignment" between the structure of the learning model and the target algorithm in order to ease optimisation. Much focus has been placed on Graph Neural Networks (GNNs) (Bacciu et al., 2020) learning graph algorithms, e.g. Bellman-Ford (Bellman, 1958). Velickovic et al.
(2020b) show that it is indeed possible to train GNNs to execute classical graph algorithms. Furthermore, they show that optimisation must occur on all the intermediate steps of a graph algorithm, letting the network learn to replicate step-wise transformations of the input rather than learning a direct map from graphs to desired outputs. Since then, algorithmic reasoning has been applied with success in reinforcement learning (Deac et al., 2021), physics simulation (Velickovic et al., 2021) and bipartite matching (Georgiev & Lió, 2020). Moreover, Xhonneux et al. (2021) verify the importance of training on multiple "similar" algorithms at once (multi-task learning). The rationale is that many classical algorithms share sub-routines, e.g. Bellman-Ford and Breadth-First Search (BFS), which helps the network learn more effectively and transfer knowledge among the target algorithms. Ibarz et al. (2022) expand on this concept by building a generalist neural algorithmic learner that can effectively learn to execute even a set of unrelated algorithms. However, learning some specific algorithms might require capturing very specific properties of the input data, for which multi-task learning may not help. For instance, learning the Ford-Fulkerson algorithm (Ford & Fulkerson, 1956) for maximum flow entails learning to identify the set of critical (bottleneck) edges of the flow network, i.e. edges for which a decrease in the edge capacity would decrease the maximum flow. Furthermore, in the single-task regime, i.e. when we are interested in learning only one algorithm, relying on multi-task learning can unnecessarily increase the computational burden of the training phase. Motivated by these requirements, we seek alternative learning setups that alleviate the need for training on multiple algorithms while enabling better reasoning abilities in our algorithmic reasoners. We find a promising candidate in the duality information of the target algorithmic problem.
The concept of duality fundamentally enables an algorithmic problem, e.g. a linear program, to be viewed from two perspectives: that of a primal and that of a dual problem. These two problems are usually complementary, i.e. the solution of one may lead to the solution of the other. Hence, we propose to incorporate duality information directly into the learning model, both as an additional supervision signal and as an input feature (by letting the network reuse its dual prediction in subsequent steps of the algorithm), an approach we refer to as Dual Algorithmic Reasoning (DAR). To the best of our knowledge, no prior work targets the usage of duality in algorithmic reasoning. We show that by training an algorithmic reasoner both to learn an algorithm and to optimise the dual problem, we can relax the assumption of having multiple algorithms to train on while retaining all the benefits of multi-task learning. We demonstrate clear performance gains on both synthetically generated algorithmic tasks and real-world predictive graph learning problems.
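The max-flow min-cut theorem is a concrete instance of this primal-dual relationship: the value of a maximum s-t flow equals the capacity of a minimum s-t cut. A minimal sketch of this equality on a hypothetical toy network (node names and capacities are illustrative, not taken from the paper), verifying the dual side by brute-force enumeration of cuts, which is feasible only for tiny graphs:

```python
from itertools import combinations

# Toy directed flow network: (u, v) -> capacity. Illustrative values only.
capacity = {
    ("s", "a"): 3, ("s", "b"): 2,
    ("a", "b"): 1, ("a", "t"): 2,
    ("b", "t"): 3,
}
nodes = {"s", "a", "b", "t"}

def cut_capacity(s_side):
    """Capacity of the cut (s_side, rest): sum of capacities of edges leaving s_side."""
    return sum(c for (u, v), c in capacity.items()
               if u in s_side and v not in s_side)

# Enumerate all s-t cuts: node subsets containing s but not t.
inner = nodes - {"s", "t"}
cuts = [{"s", *subset}
        for r in range(len(inner) + 1)
        for subset in combinations(sorted(inner), r)]

min_cut = min(cut_capacity(c) for c in cuts)
print(min_cut)  # minimum cut capacity; here 5, matching the max-flow value
```

On this network the maximum flow (e.g. routing 2 units along s-a-t, 2 along s-b-t and 1 along s-a-b-t) also has value 5, so the primal and dual optima coincide, which is precisely the structure DAR exploits as an extra supervision signal.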

2. PROBLEM STATEMENT

We study the problem of neural algorithmic reasoning on graphs. Specifically, we target learning of graph algorithms A : G → Y that take graph-structured inputs G = (V, E, x_i, e_ij), with V being the set of nodes and E the set of edges, carrying node features x_i and edge features e_ij, and compute a desired output y ∈ Y. Usually, the output space of an algorithm A depends on its scope. In the most general cases, it can be R^{|V|} (node-level output), R^{|V|×|V|} (edge-level output) or R (graph-level output). We mainly consider the class of algorithms with node-level and edge-level outputs, which includes many of the most well-known graph problems, e.g. reachability, shortest path and maximum flow. From a neural algorithmic reasoning perspective, we are particularly interested in learning a sequence of transformations (the steps of the algorithm). Hence, we consider a sequence of graphs {G^(0), ..., G^(T-1)} where each element represents an intermediate state of the target algorithm we aim to learn. At each step t we have access to intermediate node and edge features x_i^(t), e_ij^(t), called hints, as well as intermediate targets y^(t). As is common in classical algorithms, some of the intermediate targets may be used as node/edge features in the subsequent step of the algorithm. Such hints are thus incorporated in training as additional features/learning targets, effectively learning the whole sequence of steps (the algorithm trajectory). In particular, we focus on learning maximum flow via the neural execution of the Ford-Fulkerson algorithm. Unlike Georgiev & Lió (2020), who learn Ford-Fulkerson to find an independent set of edges in bipartite graphs, we aim to learn Ford-Fulkerson on general graphs. We report the pseudo-code of Ford-Fulkerson in the appendix. Ford-Fulkerson poses two key challenges: (i) it comprises two sub-routines, i.e.
finding augmenting paths from the source s to the sink t, and updating the flow assignment F^(t) ∈ R^{|V|×|V|} at each step t; (ii) F must obey a set of strict constraints, namely the edge-capacity constraint and the conservation of flows. The former states that a scalar value c_ij, the capacity of edge (i, j), upper-bounds the flow that the edge can carry, i.e. F_ij ≤ c_ij; the latter requires that, at every node other than s and t, the total incoming flow equals the total outgoing flow.
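The two sub-routines, and the per-step hints they produce, can be sketched as follows. This is a minimal BFS-based Ford-Fulkerson variant (Edmonds-Karp) that records the flow matrix after every augmentation, mirroring the algorithm trajectory described above; the adjacency-matrix representation and the function name are our own illustrative choices, not the paper's:

```python
from collections import deque

def edmonds_karp_trajectory(n, cap, s, t):
    """Return the max-flow value and the list of flow matrices F^(t),
    one per augmentation step (the step-wise hints).
    `cap` is an n x n capacity matrix."""
    flow = [[0] * n for _ in range(n)]
    trajectory = []
    while True:
        # Sub-routine (i): find an augmenting s-t path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:          # no augmenting path left: flow is maximum
            break
        # Bottleneck (critical-edge) capacity along the path.
        b, v = float("inf"), t
        while v != s:
            u = parent[v]
            b = min(b, cap[u][v] - flow[u][v])
            v = u
        # Sub-routine (ii): update the flow assignment F^(t).
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += b
            flow[v][u] -= b          # skew-symmetric residual bookkeeping
            v = u
        trajectory.append([row[:] for row in flow])
    value = sum(flow[s][v] for v in range(n))
    return value, trajectory
```

Checking each recorded matrix against the two constraints is then direct: no entry may exceed its capacity, and every row of F summed over internal nodes must vanish (conservation). These per-step matrices are exactly the kind of intermediate supervision signal used as hints.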

