MATRIX SHUFFLE-EXCHANGE NETWORKS FOR HARD 2D TASKS

Abstract

Convolutional neural networks have become the main tools for processing two-dimensional data. They work well for images, yet convolutions have a limited receptive field that prevents their application to more complex 2D tasks. We propose a new neural model, called the Matrix Shuffle-Exchange network, that can efficiently exploit long-range dependencies in 2D data and has speed comparable to a convolutional neural network. It is derived from the Neural Shuffle-Exchange network and has O(log n) layers and O(n² log n) total time and space complexity for processing an n × n data matrix. We show that the Matrix Shuffle-Exchange network is well-suited for algorithmic and logical reasoning tasks on matrices and dense graphs, exceeding convolutional and graph neural network baselines. Its distinct advantage is the capability of retaining full long-range dependency modelling when generalizing to larger instances, much larger than could be processed with models equipped with a dense attention mechanism.

1. INTRODUCTION

Data often comes in the form of two-dimensional matrices. Neural networks are often used for processing such data, usually with convolution as the primary processing method. But convolutions are local, capable of analyzing only neighbouring positions in the data matrix. That is adequate for images, where neighbouring pixels are closely related, but not for data with more distant relationships. In this paper, we consider the problem of how to efficiently process 2D data in a way that allows both local and long-range relationship modelling, and propose a new neural architecture, called the Matrix Shuffle-Exchange network, to this end. The complexity of the proposed architecture is O(n² log n) for processing an n × n data matrix, which is significantly lower than the O(n⁴) required if one used attention (Bahdanau et al., 2014; Vaswani et al., 2017) in its pure form. The architecture is derived from the Neural Shuffle-Exchange networks (Freivalds et al., 2019; Draguns et al., 2020) by lifting their architecture from 1D to 2D. We validate our model on tasks with differently structured 2D input/output data. It can handle complex data inter-dependencies present in algorithmic tasks on matrices such as transposition, rotation, arithmetic operations and matrix multiplication. Our model reaches perfect accuracy on test instances of the same size it was trained on and generalizes to much larger instances. In contrast, a convolutional baseline can be trained only on small instances and does not generalize. Generalization is an important measure for algorithmic tasks: it indicates that the model has learned an algorithm rather than merely fitted the training data. Our model can be used for processing graphs by representing a graph with its adjacency matrix.
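To make the lifting from 1D to 2D concrete, the sketch below shows the perfect-shuffle permutation that gives Shuffle-Exchange networks their name, and one way it can be applied along both axes of a matrix. This is an illustrative assumption about the routing pattern, not the paper's exact layer definition.

```python
import numpy as np

def perfect_shuffle(n):
    """Perfect-shuffle permutation for n = 2^k.

    The destination of index i is a cyclic left rotation of i's
    k-bit binary representation (interleaving the two halves).
    """
    k = n.bit_length() - 1
    return np.array([((i << 1) | (i >> (k - 1))) & (n - 1) for i in range(n)])

n = 8
perm = perfect_shuffle(n)
# perm == [0, 2, 4, 6, 1, 3, 5, 7]: the two halves of 0..7 interleaved.

# A (hypothetical) 2D lifting: apply the 1D shuffle to both rows and
# columns of an n x n matrix. Repeating shuffle + local exchange
# log2(n) times lets information from any cell reach any other cell.
m = np.arange(n * n).reshape(n, n)
shuffled = m[perm][:, perm]
```

After log₂(n) such shuffle-exchange rounds every pair of positions has a communication path, which is what yields the O(log n) depth and O(n² log n) total cost quoted above.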
It has a significant advantage over graph neural networks (GNNs) in the case of dense graphs with additional data associated with graph edges (for example, edge length), since GNNs typically attach data to vertices, not edges. We demonstrate that the proposed model can infer important local and non-local graph concepts by evaluating it on component labelling, triangle finding and transitivity tasks. It reaches perfect accuracy on test instances of the same size it was trained on and generalizes to larger instances, while the GNN baseline struggles to learn these concepts even on small graphs. The model can also perform the complex logical reasoning required to solve Sudoku puzzles. It achieves 100% correct solutions on easy puzzles and 96.6% on hard puzzles, which is on par with a state-of-the-art deep learning model specifically tailored for logical reasoning tasks.
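The graph representation described above can be illustrated with a small example: a weighted dense graph encoded as its adjacency matrix, so that edge data (here, edge lengths) lives directly in the matrix cells the model processes. The encoding conventions (symmetric matrix, 0 for absent edges) are assumptions for illustration.

```python
import numpy as np

# Hypothetical encoding of an undirected weighted graph as the model's
# n x n input matrix: cell (i, j) holds the edge length, 0 means no edge.
edges = [(0, 1, 2.5), (1, 2, 1.0), (0, 2, 4.0), (2, 3, 0.5)]
n = 4

adj = np.zeros((n, n))
for i, j, w in edges:
    adj[i, j] = adj[j, i] = w  # symmetric: undirected graph

# The whole matrix is fed to the network, so per-edge data needs no
# special handling, unlike vertex-centric GNN message passing.
```

A vertex-feature GNN would have to fold these lengths into messages between endpoints, whereas here each edge attribute occupies its own matrix position.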

