QUANTUM DEFORMED NEURAL NETWORKS

Abstract

We develop a new quantum neural network layer designed to run efficiently on a quantum computer but that can be simulated on a classical computer when restricted in the way it entangles input states. We first ask how a classical neural network architecture, whether fully connected or convolutional, can be executed on a quantum computer using quantum phase estimation. We then deform the classical layer into a quantum design which entangles activations and weights into quantum superpositions. While the full model would need the exponential speedups delivered by a quantum computer, a restricted class of designs represents interesting new classical network layers that still use quantum features. We show that these quantum deformed neural networks can be trained and executed on normal data such as images, and even classically deliver modest improvements over standard architectures.

1. INTRODUCTION

Quantum mechanics (QM) is the most accurate description of physical phenomena at very small scales, such as the behavior of molecules, atoms and subatomic particles. QM has a huge impact on our everyday lives through technologies such as lasers, transistors (and thus microchips), superconductors and MRI. A recent view of QM has formulated it as a (Bayesian) statistical methodology that only describes our subjective view of the (quantum) world, and how we update that view in light of evidence, i.e. measurements ('t Hooft, 2016; Fuchs & Schack, 2013). This is in perfect analogy to the classical Bayesian view, a statistical paradigm extensively used in artificial intelligence, where we maintain probabilities to represent our beliefs about events in the world.

The philosophy of this paper is to turn this argument on its head. If we can view QM as just another consistent statistical theory that happens to describe nature at small scales, then we can also use this theory to describe classical signals by endowing them with a Hilbert space structure. In some sense, the 'only' difference with Bayesian statistics is that the positive probabilities are replaced with complex 'amplitudes'. This, however, has the dramatic effect that, unlike in classical statistics, interference between events now becomes a possibility.

In this paper we show that this point of view uncovers new architectures and potential speedups for running neural networks on quantum computers. We shall restrict our attention here to binary neural networks. We will introduce a new class of quantum neural networks and interpret them as generalizations of probabilistic binary neural networks, discussing potential speedups from running the models on a quantum computer. Then we will devise classically efficient algorithms to train the networks for a restricted set of quantum circuits.
We present results of classical simulations of the quantum neural networks at real-world data sizes and report the gains in accuracy obtained from the quantum deformations. Contrary to almost all other works on quantum deep learning, our quantum neural networks can be simulated for practical classical problems, such as images or sound. The quantum nature of our models is there to increase the flexibility of the model class and add new operators to the toolbox of the deep learning researcher, some of which may only reach their full potential when quantum computing becomes ubiquitous.

In Farhi & Neven (2018), variational quantum circuits that can be learnt via stochastic gradient descent were introduced. Their performance could be studied only on small input tasks, such as classifying 4 × 4 images, due to the exponential memory requirement to simulate those circuits. Other works on variational quantum circuits for neural networks are Verdon et al. (2018); Beer et al. (2019). Their focus is similarly on the implementation on near-term quantum devices, and these models cannot be efficiently run on a classical computer. Exceptions are models which use tensor network simulations (Cong et al., 2019; Huggins et al., 2019), where the model can be scaled to 8 × 8 image data with 2 classes, at the price of constraining the geometry of the quantum circuit (Huggins et al., 2019). The quantum deformed neural networks introduced in this paper are instead a class of variational quantum circuits that can be scaled to the size of data used in traditional neural networks, as we demonstrate in section 4.2. Another line of work directly uses tensor networks as full-precision machine learning models that can be scaled to the size of real data (Miles Stoudenmire & Schwab, 2016; Liu et al., 2017; Levine et al., 2017; Levine et al., 2019). However, the constraints on the network geometry required to allow for efficient contractions limit the expressivity and performance of the models.
See however Cheng et al. (2020) for recent promising developments. Further, the tensor networks studied in these works are not unitary maps and do not directly relate to implementations on quantum computers. A large body of work in quantum machine learning focuses on using quantum computing to provide speedups to classical machine learning tasks (Biamonte et al., 2017; Ciliberto et al., 2018; Wiebe et al., 2014), culminating in the discovery of quantum-inspired speedups in classical algorithms (Tang, 2019). In particular, Allcock et al. (2018); Cao et al. (2017); Schuld et al. (2015); Kerenidis et al. (2019) discuss quantum simulations of classical neural networks with the goal of improving the efficiency of classical models on a quantum computer. Our models differ from these works in two ways: i) we use quantum wave-functions to model weight uncertainty, in a way that is reminiscent of Bayesian models; ii) we design our network layers in a way that may only reach its full potential on a quantum computer due to exponential speedups, but at the same time can, for a restricted class of layer designs, be simulated on a classical computer and provide inspiration for new neural architectures. Finally, quantum methods for accelerating Bayesian inference have been discussed in Zhao et al. (2019b;a), but only for Gaussian processes, while in this work we shall discuss relations to Bayesian neural networks.

2. GENERALIZED PROBABILISTIC BINARY NEURAL NETWORKS

Binary neural networks are neural networks where both weights and activations are binary. Let $\mathbb{B} = \{0, 1\}$. A fully connected binary neural network layer maps the $N_\ell$ activations $h^{(\ell)}$ at level $\ell$ to the $N_{\ell+1}$ activations $h^{(\ell+1)}$ at level $\ell+1$ using weights $W^{(\ell)} \in \mathbb{B}^{N_{\ell+1} \times N_\ell}$:

$$
h^{(\ell+1)}_j = f(W^{(\ell)}, h^{(\ell)})_j = \tau\left(\frac{1}{N_\ell+1}\sum_{i=1}^{N_\ell} W^{(\ell)}_{j,i}\, h^{(\ell)}_i\right), \qquad \tau(x) = \begin{cases} 0 & x < \tfrac{1}{2} \\ 1 & x \geq \tfrac{1}{2} \end{cases}. \tag{1}
$$

We divide by $N_\ell+1$ since the sum can take the $N_\ell+1$ values $\{0, \dots, N_\ell\}$. We do not explicitly consider biases, which can be introduced by fixing some activations to 1. In a classification model $h^{(0)} = x$ is the input and the last activation function is typically replaced by a softmax which produces output probabilities $p(y|x, W)$, where $W$ denotes the collection of weights of the network. Given $M$ input/output pairs $X = (x_1, \dots, x_M)$, $Y = (y_1, \dots, y_M)$, a frequentist approach would determine the binary weights so that the likelihood $p(Y|X, W) = \prod_{i=1}^M p(y_i|x_i, W)$ is maximized. Here we consider discrete or quantized weights and take the approach of variational optimization (Staines & Barber, 2012), which introduces a weight distribution $q_\theta(W)$ to devise a surrogate differentiable objective. For an objective $O(W)$, one has the bound $\max_{W \in \mathbb{B}^N} O(W) \geq \mathbb{E}_{q_\theta(W)}[O(W)]$, and the parameters of $q_\theta(W)$ are adjusted to maximize the lower bound. In our case we consider the objective:
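To make the layer concrete, the following is a minimal NumPy sketch of the forward pass in equation (1); the function name `binary_layer` and the toy weights are our own illustration, not code from the paper.

```python
import numpy as np

def binary_layer(W, h):
    """One fully connected binary layer: tau((1/(N+1)) * sum_i W[j,i] * h[i])."""
    N = h.shape[0]                      # number of input activations N_l
    pre = (W @ h) / (N + 1)             # normalized pre-activation in [0, 1)
    return (pre >= 0.5).astype(int)     # threshold activation tau at 1/2

# Toy example with N_l = 3 inputs and N_{l+1} = 2 outputs.
W = np.array([[1, 0, 1],
              [1, 1, 1]])
h = np.array([1, 1, 0])
print(binary_layer(W, h))  # pre-activations 1/4 and 2/4 -> outputs [0 1]
```

Note that the normalization by $N_\ell + 1$ keeps the pre-activation in $[0, 1)$, so the fixed threshold of $1/2$ plays the role of a majority vote over the weighted inputs.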
$$
\max_{W \in \mathbb{B}^N} \log p(Y|X, W) \;\geq\; \mathcal{L} := \mathbb{E}_{q_\theta(W)}\left[\log p(Y|X, W)\right] = \sum_{i=1}^M \mathbb{E}_{q_\theta(W)}\left[\log p(y_i|x_i, W)\right]. \tag{2}
$$
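In practice the lower bound $\mathcal{L}$ can be estimated by Monte Carlo, sampling binary weights from $q_\theta$. The sketch below assumes a fully factorized Bernoulli $q_\theta(W)$ with one probability per weight, which is our simplifying choice for illustration; the callable `log_lik`, standing in for $\log p(Y|X, W)$ of a concrete network, is a hypothetical placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def lower_bound_mc(theta, log_lik, n_samples=100):
    """Monte Carlo estimate of L = E_{q_theta(W)}[log p(Y|X, W)] (Eq. 2).

    theta   : array of Bernoulli probabilities, one per binary weight
              (fully factorized q_theta(W) -- an illustrative assumption).
    log_lik : callable W -> log p(Y|X, W) for a sampled binary weight array.
    """
    total = 0.0
    for _ in range(n_samples):
        W = (rng.random(theta.shape) < theta).astype(int)  # sample W ~ q_theta
        total += log_lik(W)
    return total / n_samples
```

Gradients with respect to $\theta$ can then be taken with standard score-function or continuous-relaxation estimators, which is what makes the surrogate objective differentiable even though $W$ itself is discrete.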

