TOWARD TRAINABILITY OF QUANTUM NEURAL NETWORKS

Abstract

Quantum Neural Networks (QNNs) have recently been proposed as generalizations of classical neural networks to achieve quantum speed-ups. Despite their potential to outperform classical models, serious bottlenecks exist for training QNNs; namely, QNNs with random structures have poor trainability due to gradients that vanish at a rate exponential in the number of input qubits. This vanishing gradient could severely limit the applications of large-scale QNNs. In this work, we provide a first viable solution with theoretical guarantees. Specifically, we prove that QNNs with tree tensor and step controlled architectures have gradients that vanish at most polynomially with the qubit number. Moreover, our result holds irrespective of which encoding method is employed. We numerically demonstrate QNNs with tree tensor and step controlled structures on the task of binary classification. Simulations show faster convergence rates and better accuracy compared to QNNs with random structures.

1. INTRODUCTION

Neural networks (Hecht-Nielsen, 1992) trained with gradient-based optimization have dramatically advanced research in discriminative models, generative models, and reinforcement learning. To use parameters efficiently and improve trainability in practice, neural networks with specific architectures (LeCun et al., 2015) have been introduced for different tasks, including convolutional neural networks (Krizhevsky et al., 2012) for image tasks, recurrent neural networks (Zaremba et al., 2014) for time-series analysis, and graph neural networks (Scarselli et al., 2008) for tasks on graph-structured data. Recently, neural architecture search (Elsken et al., 2019) has been proposed to improve network performance by optimizing the neural structure itself. Despite these successes, the development of neural network algorithms can be limited by the large computational resources required for model training. In recent years, quantum computing has emerged as one solution to this problem and has evolved into a new interdisciplinary field known as quantum machine learning (QML) (Biamonte et al., 2017; Havlíček et al., 2019). Specifically, variational quantum circuits (Benedetti et al., 2019) have been explored as efficient protocols for quantum chemistry (Kandala et al., 2017) and combinatorial optimization (Zhou et al., 2018). Compared to classical circuit models, quantum circuits have shown greater expressive power (Du et al., 2020a) and have demonstrated a quantum advantage in the low-depth case (Bravyi et al., 2018). Owing to their robustness against noise, variational quantum circuits have attracted significant interest in the hope of achieving quantum supremacy on near-term quantum computers (Arute et al., 2019). Quantum Neural Networks (QNNs) (Farhi & Neven, 2018; Schuld et al., 2020; Beer et al., 2020) are a special kind of quantum-classical hybrid algorithm that runs on trainable quantum circuits.
Recently, small-scale QNNs have been implemented on real quantum computers (Havlíček et al., 2019) for supervised learning tasks. Training a QNN aims to minimize an objective function f with respect to parameters θ. Inspired by classical neural network optimization, a natural strategy for training QNNs is to exploit the gradient of the loss function (Crooks, 2019). However, recent work (McClean et al., 2018) shows that n-qubit quantum circuits with random structures and large depth L = O(poly(n)) tend to be approximately unitary 2-designs (Harrow & Low, 2009), and the partial derivatives vanish to zero exponentially in n. This vanishing-gradient problem is usually referred to as the Barren Plateaus phenomenon (McClean et al., 2018), and it affects the trainability of QNNs in two ways. First, simply using a gradient-based method such as Stochastic Gradient Descent (SGD) to train the QNN requires a large number of iterations. Second, estimating the derivatives requires an extremely large number of samples from the quantum output to guarantee a reasonably accurate update direction (Chen et al., 2018). To avoid the Barren Plateaus phenomenon, we explore QNNs with special structures. In this work, we introduce QNNs with special architectures, including the tree tensor (TT) structure (Huggins et al., 2019), referred to as TT-QNNs, and the step controlled structure, referred to as SC-QNNs. We prove that for TT-QNNs and SC-QNNs, the expectation of the gradient norm of the objective function is bounded. Theorem 1.1.
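To make the Barren Plateaus phenomenon concrete, the following is a minimal from-scratch statevector sketch (not the paper's circuits): a random-structure circuit of RY rotations and CNOT chains, with the derivative of ⟨Z₁⟩ estimated by the standard parameter-shift rule. The circuit layout, depth choice, and observable here are illustrative assumptions, not the architectures analyzed in this work; empirically, the gradient variance shrinks as the qubit number grows.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(t):
    """Single-qubit RY rotation."""
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]], dtype=complex)

def kron_list(ops):
    """Tensor product of a list of operators (qubit 0 leftmost)."""
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

def cnot(n, c, t):
    """n-qubit CNOT with control c and target t, via projectors."""
    P0 = np.array([[1, 0], [0, 0]], dtype=complex)
    P1 = np.array([[0, 0], [0, 1]], dtype=complex)
    ops0 = [I2] * n; ops0[c] = P0
    ops1 = [I2] * n; ops1[c] = P1; ops1[t] = X
    return kron_list(ops0) + kron_list(ops1)

def objective(n, layers, thetas):
    """f(θ) = <ψ(θ)| Z_1 |ψ(θ)> for a random-structure ansatz."""
    state = np.zeros(2 ** n, dtype=complex); state[0] = 1.0
    k = 0
    for _ in range(layers):
        state = kron_list([ry(thetas[k + q]) for q in range(n)]) @ state
        k += n
        for q in range(n - 1):           # entangle with a CNOT chain
            state = cnot(n, q, q + 1) @ state
    obs = kron_list([Z] + [I2] * (n - 1))
    return (state.conj() @ obs @ state).real

def grad_first_param(n, layers, thetas):
    """Parameter-shift rule: df/dθ_1 = [f(θ_1+π/2) - f(θ_1-π/2)] / 2."""
    plus, minus = thetas.copy(), thetas.copy()
    plus[0] += np.pi / 2; minus[0] -= np.pi / 2
    return (objective(n, layers, plus) - objective(n, layers, minus)) / 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in [2, 4, 6]:
        layers = 2 * n
        grads = [grad_first_param(n, layers,
                                  rng.uniform(0, 2 * np.pi, layers * n))
                 for _ in range(50)]
        print(n, np.var(grads))
```

The sample variance of the derivative printed for growing n gives a small-scale picture of the exponential concentration that McClean et al. (2018) prove for unitary 2-designs.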
(Informal) Consider the n-qubit TT-QNN and the n-qubit SC-QNN defined in Figures 1-2 and the corresponding objective functions f_TT and f_SC defined in Equations (3-4). Then we have:

\[
\frac{1+\log n}{2n}\,\alpha(\rho_{\mathrm{in}}) \le \mathbb{E}_{\theta}\,\|\nabla_{\theta} f_{\mathrm{TT}}\|^2 \le 2n-1,
\qquad
\frac{1+n_c}{2^{1+n_c}}\,\alpha(\rho_{\mathrm{in}}) \le \mathbb{E}_{\theta}\,\|\nabla_{\theta} f_{\mathrm{SC}}\|^2 \le 2n-1,
\]

where n_c is the number of CNOT operations that directly link to the first qubit channel in the SC-QNN, the expectation is taken over all parameters in θ drawn uniformly from [0, 2π], and α(ρ_in) ≥ 0 is a constant that depends only on the input state ρ_in ∈ C^{2^n × 2^n}. Moreover, when ρ_in is prepared by the L-layer encoding circuit in Figure 4, the expectation of α(ρ_in) can be further lower bounded as E α(ρ_in) ≥ 2^{-2L}. Compared to random QNNs, whose derivatives scale as 2^{-O(poly(n))}, the gradient norms of TT-QNNs and SC-QNNs are bounded below by Ω(1/n) or Ω(2^{-n_c}), which leads to better trainability. Our contributions are summarized as follows:
• We prove Ω(1/n) and Ω(2^{-n_c}) lower bounds on the expected gradient norm of TT-QNNs and SC-QNNs, respectively, which guarantee trainability on the related optimization problems. Our theorem does not require the unitary 2-design assumption made in existing works and is thus more realistic for near-term quantum computers.
• We prove that by employing the encoding circuit in Figure 4 to prepare ρ_in, the expectation of the term α(ρ_in) is lower bounded by the constant 2^{-2L}. Thus, we further lower bound the expected gradient norm by a term independent of the input state.
• We simulate TT-QNNs, SC-QNNs, and random-structure QNNs on a binary classification task. All results verify the proposed theorems. Both TT-QNNs and SC-QNNs show better trainability and accuracy than random QNNs. Our proof strategy could be adapted to analyze QNNs with other architectures in future work.
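The binary-tree coupling pattern underlying a tree tensor network can be sketched in a few lines. This is an illustrative layout assuming the standard TT construction of Huggins et al. (2019) for n a power of two — one two-qubit block per pair, with one qubit of each pair carried forward — and may differ in block details from the circuits in Figure 1.

```python
def tree_layers(n):
    """Return, per tree layer, the qubit pairs coupled by two-qubit blocks.

    Assumes n is a power of two; after each layer only the first qubit of
    each pair is carried forward, halving the active qubits until one remains.
    """
    layers, stride = [], 1
    while stride < n:
        layers.append([(q, q + stride) for q in range(0, n, 2 * stride)])
        stride *= 2
    return layers

print(tree_layers(8))
# [[(0, 1), (2, 3), (4, 5), (6, 7)], [(0, 2), (4, 6)], [(0, 4)]]
```

The tree has depth log₂ n and n − 1 blocks in total, so the circuit depth stays logarithmic in the qubit number — in contrast to the O(poly(n))-depth random circuits for which Barren Plateaus arise.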
With the proven guarantees on the trainability of TT-QNNs and SC-QNNs, we remove one bottleneck blocking the application of large-scale Quantum Neural Networks. The rest of this paper is organized as follows. We present the preliminaries, including definitions, basic quantum computing background, and related work, in Section 2. The QNNs with special structures and the corresponding results are presented in Section 3. We implement the binary classification task using QNNs, with results shown in Section 4. We conclude in Section 5.

2. PRELIMINARY

2.1 NOTATIONS AND BASIC QUANTUM COMPUTING

We use [N] to denote the set {1, 2, · · · , N}. The notation ‖·‖ denotes the ℓ2 norm for vectors. We denote a_j as the j-th component of the vector a. The tensor product operation is denoted as "⊗". The conjugate transpose of a matrix A is denoted as A†. The trace of a matrix A is denoted as Tr[A]. We denote ∇_θ f as the gradient of the function f with respect to the vector θ. We employ the notations O and Õ to describe the standard complexity and the complexity ignoring minor terms, respectively.
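The notation above maps directly onto standard NumPy operations; the following minimal sketch (with illustrative values) shows each piece.

```python
import numpy as np

# l2 norm of a vector: ||a||
a = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(a), 5.0)

# Tensor product "⊗" of two single-qubit basis states
zero = np.array([1.0, 0.0])   # |0>
one = np.array([0.0, 1.0])    # |1>
state = np.kron(zero, one)    # |0> ⊗ |1> = |01>
assert np.allclose(state, [0.0, 1.0, 0.0, 0.0])

# Conjugate transpose A† and trace Tr[A]
A = np.array([[1.0, 1.0j], [0.0, 2.0]])
A_dag = A.conj().T
assert np.allclose(A_dag, [[1.0, 0.0], [-1.0j, 2.0]])
assert np.isclose(np.trace(A), 3.0)
```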

