QUARK: A GRADIENT-FREE QUANTUM LEARNING FRAMEWORK FOR CLASSIFICATION TASKS

Abstract

As more practical and scalable quantum computers emerge, much attention has been focused on realizing quantum supremacy in machine learning. Existing quantum ML methods either (1) embed a classical model into a target Hamiltonian to enable quantum optimization or (2) represent a quantum model using variational quantum circuits and apply classical gradient-based optimization. The former method leverages the power of quantum optimization but only supports simple ML models, while the latter provides flexibility in model design but relies on gradient calculation, resulting in barren plateaus (i.e., vanishing gradients) and frequent classical-quantum interactions. To address the limitations of existing quantum ML methods, we introduce Quark, a gradient-free quantum learning framework that optimizes quantum ML models using quantum optimization. Quark does not rely on gradient computation and therefore avoids barren plateaus and frequent classical-quantum interactions. In addition, Quark can support more general ML models than prior quantum ML methods and achieves a dataset-size-independent optimization complexity. Theoretically, we prove that Quark can outperform classical gradient-based methods by reducing model query complexity for highly non-convex problems; empirically, evaluations on the Edge Detection and Tiny-MNIST tasks show that Quark can support complex ML models and significantly reduce the number of measurements needed for discovering near-optimal weights for these tasks.

1. INTRODUCTION

Quantum computing provides a new computational paradigm that can achieve exponential speedups over classical counterparts for various tasks, such as cryptography (Shor, 1994), scientific simulation (Tazhigulov et al., 2022), and data analytics (Arute et al., 2019). A key advantage of quantum computing is its ability to entangle multiple quantum bits, called qubits, allowing n qubits to encode a 2^n-dimensional vector, whereas encoding this vector in classical computing requires 2^n bits. Inspired by this potential, recent work (Jaderberg et al., 2022; Macaluso et al., 2020b; Torta et al., 2021; Kapoor et al., 2016; Bauer et al., 2020; Farhi & Neven, 2018a; Schuld et al., 2014; Cong et al., 2019b) has focused on realizing quantum speedups over classical algorithms in the field of supervised learning. Existing quantum ML work can be divided into two categories: classical model with quantum optimization (CMQO) and quantum model with classical optimization (QMCO). First, CMQO methods embed a classical ML model jointly with the optimization problem into a target Hamiltonian and optimize the model using quantum adiabatic evolution (QAE) (Finnila et al., 1994) or the quantum approximate optimization algorithm (QAOA) (Farhi et al., 2014; Torta et al., 2021). Because the transition between a classical model and the target Hamiltonian only applies to low-order polynomial activations (see Figure 2), CMQO methods do not support ML models with non-linear activations that cannot be represented by low-order polynomials (e.g., ReLU). Second, QMCO methods optimize variational quantum models[1] by iteratively performing gradient descent using classical optimizers. QMCO methods are fundamentally limited by barren plateaus (i.e., vanishing gradients (McClean et al., 2018)) and the high cost of frequent quantum-classical interactions.
To address the limitations of existing quantum ML methods, we introduce Quark, a gradient-free quantum learning framework for classification tasks that optimizes quantum models with quantum optimization (QMQO). Figure 1 shows an overview of Quark. A key idea behind Quark is entangling the weight[2] of an ML model (i.e., the R_W register in Figure 1) and the encoded dataset (i.e., the R_D register in Figure 1) in a quantum state, where model weights that achieve optimal classification accuracy on the training dataset can be observed with the highest probabilities in a measurement. Therefore, users can obtain highly accurate model weights by directly measuring the updated R_W weight register. To maximize the probability of observing optimal weights, we introduce two key techniques.
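The encoding details of the R_D register appear later in the paper; as a minimal illustration of the basis-encoding idea (each training sample mapped to one computational basis state |x>|y>), the following classical sketch builds such a superposition for a hypothetical toy dataset. The dataset, register widths, and index layout are illustrative assumptions, not Quark's actual construction.

```python
import numpy as np

# Hypothetical toy dataset: 2-bit inputs with 1-bit labels (assumed for illustration).
dataset = [(0b00, 0), (0b01, 1), (0b10, 1), (0b11, 0)]

n_input, n_label = 2, 1
n_qubits = n_input + n_label

# Basis encoding: place one computational basis state |x>|y> per sample
# into an equal superposition over the N training samples.
state = np.zeros(2 ** n_qubits)
for x, y in dataset:
    state[(x << n_label) | y] = 1.0
state /= np.linalg.norm(state)

# On measurement, each encoded sample is observed with probability 1/N.
probs = state ** 2
print(probs[(0b01 << n_label) | 1])  # -> 0.25
```

Because each sample occupies a distinct basis state, non-linear operations such as ReLU can be applied to the encoded values sample-by-sample, which is what makes basis encoding compatible with the model architectures discussed below.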

Amplitude amplification. Quark uses a Grover-based mechanism to iteratively update the probability distribution of weights based on their training accuracies. As a result, the probability of observing weights with higher accuracy increases after each Grover iteration, as shown in Figure 1.
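The intuition can be reproduced with a classical simulation of textbook Grover amplification over amplitudes indexed by candidate weights. Note the oracle here simply phase-flips one assumed "optimal" weight index; Quark's actual operators (U_s, U_Ψ0 in Figure 1) depend on training accuracy rather than a binary mark, so this is a simplified sketch of the amplification step, not the paper's circuit.

```python
import numpy as np

n = 16                              # number of candidate weight configurations
good = 3                            # index of the (assumed) optimal weight

amp = np.full(n, 1 / np.sqrt(n))    # uniform superposition over weights

for _ in range(3):                  # ~ (pi/4) * sqrt(n) Grover iterations
    amp[good] *= -1                 # oracle: phase-flip the marked weight
    amp = 2 * amp.mean() - amp      # diffusion: reflect amplitudes about their mean

probs = amp ** 2
print(round(probs[good], 3))        # -> 0.961
```

After three iterations, nearly all probability mass sits on the marked weight, so a single measurement of the weight register returns it with high probability; this is the behavior depicted by the darkening bars in Figure 1.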

K-parallel datasets (KPD). Applying amplitude amplification on a single dataset yields linear amplification: the probability of measuring each weight is proportional to its training accuracy J(w_i). We further introduce K-parallel datasets, a technique that enables exponential amplification. Specifically, by entangling k identical copies of the training dataset with the model weights in parallel (using k × R_D, as shown in Figure 1), the probability of observing weight w_i in a measurement becomes proportional to J(w_i)^k. Therefore, as k increases, the optimized probability distribution over weights concentrates on the optimal weights. Compared with CMQO methods, Quark provides more flexibility in model design by composing models directly on quantum circuits and therefore supports a broader range of ML models. Compared with QMCO methods, Quark does not require gradient computation and therefore does not suffer from barren plateaus. Quark avoids frequent classical-quantum interactions by realizing both model design and optimization fully on quantum hardware. In addition, by using basis encoding for the training dataset, Quark supports non-linear operations (e.g., ReLU) in its model architecture, and its optimization complexity is independent of the training dataset size. Theoretically, we compare the model query complexity[3] of Quark and gradient-based methods on a balanced C-way classification task, and prove that Quark can outperform gradient-based methods by reducing model query complexity for highly non-convex problems. In addition, we prove that using K-parallel datasets can further reduce model query complexity under certain circumstances. Simulations on two tasks (i.e., Edge Detection and Tiny-MNIST) show that Quark supports complex ML models, including quantum convolution, pooling, fully connected, and ReLU layers.
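The effect of KPD on the measurement distribution is easy to see numerically: raising each accuracy to the k-th power and renormalizing sharpens the distribution toward the best weight. The accuracy values J below are hypothetical, chosen only to illustrate the J(w_i)^k scaling stated above.

```python
import numpy as np

# Hypothetical training accuracies J(w_i) for four candidate weights.
J = np.array([0.50, 0.70, 0.80, 0.95])

def kpd_distribution(J, k):
    """Measurement distribution with k parallel dataset copies:
    P(w_i) proportional to J(w_i)^k, normalized over all weights."""
    p = J ** k
    return p / p.sum()

for k in (1, 4, 16):
    print(k, np.round(kpd_distribution(J, k), 3))
```

With k = 1 the distribution is merely proportional to the accuracies (linear amplification), while by k = 16 over 90% of the probability mass falls on the best weight, which is why fewer measurements are needed to discover near-optimal weights as k grows.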



[1] They are also known as variational quantum circuits (VQC)-based models in the quantum literature.
[2] Throughout the paper, we use the term weight to refer to the set of all trainable parameters of a model.
[3] The number of times the model's forward pass is called.



Figure 1: An overview of the Quark optimization framework. Each horizontal line indicates a qubit, and each box on these lines represents a quantum gate applied to those qubits. Quark's optimization pipeline includes three stages: (1) Ψ_0 preparation, which initializes R_W, R_D, and R_O and performs the model's forward pass U_M; (2) amplitude amplification using a Grover-based algorithm; and (3) weight measurement. Quark uses K-parallel datasets (KPD) to maximize the probability of observing highly accurate weights in each measurement. Darker bars in the probability plots denote weights with higher accuracies.

