ADVERSARIAL ROBUSTNESS BASED ON RANDOMIZED SMOOTHING IN QUANTUM MACHINE LEARNING

Abstract

We present an end-to-end Quantum Machine Learning algorithm for Quantum Adversarial Robustness (QuAdRo) that provides a certified radius for a base classifier, with robustness guarantees based on randomized smoothing, the state-of-the-art defense against adversarial attacks. Classically, the number of samples, and hence the number of queries to the base classifier, scales as $O(1/\epsilon^2)$, where $\epsilon$ is the desired error bound on the expected value of the probability measure $\rho$ defined over the randomized smoothing neighborhood around the input. Our algorithm solves the same problem for a Quantum Computing classifier. We prove that the number of queries to the base classifier is $O(1/\epsilon)$ for the same confidence and error bound. We also present the unitary circuit for QuAdRo, which includes the state preparation methods and circuits for smoothing distributions used to defend against common adversaries, modelled using $l_0$, $l_1$, $l_2$ norms and other metrics. Finally, we compare the classical algorithm with a simulation of the quantum algorithm.

1. INTRODUCTION

Machine Learning (ML) models have become ubiquitous over the last decade. There has also been massive interest in Quantum Computing (QC) and Quantum Machine Learning (QML) (Aaronson, 2015), with algorithms such as Shor's factoring (Shor, 1999) and the HHL algorithm (Harrow et al., 2009) for solving a linear system of equations providing an exponential speedup over their classical counterparts. Another class of QC algorithms offers a polynomial speedup: Grover's algorithm (Grover, 1996) for random database search, Quantum Amplitude Estimation (QAE) (Brassard et al., 2002) for the counting problem, the Bernstein-Vazirani algorithm (Bernstein & Vazirani, 1997) for the parity problem, etc.

The shortcomings of classical ML algorithms against malicious actors are a widely studied sub-domain of ML (Goodfellow et al., 2014; Madry et al., 2017). Common attack vectors include data poisoning, backdoor attacks, and adversarial attacks. Adversarial attacks can easily trick well-trained classifiers into misclassifying an input perturbed by a small, usually imperceptible, noise. QML algorithms are prone to the same problems (Weber et al., 2021; Liao et al., 2021; Guan et al., 2020; Lu et al., 2020; Ren et al., 2022). Popular methods for adversarial defense (Madry et al., 2017) find it challenging to train large, robust classifiers, which are essential to solving real-world problems at the scale of ImageNet (Deng et al., 2009) or larger. These methods do not offer certifiable guarantees of robustness, even when they work well in practice. Randomized smoothing is a state-of-the-art method that offers provable robustness against adversarial attacks without any assumptions about the underlying classifier. The defense works by aggregating a classifier's output in a region around the input, henceforth called the smoothing neighborhood, and computing the average probability of a class, $\rho_c$.
It is prohibitively expensive to compute the exact value of $\rho_c$ over the smoothing neighborhood since the number of points is exponential in the input dimension. In practice, Monte Carlo sampling algorithms are used to estimate $\rho_c$. Typically, randomized smoothing for adversarial robustness (Cohen et al., 2019; Yang et al., 2020; Lee et al., 2019) requires $N_{\text{classical}} \approx 10^5$ to $10^6$ samples from the smoothing neighborhood.

Contributions. In this paper, we discuss a purely QC approach to implementing randomized smoothing that uses an orthogonal representation for the input space and the existing formalism for the Quantum Counting problem (Brassard et al., 2002). We create a superposition of the smoothing neighborhood of the input image and use our quantum circuit to output the average probability of prediction for a class, $\rho_c$. We also design qubit state encoding and state preparation circuits for $l_0$, $l_1$ and other $l_p$ norm adversaries, and provide results from the simulation of the algorithm in Section 6.

Theorem 1. QuAdRo encodes an input $x$ into a quantum state $|\psi\rangle$ and, for error $\epsilon$ and confidence $1 - \delta$, requires a total of $M = O(1/\epsilon)$ queries to the base classifier $QNN_c$ to return a certified radius for $x$. In comparison, any classical implementation of randomized smoothing based certification requires $M = O(1/\epsilon^2)$ queries for the same guarantees.

Theorem 1 is proved in Section 4, and QuAdRo is presented in Algorithm 1.

2. RELATED WORK

2.1. RANDOMIZED SMOOTHING

The randomized smoothing method (Cohen et al., 2019; Yang et al., 2020) has achieved provable robustness against adversarial attacks. Given an input, one can define a smoothing neighborhood based on the threat model of the adversary, described by an $l_p$ norm and a scale parameter $\lambda$. Such a robust model outputs the most likely class returned by a base classifier over the smoothing neighborhood, and this output is stable against $l_p$ perturbations. Cohen et al. (2019) first proved tight robustness guarantees for an $l_2$ norm adversary using Gaussian smoothing. Later, Yang et al. (2020) provided guarantees for a larger set of adversaries and smoothing distributions, except for the $l_0$ norm, which was covered by Lee et al. (2019).
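To make the classical procedure concrete, the following is a minimal sketch of Monte Carlo estimation of $\rho_c$ under Gaussian smoothing, followed by the $l_2$ radius $\lambda\,\Phi^{-1}(\rho)$ of Cohen et al. (2019). The function names, the toy classifier, and the sample count are illustrative assumptions, and the sketch uses the point estimate directly where a real certificate would use a lower confidence bound.

```python
import random
from statistics import NormalDist

def smoothed_class_probability(base_classifier, x, target_class, lam, n_samples, seed=0):
    """Monte Carlo estimate of rho_c under Gaussian noise N(0, lam^2 I)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, lam) for xi in x]
        if base_classifier(noisy) == target_class:
            hits += 1
    return hits / n_samples

def l2_certified_radius(rho_lower, lam):
    """Radius lam * Phi^{-1}(rho); returns None (abstain) when rho <= 1/2."""
    if rho_lower <= 0.5:
        return None
    # Clip below 1 so that the inverse CDF stays finite.
    return lam * NormalDist().inv_cdf(min(rho_lower, 1 - 1e-12))

# Hypothetical toy classifier: class 1 when the first coordinate is positive.
def toy_classifier(z):
    return int(z[0] > 0)

rho_hat = smoothed_class_probability(toy_classifier, [1.0, 0.0], 1, lam=1.0, n_samples=20000)
radius = l2_certified_radius(rho_hat, lam=1.0)
```

Each of the `n_samples` draws costs one query to the base classifier, which is the quantity that the $O(1/\epsilon^2)$ bound counts.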

2.2. QUBIT STATE PREPARATION

Qubits are the logical units of information for Quantum Computers, equivalent to bits in classical computers. Any QC device is made up of qubits that have two properties: superposition and entanglement. Superposition refers to a qubit's ability to exist in multiple states at the same time, while entanglement refers to the ability of multiple qubits to exist in a shared state such that an operation on one qubit also affects the state of another qubit instantaneously, without any additional transfer of information. Generally, $n$ qubits span a $2^n$-dimensional space where, if bitstring $i = b_{n-1} \dots b_1 b_0$, then state $|i\rangle = \bigotimes_{j=0}^{n-1} |b_j\rangle$ with $b_j \in \{0, 1\}$.

There are numerous methods for encoding information into qubit states, often optimized for the target problem. For example, Novel Enhanced Quantum Representation (NEQR) (Zhang et al., 2013) and Flexible Representation of Quantum Images (FRQI) (Le et al., 2011), used for QC image algorithms, differ from variational heuristics for calculating molecular energies such as the Unitary Coupled-Cluster ansatz (Romero et al., 2018). Encoding methods popular in QML applications use the amplitude of a quantum state $|i\rangle$ as the representation of the input vector element $x[i]$ to be encoded. This representation is efficient in the number of qubits but has the drawback that quantum search, amplitude estimation, etc., cannot be applied to such qubit states due to a lack of orthogonality. Amplitude encoding uses the same number of gates but a logarithmic number of qubits compared to basis state encoding.

A number of distributions can be prepared as a superposed qubit state (Rattew & Koczor, 2022): log-concave distributions (Grover & Rudolph, 2002), the uniform distribution using the Quantum Fourier Transform (Deutsch, 1985), etc. Distribution parameters can be modified either during state preparation or mid-circuit, using circuits like QADD (Koch et al., 2022). QFT and inverse QFT, in particular, are a common pair of pre- and post-processing circuits that concentrate information from a superposition via a Fourier transform.

2.3. GROVER'S SEARCH ALGORITHM

Given a boolean objective function $f$ defined over an unstructured space $S$ of size $N$ such that $\exists x \in S : f(x) = 1$, and a QC oracle operator $O$ with $O|x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle$, Grover's Search algorithm (Grover, 1996) allows searching for $x$ in $O(\sqrt{N})$ calls to $O$. Using a state preparation subroutine $U$ such that $U|0\rangle = |\psi\rangle$, the Grover diffusion operator $G = (2|\psi\rangle\langle\psi| - I)\,O$ is applied repeatedly to concentrate the set of desired outcomes. The result of the measurement of a concentrated state is a value $x \in S$ such that $f(x) = 1$ with an arbitrarily high probability. The original formulation requires $|\psi\rangle$ to be a uniform superposition of all values in the space, but the same results were later extended to arbitrary unitary operations $U$ (Biron et al., 1999). No QC unstructured search algorithm can perform better than $O(\sqrt{N})$ (Boyer et al., 1998).
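The amplification step can be illustrated with a small state-vector simulation. This is only a sketch under simplifying assumptions: the oracle is modelled as a phase flip on the marked entries (the $|-\rangle$ ancilla is left implicit), $|\psi\rangle$ is the uniform superposition, and the function name and the choice of 6 qubits are illustrative.

```python
import math
import numpy as np

def grover_search(marked, n_qubits):
    """Apply G = (2|psi><psi| - I) O repeatedly to a state vector."""
    N = 2 ** n_qubits
    psi = np.full(N, 1 / math.sqrt(N))       # uniform superposition |psi>
    oracle_sign = np.ones(N)
    oracle_sign[list(marked)] = -1.0         # phase oracle flips marked amplitudes
    theta = math.asin(math.sqrt(len(marked) / N))
    iterations = int(math.floor(math.pi / (4 * theta)))
    state = psi.copy()
    for _ in range(iterations):
        state = oracle_sign * state                  # O
        state = 2 * psi * (psi @ state) - state      # 2|psi><psi| - I
    return state, iterations

state, iterations = grover_search(marked={5}, n_qubits=6)
p_success = float(state[5] ** 2)   # probability of measuring the marked item
```

With $N = 64$ and a single marked item, the number of iterations is of order $\sqrt{N}$, compared with an expected $N/2$ classical queries.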

2.4. QUANTUM AMPLITUDE ESTIMATION

An alternate formulation of Grover's search algorithm into a counting problem allows us to use the same number of oracle queries to estimate the number of solutions in S for f (x) = 1 using Quantum Amplitude Estimation (QAE) (Brassard et al., 2002; Boyer et al., 1998) . Variants of QAE (Grinko et al., 2021; Aaronson & Rall, 2020; Suzuki et al., 2020) tailored for low circuit depth and higher confidence use fewer oracle queries and/or fewer repetitions. The vanilla QAE in Figure 1 can be replaced with one of these variants without any loss of generality. A detailed comparison of these QAE variants can be found in Figure 3 in Grinko et al. (2021) .

2.5. QUANTUM MACHINE LEARNING AND ADVERSARIAL ROBUSTNESS

There are multiple formulations of QML classifiers (Cong et al., 2019; Abohashima et al., 2020). Previous works have shown that QML models are prone to adversarial attacks. For example, Guan et al. (2020) check the robustness of QML models against noise in the training data using classical methods, by modelling the verification problem as a classical SDP. In contrast, Weber et al. (2021) implement new protocols for QML to certify robustness optimally. The threat models and algorithms in these papers differ from the algorithm presented here, though a QAE-based approach can provide a quadratic speedup in the case of Weber et al. (2021) as well.

3. QUANTUM ADVERSARIAL ROBUSTNESS

As shown in Figure 1, the proposed algorithm for Quantum Adversarial Robustness (QuAdRo) uses the state preparation operator $U^{p,\lambda}$ and the Grover Diffusion Operator $G_c$, described in subsections 3.1 and 3.2. Using QAE, we can measure $\tilde{\rho}_c$, an estimate of the probability $\rho_c$ that the base classifier answers correctly after smoothing, to high precision and with high confidence, as discussed in Section 4. The symbols in the paper are defined where they are first used, and a detailed table of notations is available in Appendix A.

Figure 1: Quantum Estimation Circuit (QEC). Wires: $c$, state qubits $\alpha_0, \dots, \alpha_5$, ancilla qubits $a_0, \dots, a_m$, and a classical output register. Gates shown: $X$, $H$, $U^{p,\lambda}$, controlled $G_c^{2^0}, G_c^{2^1}, \dots, G_c^{2^m}$, $U^{p,\lambda\dagger}$, QFT, and $\mathrm{QFT}^\dagger$.

Figure 2(a): $U_j^{p,\lambda}$(Input) circuit, consisting of $IL_j$(Input) followed by $S_j^{p,\lambda}$(Input). $S_j^{p,\lambda}$ can smooth the input image state to a higher resolution.

Figure 2(b): Grover Diffusion Operator $G_c$ circuit. Gates shown: $QNN_c$ acting on the state qubits and $y$, followed by $U^{p,\lambda\dagger}$, $2|0\rangle\langle 0| - I$, and $U^{p,\lambda}$.

The Grover Diffusion Operator can only be used to search when distinct values in the input space are orthogonal. From here onwards, the smoothing neighborhood is represented as a superposition where the square of the amplitude of any perturbed input $|x + \varepsilon\rangle$ represents its probability in the smoothing distribution. Because individual inputs must be orthogonal and the exponentially sized smoothing neighborhood must be represented, only an orthogonal basis state encoding can be used. This choice requires the same number of qubits as the number of classical bits.

Before state preparation, the state qubits $\alpha$ and the ancilla $a$ are reset to $|0\rangle$. First, the input loader circuit $IL$ loads the input $x$ from classical bits into the qubit state $|\psi\rangle$. For the rest of the discussion, $d$ is the size of the input vector $x$, and $v$ is the number of qubits needed to represent any $x_j$. For a 28x28 grayscale image, $d = 784$ and $v$ is the resolution of a single pixel in the input; i.e., if each pixel lies in the range 0-255, i.e. $[0, 2^8 - 1]$, then $v = 8$.
Let the $j$-th value in input $x$ be $x_j = b_{j0} b_{j1} \dots b_{j(v-1)}$, i.e. $b_{ji}$ is the $i$-th classical bit in $x_j$. Then

$$|\psi\rangle = \bigotimes_{j=0}^{d-1} |x_j\rangle = |\alpha_0 \alpha_1 \alpha_2 \dots \alpha_{dv-1}\rangle \qquad (2)$$

To accomplish this, we use the Pauli operators $\sigma_i$, i.e. $\sigma_0 = I$, $\sigma_1 = X$, to create the input loader circuit $IL_j$ that loads $b_{ji}$ into qubit state $\alpha_{jv+i}$, defined as the tensor product

$$IL_j = \bigotimes_{i=0}^{v-1} \sigma_{b_{ji}} \qquad (3)$$

so that, by definition, $IL_j |0\rangle^{\otimes v} = |\alpha_j\rangle$.

Definition 1 The Input Loader operator $IL$ is defined as

$$IL = \bigotimes_{j=0}^{d-1} IL_j \qquad (4)$$

such that

$$IL\,|0\rangle^{\otimes dv} = |\alpha_0 \alpha_1 \alpha_2 \dots \alpha_{dv-1}\rangle = \bigotimes_{j=0}^{d-1} |x_j\rangle \qquad (5)$$

Circuit $IL$ comprises a maximum of $vd$ single-qubit $X$ gates. For any input $x$, if $\|x\|_1 = k$ (counted over the bits of $x$), then $IL$ has exactly $k$ $X$ gates, one for each $|1\rangle$ in $|\psi\rangle$.

After loading the input, a superposition of the smoothing neighborhood of the input $x$ is prepared based on the distribution $\phi(\cdot; \lambda)$, chosen to defend against a given $l_p$ norm adversary and parameterized by $\lambda$. Each perturbed value $x + \varepsilon$ is another valid input to the base classifier. For a detailed discussion of the design and functioning of the qubit encoding scheme, see Appendix C.

Definition 2 The smoothing circuit $S^{p,\lambda}$ with distribution $\phi$ defined by $p, \lambda$ is such that

$$S_j^{p,\lambda} |\alpha_j\rangle = \sum_{k=0}^{2^v - 1} \sqrt{\phi_j(\varepsilon_k; \lambda)}\; |x_j + \varepsilon_k\rangle \qquad (6)$$

where $x_j + \varepsilon_k$ is a value in the neighborhood of $\alpha_j$.

In our scheme, the complete smoothing operator $S^{p,\lambda}$ maps each value $x_j$ in the input vector independently to a probability-weighted superposition of its neighborhood. For any $i$ in the input space, the probability weight of $i$ in the smoothed qubit state is $\phi(i; \lambda) = \prod_j \phi_j(i_{jv}\, i_{jv+1} \dots i_{(j+1)v-1}; \lambda)$. Hence

$$S^{p,\lambda} |\psi\rangle = \bigotimes_{j=0}^{d-1} S_j^{p,\lambda} |\alpha_j\rangle = \sum_{i=0}^{2^{dv}-1} \sqrt{\phi(i; \lambda)}\; |i\rangle \qquad (7)$$

In this particular case, all $\phi_j$ are identical distributions centered around the mean $x_j$.
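Detailed circuits for each $p$ norm are discussed in Section 5.

Definition 3 The Qubit State Preparation Operator $U^{p,\lambda}$ (Figure 2a) comprises $IL$ and $S^{p,\lambda}$:

$$U^{p,\lambda} = \bigotimes_{j=0}^{d-1} U_j^{p,\lambda} = \bigotimes_{j=0}^{d-1} S_j^{p,\lambda}\, IL_j \qquad (8)$$

3.2. GROVER DIFFUSION OPERATOR $G_c$

Based on sections 2.3 and 2.4, we design a robust QML classifier from any base binary classifier $f_c$ for class $c$ provided by a unitary quantum parallel (Nielsen & Chuang, 2002) oracle $QNN_c$. If the output of the QML classifier for input $x$ is class $c$, then $f_c(x) = 1$; otherwise $f_c(x) = 0$.

Definition 4 The QML classifier oracle $QNN_c$ is

$$QNN_c\, |x\rangle |y\rangle = |x\rangle\, |y \oplus f_c(x)\rangle \qquad (9)$$

Setting $|y\rangle = |-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ (Deutsch & Jozsa, 1992) gives

$$QNN_c\, |x\rangle |-\rangle = (-1)^{f_c(x)}\, |x\rangle |-\rangle$$

When $QNN_c$ is applied to the smoothed superposition of inputs given by $U^{p,\lambda} |0\rangle^{\otimes dv}$ from section 3.1, $\rho_c$ is the expected probability output by our algorithm, i.e. the probability that an input $|x + \varepsilon\rangle$ in the smoothing neighborhood belongs to class $c$:

$$\rho_c = \mathbb{E}\big(P(f_c(x + \varepsilon) = 1)\big)$$

Definition 5 The Grover Diffusion Operator $G_c$ derives from the state preparation circuit $U^{p,\lambda}$ and $QNN_c$, as shown in Figure 2b:

$$G_c = U^{p,\lambda}\, (2|0\rangle\langle 0| - I)\, U^{p,\lambda\dagger}\, QNN_c$$

3.3. QUANTUM ESTIMATION CIRCUIT

QEC uses QAE (Brassard et al., 2002) to solve the counting problem defined for $QNN_c$ using $U^{p,\lambda}$ and $G_c$. The ancilla qubits $|a\rangle$ in Figure 1 are initialized in a uniform superposition over the range $[0, M]$ via QFT. Here, $M$ is the total number of calls to the oracle in the circuit. Then, we apply controlled $G_c^{2^k}$ gates using $|a_k\rangle$ as control, which results in

$$\mathrm{QAE}\, |\psi\rangle |a\rangle = G_c^{a} |\psi\rangle \otimes |a\rangle \quad \text{for all } a.$$

After applying $\mathrm{QFT}^\dagger$, we measure the ancilla state to obtain $\theta$ such that the probability measure $\rho_c$ is

$$\rho_c = \sin^2(\theta/2) \qquad (14)$$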
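As an illustration of the basis-state encoding and of the quantity that QEC estimates, the following sketch lists the qubits that receive an $X$ gate in $IL$ and computes $\rho_c$ exactly by enumerating a tiny input space. The helper names, the 3-point uniform distribution `uniform_phi`, and the toy classifier `f_c` are assumptions chosen only to keep the example small; they are not the distributions or classifier used in the paper.

```python
import itertools
import numpy as np

def input_loader_gates(x, v):
    """Qubit indices j*v + i that receive an X gate, following IL_j = tensor of sigma_{b_ji}."""
    gates = []
    for j, value in enumerate(x):
        bits = format(value, f"0{v}b")   # b_j0 ... b_j(v-1), most significant bit first
        gates += [j * v + i for i, b in enumerate(bits) if b == "1"]
    return gates

def smoothed_probability(x, v, phi, f_c):
    """Exact rho_c: sum of phi(x') * f_c(x') over all 2^(dv) basis states x'."""
    rho = 0.0
    for point in itertools.product(range(2 ** v), repeat=len(x)):
        # Independent per-value distributions multiply (product form of phi).
        weight = np.prod([phi(center, value) for center, value in zip(x, point)])
        rho += weight * f_c(point)
    return rho

def uniform_phi(center, value, v=2):
    """Hypothetical phi_j: uniform over {center-1, center, center+1} modulo 2^v."""
    return 1 / 3 if (value - center) % (2 ** v) in (0, 1, 2 ** v - 1) else 0.0

def f_c(point):
    """Hypothetical binary classifier: 1 when the values sum to at least 3."""
    return int(sum(point) >= 3)

x = [1, 2]                                   # d = 2 values of v = 2 bits each
gates = input_loader_gates(x, v=2)           # one X gate per 1-bit of x
rho_c = smoothed_probability(x, 2, uniform_phi, f_c)
```

The enumeration visits all $2^{dv}$ basis states, which is exactly the cost that amplitude estimation on the superposition avoids.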

4. THEORETICAL BOUNDS

As noted earlier, the smoothing distribution $\phi_j$ for each value in the input vector $x$ is independent, as per the classical robustness criteria (Cohen et al., 2019; Yang et al., 2020). We claim that the proposed quantum algorithm QuAdRo finds $\rho_c$ with a quadratic speedup compared to the best-known classical algorithm with the same error bound. We show that the certified robustness problem can be reduced to a counting problem and that, for a given error $\epsilon$ in the measurement of $\rho_c$ at fixed confidence $1 - \delta$, the best-known classical algorithm requires $O(\frac{1}{\epsilon^2})$ queries while QuAdRo requires $O(\frac{1}{\epsilon})$. This is also the best speedup possible for a counting problem using a quantum computer (Brassard et al., 2002).

Algorithm 1 Quantum Adversarial Robustness (QuAdRo)
procedure QUADRO(Input $|\psi\rangle$, Class $c$, $U^{p,\lambda}$, $G_c$, $M$, $\delta$)
    $N_{rep} = 12 \log\frac{1}{\delta} + 1$
    $N_{QEC} = \frac{M}{N_{rep}}$
    for $i$ in Range($N_{rep}$) do
        $\theta_i$ = QEC($|\psi\rangle$, $U^{p,\lambda}$, $G_c$, $N_{QEC}$, $\delta$)
        $\tilde{\rho}_c[i] = \sin^2(\frac{\theta_i}{2})$
    end for
    $\tilde{\rho}_c$ = median($\tilde{\rho}_c[\cdot]$)
    lowerConfBound = $\tilde{\rho}_c - \frac{7}{N_{QEC}}$
    if lowerConfBound $\geq \frac{1}{2}$ then
        return $c$, CertifiedRadius(lowerConfBound)
    else
        return -1, ABSTAIN
    end if
end procedure

4.1. QUANTUM COMPUTING BOUNDS

Given that the ancilla qubits $a_0, \dots, a_m$ are used for estimation, the maximum number of oracle calls is

$$M = \sum_{k=0}^{m} 2^k = 2^{m+1} - 1 \qquad (15)$$

The probability that we measure $\theta$ correctly up to $m$ bits is $\frac{8}{\pi^2}$, and the measured probability value $\tilde{\rho}_c$ is such that

$$|\tilde{\rho}_c - \rho_c| \leq \epsilon_0 = \frac{2\pi\sqrt{\rho_c(1-\rho_c)}}{M} + \frac{\pi^2}{M^2} \qquad (16)$$

More generally, for a single experiment with no repetitions, the error in the measurement of $\rho_c$ with confidence $1 - \delta$ (using Theorem 12 from Brassard et al. (2002)) is

$$|\tilde{\rho}_c - \rho_c| \leq \epsilon = \epsilon_0 + \frac{1}{\delta}\left(\frac{\sqrt{\rho_c(1-\rho_c)}}{2M} + \frac{\pi^2}{4M^2}\right) + \frac{1}{\delta^2}\,\frac{\pi^2}{4M^2} \qquad (17)$$

Solving Eq. 17 for $M$ in terms of $\epsilon$ and $\delta$ gives

$$M \approx \frac{\pi\sqrt{\rho_c(1-\rho_c)}}{\delta\epsilon} \sim O\!\left(\frac{1}{\delta\epsilon}\right) \qquad (18)$$

The success probability $\frac{8}{\pi^2}$ in Eq. 16 can quickly be boosted to close to 100% by repeating the experiment multiple times and using the median estimate of $\rho_c$ (Miyamoto, 2022). As a result,

$$M \sim O\!\left(\frac{1}{\epsilon}\log\frac{1}{\delta}\right) \qquad (19)$$

The constants for calculating $M$ in Eq. 19 are small, as shown in Algorithm 1; $N_{rep}$ and $N_{QEC}$ are based on Theorem 2 in Miyamoto (2022), which is also derived from Theorem 12 in Brassard et al. (2002).
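The control flow of Algorithm 1 can be sketched in a few lines of Python. Here `qec_estimate` is a hypothetical callable standing in for one run of the QEC circuit with $N_{QEC}$ oracle calls; rounding $N_{rep}$ up to an integer and using the natural logarithm are assumptions of this sketch, and the function returns the lower confidence bound where Algorithm 1 would pass it to CertifiedRadius.

```python
import math
import statistics

def quadro_certify(qec_estimate, target_class, M, delta):
    """Median-of-repetitions certification following the steps of Algorithm 1."""
    n_rep = math.ceil(12 * math.log(1 / delta)) + 1   # assumption: round up, natural log
    n_qec = M // n_rep                                # oracle calls per QEC run
    estimates = []
    for _ in range(n_rep):
        theta = qec_estimate(n_qec)                   # hypothetical QEC run returning theta
        estimates.append(math.sin(theta / 2) ** 2)
    rho_tilde = statistics.median(estimates)
    lower_conf_bound = rho_tilde - 7 / n_qec
    if lower_conf_bound >= 0.5:
        return target_class, lower_conf_bound
    return -1, None                                   # ABSTAIN

# Stub that always returns the angle corresponding to rho_c = 0.9.
theta_true = 2 * math.asin(math.sqrt(0.9))
label, bound = quadro_certify(lambda n_qec: theta_true, 3, M=100000, delta=0.001)
```

Taking the median rather than the mean is what lets the per-run success probability of $8/\pi^2$ be boosted to $1 - \delta$ with only logarithmically many repetitions.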

4.2. CLASSICAL COMPUTING BOUNDS

There are numerous intervals and bounds commonly used for statistical guarantees, and they result in a similar order of magnitude when calculating $M$ in terms of $\epsilon$ for a confidence $1 - \delta$. If we use Chernoff bounds,

$$P(|X - \mu| \leq \delta\mu) \geq 1 - 2e^{-\mu\delta^2/3} \qquad (20)$$

i.e., for $M$ repetitions of the function $f$ classically,

$$P(|\tilde{\rho}_c - \rho_c| \leq \epsilon) \geq 1 - 2e^{-M\epsilon^2/(3\rho_c)} \qquad (21)$$

For the same setting, using the Clopper-Pearson interval is a more popular practice since it provides tighter bounds, up to $O(M^{-3/2})$ (Thulin, 2014):

$$\epsilon \leq M^{-1/2}\, z_{\delta/2} \sqrt{\rho(1-\rho)} + M^{-1} \qquad (22)$$

For confidence $1 - \delta$, ignoring $\frac{1}{M}$ for large $M$,

$$|\tilde{\rho}_c - \rho_c| \leq M^{-1/2}\, z_{\delta/2} \sqrt{\rho(1-\rho)} \qquad (23)$$

Hence

$$M \approx \frac{z_{\delta/2}^2\, \rho(1-\rho)}{\epsilon^2} \sim O\!\left(\frac{1}{\epsilon^2}\right) \qquad (24)$$

5. CIRCUITS FOR SMOOTHING DISTRIBUTIONS $p, \lambda$

Table 3 shows the best performing smoothing distributions for each common adversary (Yang et al., 2020; Lee et al., 2019). As shown in Yang et al. (2020), the same distribution can be used to counter multiple adversaries. These are a subset of all possible $(\phi, l_p)$ pairs, for which we present the state preparation circuits. Please refer to other works (Yang et al., 2020; Lee et al., 2019; Cohen et al., 2019) for a detailed discussion of optimal distributions. The density in Table 3 should match the probability distribution obtained from the state preparation $U^{p,\lambda}$. Appendix C discusses this topic in detail.

| Adversary | Distribution | Density | Certified radius |
| $l_0$ | Tight Certificate | $\frac{\lambda}{2^v}$ if $x = \alpha$, else $\frac{2^v - \lambda}{2^v(2^v - 1)}$ | $\arg\max_r\ \rho_r^{-1}(0.5) \geq p$ |
| $l_1$ | Uniform $l_\infty$ | $\propto I(\|x\|_\infty \leq \lambda)$ | $2\lambda(\rho_c - \frac{1}{2})$ |
| $l_2$ | Normal Distribution | $\propto e^{-\|x/\lambda\|_2^2/2}$ | $\lambda\,\mathrm{GaussianCDF}^{-1}(\rho; 0, 1)$ |

5.1. $l_0$ NORM

$$U_j^{0,\lambda} |x_j\rangle = \sqrt{\frac{\lambda}{2^v}}\; |x_j\rangle + \sum_{\beta \neq x_j} \sqrt{\frac{2^v - \lambda}{2^v(2^v - 1)}}\; |\beta\rangle$$

Figure 3: $U_j^{p,\lambda}$ circuit for an $l_0$ norm adversary. Wires: state qubits $\alpha_0, \dots, \alpha_3$ and ancilla $a$. Gates shown: $H^{\otimes v}$, $IL$, $R_y(\theta)$, $X$, and $H$. $U_j^{0,\lambda}$ smooths a state $|x_j\rangle$ into a quantum state with probability $\frac{\lambda}{2^v}$ for $|x_j\rangle$ and an equal superposition of the rest of the space. $U_j^{0,\lambda}$ requires one additional ancilla qubit $a$ that cannot be disentangled.
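A quick numerical check of the $l_0$ density in Table 3 is sketched below: it builds the target amplitudes for a single $v$-qubit value and confirms that the probabilities sum to one. It only constructs the target state vector, not the circuit of Figure 3, and the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def l0_smoothing_state(x_j, v, lam):
    """Amplitudes whose squares follow the l0 density: lam/2^v on x_j, uniform elsewhere."""
    size = 2 ** v
    probs = np.full(size, (size - lam) / (size * (size - 1)))
    probs[x_j] = lam / size
    return np.sqrt(probs)   # amplitude = square root of probability

amps = l0_smoothing_state(x_j=5, v=4, lam=8)
total_probability = float(np.sum(amps ** 2))
```

The sketch assumes $1 \leq \lambda \leq 2^v$ so that every probability is non-negative.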

5.2. l 1 NORM

For an $l_1$ norm adversary, we use a uniform distribution with width $\lambda$, centred at the pixel value $\alpha$. To create a superposition state with a uniform distribution, if $\lambda = 2^k$, $k \in \mathbb{Z}$, the Hadamard operator on $k$ qubits, $H^{\otimes k}$, can be used. The circuit in Figure 3 can also be adapted to prepare a uniform distribution by setting $\theta = \frac{\pi}{2}$. More generally, QFT($\lambda$) can prepare the requisite uniform distribution for any $\lambda$. The prepared uniform distribution can then be shifted to its corresponding mean, resulting in $U^{p,\lambda}$ as shown in Figure 4.
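For the $\lambda = 2^k$ case, the construction can be sketched with explicit matrices: $H^{\otimes k}$ acts on the $k$ low-order qubits of a $v$-qubit register and the result is shifted towards the pixel value. Modelling the shift as a cyclic roll of the state vector, and the particular offset convention, are assumptions of this sketch (for an even width the window cannot be exactly centred).

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

def uniform_window_state(center, k, v):
    """Uniform superposition of width 2^k inside a v-qubit register, moved near `center`."""
    Hk = np.array([[1.0]])
    for _ in range(k):
        Hk = np.kron(Hk, H)                 # H applied to each of the k qubits
    zero = np.zeros(2 ** k)
    zero[0] = 1.0
    window = Hk @ zero                      # uniform over 0 .. 2^k - 1
    state = np.zeros(2 ** v)
    state[: 2 ** k] = window                # remaining v - k qubits stay in |0>
    offset = center - (2 ** k) // 2         # start of the window
    return np.roll(state, offset)           # shift modelled as modular addition

state = uniform_window_state(center=8, k=2, v=4)
support = np.flatnonzero(state > 1e-12).tolist()
```

Each basis state in the window carries probability $2^{-k}$, matching a uniform density of width $\lambda = 2^k$.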

5.3. l 2 NORM

Creating a Gaussian distribution (and any other log-concave distribution) as a qubit state is a well-studied problem (Grover & Rudolph, 2002). The relevant circuit is described in Figure 9 in Appendix C. The same effect can also be achieved using the SHIFT($\alpha$, $w$) operator, as shown in Figure 4.

5.4. OTHER ADVERSARIES

Any log-concave distribution can be created as a qubit state with a quantum circuit (Grover & Rudolph, 2002), and the SHIFT operator in Figure 8 can then shift the distribution to a mean value. Yang et al. (2020) use log-concave distributions for randomized smoothing, for instance, the Laplace or Uniform distribution for an $l_\infty$ norm adversary. Refer to Table A.1 in Yang et al. (2020) for a comprehensive list of viable distributions and the corresponding certified radii.

Figure 4: $U^{p,\lambda}$ circuit for $l_p$ norm. Wires: state qubits $\alpha_0, \dots, \alpha_3$. Gates shown: Distribution, followed by SHIFT($\alpha$, $w$).

6. EXPERIMENTS

Due to a lack of access to QC hardware at the time of writing, all the presented results are obtained using simulation. For the base classifier, we train a 2-layer Deep Neural Network (DNN) model for recognizing handwritten digits using the MNIST dataset from Torch v1.11.0. First, a fully connected layer is followed by LeakyReLU activation, normalization, and scaling to a 4-bit integer representation. This discretized classifier has an accuracy of $\approx 89.8\%$ on the MNIST test dataset. We found that a Projected Gradient Descent (PGD) adversary (Madry et al., 2017) can successfully attack this classifier, and that this attack can be defended against using Randomized Smoothing (Cohen et al., 2019). The detailed model architecture can be found in Appendix B. All training and simulation used an Nvidia RTX 2080 Ti GPU.

To compare QuAdRo with the classical randomized smoothing algorithm, we reduce the problem size so that the QC circuit can be simulated efficiently. First, we use the feature vector after layer 1 of the classifier as the input that the adversary can attack. Second, we use a small layer 2 ($d = 5$). Third, we discretize the feature vectors to 4-bit integer values after training ($v = 4$). This limits the input space to 20 bits, i.e. $2^{20}$ possible inputs, which can be completely simulated classically. An additional 6-12 ancilla qubits are required for the QEC. Based on preliminary experiments, the simulation cost of QEC-based prediction is high, so a classical predictor with $N_{predict} = 100$ is used in all experiments. The complete MNIST test set is used for the experiments in Table 2, while the simulation plots in Figure 5 use subsamples of size 1000 for 5a and 5b, and 100 for 5c.

The experiments are designed around four parameters: the adversary $\lambda$, the number of calls $N_{QEC}$ to the classifier oracle, the confidence $1 - \delta$, and the width $2w + 1$ of the truncated smoothing distribution. It is evident from Figure 5a that increasing $\lambda$ reduces certified accuracy, as expected, while a smaller $\lambda$ represents a weaker adversary.
The effect of a change in width $w$ on $\rho_c$ in Figure 5b is not very pronounced. We observed that $w < \lambda$ cannot model a $\lambda$ adversary for radius $> w \cdot d$. Since $w$ linearly increases simulation cost, $\lambda \in [2, 4]$ and $w \in [4, 5]$ are optimal for our experiments.

Table 2 shows the results of our experiments. Based on the theoretical discussion (Thulin, 2014; Montanaro, 2015), given the number of calls to the classifier oracle $M = N_{certify}$, we can use $N_{certify}$ samples for classical certification while QuAdRo uses $N_{rep}$ iterations of size $N_{QEC}$, so that $N_{certify} = N_{rep} N_{QEC}$. Given $\epsilon_q \approx \frac{7}{N_{QEC}}$ and, from Eq. 23, $\epsilon_c \approx z_{\delta/2}\sqrt{\frac{\rho_c(1-\rho_c)}{N_{certify}}}$, we should see a quantum advantage ($\epsilon_q < \epsilon_c$) for

$$N_{QEC} > \frac{49\,\left(12 \log\frac{1}{\delta} + 1\right)}{z_{\delta/2}^2\, \rho_c (1 - \rho_c)}$$

QuAdRo's accuracy outpaces the classical algorithm as $N_{QEC}$, i.e. $N_{certify}$, increases, especially for contentious examples since they have a low certified radius. For median $\rho_c = 0.8$ from Table 2, the results for the two algorithms match around $N_{QEC} \approx 2000$ for $\delta = 0.001$, which is observed in Figure 5c. To match the certified radius for $\rho_c \approx 1.0$, $N_{QEC}$ must scale by $\approx \frac{1}{1-\rho_c}$.

The simulations accurately depict the functioning of QuAdRo, except that they make $2^{dv}$ calls to the classifier to create the superposition state instead of the $dv$ theoretically needed by the QC equivalent. In addition, floating point errors in the qubit state amplitudes have been mitigated through normalization. All the theoretical bounds should remain the same for the simulation results as well.
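The crossover point can be evaluated numerically. The sketch below assumes the natural logarithm in $N_{rep}$ and the classical error term $z_{\delta/2}\sqrt{\rho_c(1-\rho_c)/N_{certify}}$ with $N_{certify} = N_{rep} N_{QEC}$; the function name is illustrative.

```python
import math
from statistics import NormalDist

def quantum_advantage_threshold(rho_c, delta):
    """Smallest N_QEC (as a real number) for which 7/N_QEC drops below the classical error."""
    n_rep = 12 * math.log(1 / delta) + 1              # repetitions used by QuAdRo
    z = NormalDist().inv_cdf(1 - delta / 2)           # two-sided normal quantile z_{delta/2}
    return 49 * n_rep / (z ** 2 * rho_c * (1 - rho_c))

threshold = quantum_advantage_threshold(rho_c=0.8, delta=0.001)
```

For $\rho_c = 0.8$ and $\delta = 0.001$ this lands in the low thousands, the same order as the $N_{QEC} \approx 2000$ crossover reported for Figure 5c, and the $\rho_c(1-\rho_c)$ factor in the denominator shows why the threshold grows as $\rho_c$ approaches 1.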

7. CONCLUSION

We developed the algorithm QuAdRo, which creates a robust classifier and provides a certified radius for any base QML classifier. Section 4 establishes that QuAdRo offers a quadratic speedup over existing classical algorithms for certified adversarial robustness via randomized smoothing. The same guarantees were simulated and tested in Section 6. The results hold for any distribution and classifier that can be implemented on a Quantum Computing device, and the QC circuits to prepare popular smoothing distributions as qubit states are presented in Section 5 and Appendix C. We have not made any additional assumptions on the base classifier; hence any guarantees that hold for prior art (Cohen et al., 2019; Yang et al., 2020) should hold for QuAdRo as well. Based on experiments, QuAdRo abstains less frequently and is more reliable than the robust classical classifier when $N_{certify}$ is increased.

Broader Impact and Limitations: The robustness guarantees in this paper apply only to adversarial attacks on a classifier. The robust classifier will also inherit the underlying bias (e.g. training data, misrepresentation of objects/people, etc.) of the base classifier. The algorithm presented here also depends on functional QC hardware for the speedup, and application to meaningful image inputs will need >1000 qubits.

A NOTATION AND SCHEMATIC

Figure 7: $U^{p,\lambda}$ circuit for an $l_0$ norm adversary. Wires: state qubits $x_0, \dots, x_3$ and ancilla $a$. Gates shown: $H^{\otimes v}$, $IL$, $R_y(\theta)$, $X$, and $H$.

Assuming $V$ elements in the input space and adversary $\lambda$ such that $V > \lambda \geq 1$, the operator $R_y(\theta)$ rotates the ancilla state $a$, where cos(θ/2) = λ-1

Since $P(a = 1) > \frac{1}{2}$, the estimated number of repetitions of the state preparation circuit is less than 2. Please note that state preparation is also a part of Grover's Diffusion Operator, and will concentrate the ancilla state $|1\rangle$ after repeated applications. In general, measuring the output of the QEC circuit multiple (<2) times until $a = 1$ yields the required state. We expect this alternate circuit to be more useful as a component in a variational circuit. To get the correct result, we need to perform the experiment multiple times, until the ancilla state is measured as $|1\rangle$. The expected number of trials is $\leq 2$. In addition, the ancilla with the discarded state $|0\rangle$ has an error $\sim O(\frac{1}{V})$ in the amplitude of $|x\rangle$, which may be acceptable for some $\lambda$, $V$, and $M$.

C.4 SHIFT OPERATOR

Any prepared qubit state for a distribution can be shifted to a new mean using a quantum shift operator. The shift operator circuit $\mathrm{SHIFT}_\alpha$, based on QADD (Koch et al., 2022) and shown in Figure 8, can take any superposition as input and shifts each $|i\rangle$ to $|i + \alpha\rangle$. We believe the same can be implemented using other quantum adders/circuits.

(Grover & Rudolph, 2002). The same effect can also be achieved using the SHIFT($\alpha$, $w$) operator, as shown in

Adversarial training of a VQC has been shown previously in Lu et al. (2020), where classical methods of adversarial training such as Projected Gradient Descent (PGD) are extended to QML. To carry out adversarial training of a smoothed VQC, the training objective defined above can be optimised using the parameter shift rules, as explained in Schuld et al. (2019).



allows searching for x in O( √ N ) calls to O. Using state preparation subroutine U such that U |0⟩ = |ψ⟩, the Grover diffusion operator G = (2 * |ψ⟩ ⟨ψ| -

Diffusion Operator Gc circuit.

Figure 2: Building blocks of QEC.

3 and 2.4, we design a robust QML classifier from any base binary classifier $f_c$ for class $c$ provided by a unitary quantum parallel (Nielsen & Chuang, 2002) oracle $QNN_c$. If the output of the QML classifier for input $x$ is class $c$, then $f_c(x) = 1$; otherwise $f_c(x) = 0$. Definition 4 QML classifier oracle $QNN_c$: $QNN_c\, |x\rangle |y\rangle = |x\rangle\, |y \oplus f_c(x)\rangle$ (9)

The state preparation $U_j^{0,\lambda}$ in Figure 3 requires only $O(v)$ elementary gates ($\approx 2v$) to prepare the state (proof in Appendix C).

Figure 5: Certified accuracy vs radius for different parameters. (a, left) Adversary $\lambda$ for the Normal distribution; (b, middle) smoothing distribution truncation parameter $w$; (c, right) oracle calls, where nc denotes $N_{certify}$ for the classical algorithm and nq denotes $N_{QEC}$ for QuAdRo.

Figure 6: Flow diagram describing QuAdRo.

Figure 8: SHIFT(α, w) circuit to shift the input distribution |p⟩ to mean α.

Figure 9: U p,λ (Input) implementation for l 2 norm(Grover & Rudolph, 2002).

Figure 4.

D QUANTUM MACHINE LEARNING CLASSIFIERS

There are multiple formulations that introduce variational quantum classification (VQC) circuits (Cong et al., 2019; Farhi & Neven, 2018; Havlíček et al., 2019; Schuld & Killoran, 2019).

Training the smoothed soft classifier using adversarial examples (SmoothAdv) improves the performance, which makes this another possible candidate for $QNN_c$.

Adversary norm parameters and corresponding certified robustness radius.

Results : Accuracy of robust classifiers for MNIST test dataset -abstain (Abs) when ρ c < 0.5. Acc (!Abs) refers to the accuracy for certified inputs where classifier does not abstain. Abs % Acc (!Abs) % ρ c (y = y pred ) ρ c (y ̸ = y pred )

Notation used in the paper

B SIMULATION : DISCUSSION, PLOTS, RESULTS

To be able to simulate QuAdRo on a classical computer, we reduced the problem size through the following three changes. First, we used the feature vector after layer 1 of the model as the input that the adversary can attack. Second, we use a small layer 2. Third, we discretize the feature vectors to 4-bit integer values after training. This allows us to limit the total input space to 20 bits, i.e. a space of $2^{20}$ possible inputs, which can be completely simulated. Even for the smaller problem size, we show that

• The input space is local, i.e. nearby points in a neighborhood are likely to belong to the same class.
• A non-trivial classical classifier C can be trained on the dataset, achieving $\approx 90\%$ accuracy.
• An adversary A (PGD) can easily find inputs that are misclassified by C.
• A classifier based on randomized smoothing with an $l_p$ norm and parameter $\lambda$ is effective against A's attack.

We created the quantum algorithm simulator in numpy, using the output of C over the complete input space of size $2^{20}$ to calculate the superposition of states and the exact $\rho_c$ for comparison.

B.1 MODEL ARCHITECTURE

First, we train a 2-layer DNN model from scratch, with Layer 1 of size 784*5 and Layer 2 of size 5*10, using LeakyReLU activation with a scaling factor of 0.03. Layer 2 inputs are normalized and scaled by $2^4$. The base classifier C inherits the trained Layer 2 from this model, and its inputs are the feature vectors output by Layer 1 of the previously trained model, discretized to a 4-bit integer representation in the range $[0, 2^4 - 1]$. LeakyReLU activation with a scaling factor of 0.03 is used after fc1 to make the feature vectors partially invertible, so that they can be used to visualize images for features outside the train and test datasets.

B.2 SIMULATION SETUP

We simulate QuAdRo for a 20-qubit input, with a total of 26-32 simulated qubits, since simulating a much larger system is not feasible on a classical computer. Due to a lack of access to a quantum computer at the time of writing, the presented results were all simulated. Given this setup, we implement both the classical and the quantum computing algorithms for randomized smoothing to find the certified radius. We simulate the Normal distribution since it can defend against $l_1$, $l_2$, and $l_\infty$ adversaries (Yang et al., 2020). We show that, for the same number of calls to the classifier, QuAdRo's error bound improves and surpasses that of the classical algorithm for the same hyperparameters and input.

The simulator implementation is non-trivial. We recommend reading the code (in the supplementary material) and its comments in addition to Appendix B. The following is a list of noteworthy features of the simulator:

• It calculates the MNIST classifier output for the complete input space. This is used to simulate the $QNN_c$ oracle and to calculate the exact value of $\rho_c$ and the corresponding certified radius.
• The smoothing state $\psi$ is prepared using a discretized Normal distribution with mean mu equal to the pixel value and sd = $\lambda$.
• The QFT operator on the ancilla and the subsequent Grover operator applied to $\psi$ have been simplified into an iterative operation on the initial state. The iteration has $N_{QEC}$ steps.

B.3 CODE SNIPPETS

We will release the code in the supplementary material with its license. The code snippets here are only for reference, and we recommend using the code from the supplementary material for simulation. The following two functions can be used to simulate the Grover's Search operation by first preparing the requisite $\psi$, then repeating the search operation, or using it in a subroutine for amplitude estimation.

Classical data can be encoded into a quantum state in numerous ways. Amplitude encoding is one of the most popular methods: a vector of real/complex values is encoded as the amplitudes of the $N$ states in a superposition such that the amplitude of $|i\rangle$ corresponds to the $i$-th value in the input vector. Variations of this scheme have been used for different algorithms (Zhang et al., 2013; Le et al., 2011). The oracle output for any input is pixel state dependent, and any smoothing operation will destroy that information if the pixel state was amplitude encoded; this is equivalent to running the algorithm for the mean of all images in the smoothing neighborhood for the given distribution.

C.2 $l_0$ NORM

The circuit in Figure 7 is optimized for low circuit depth and gate count. The general state of this system will be referred to as $|\psi_n\rangle = |x\rangle |a\rangle$, where $|x\rangle$ represents the qubits that will eventually be in a superposition state of the smoothing distribution, while $|a\rangle$ is the additional qubit needed in the circuit. The subscript $n$ represents the $n$-th logical step in the circuit. Initially, all qubits are set to the ground state $|0\rangle$, hence the state of the system is

Hence

parameterized quantum classification operation $V(\cdot; \gamma)$ on input $|\psi\rangle$ is defined as

Where $V(\cdot; \gamma)$ is a multilayer classifier which can be defined as

The quantum classifiers in these works consist of a sequence of learnable, parameter-dependent unitary transformations $V_i(\cdot; \gamma_i)$ followed by a measurement operation on selected qubits that outputs the class. V is trained so that | i | 2 corresponds to the probability of input $|\psi\rangle$ belonging to class $i$. Theoretically, the output of the system on input $|\psi\rangle$ is given by $\langle\psi| V^\dagger M V |\psi\rangle$, where $M$ is the observable measured at the end. For instance, measurement of the output qubits in the $\sigma_z$ basis can be used to estimate the predicted class. Experimentally, because of the probabilistic nature of the measurement, a high number of repetitions/shots need to be carried out in order to estimate this output distribution and predict the correct class.

Using a parameterized VQC as an oracle ($QNN_c$) within QuAdRo is non-trivial. Commonly used methods like Mid-Circuit Measurement and Reset (MCMR), or any other non-unitary operation before the end of QEC, cannot be used. This restricts our choice to variationally trained classifiers that do not perform MCMR. 10 . This construction increases the circuit depth and requires more oracle calls. On training, the multi-class classifier reached 74% accuracy for the same input features as used in Section 6. We found that such a $QNN_c$ oracle is feasible, but simulation using existing libraries and available computational resources is prohibitively expensive. Hence, this oracle is not used in the results presented in this paper. We discuss this further in the supplementary material.

In another training approach, Salman et al. (2019) consider a generalization of randomized smoothing to soft classifiers $F$, which output a probability distribution over all classes for a given input, as compared to a hard classifier that outputs a single class. The adversarial training objective $J$ in this case is given by (Equation 5 from Salman et al. (2019))

$$J(x) = -\log\, \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, \lambda^2 I)}\big[F(x + \varepsilon)_y\big] \qquad (39)$$

