SINGLE-PHOTON IMAGE CLASSIFICATION

Abstract

Quantum-computing-based machine learning mainly focuses on quantum computing hardware that is experimentally challenging to realize because it requires quantum gates operating at very low temperatures. We demonstrate the existence of a "quantum computing toy model" that illustrates key aspects of quantum information processing while being experimentally accessible with room-temperature optics. Pondering the question of the theoretical classification accuracy limit for MNIST (respectively "Fashion-MNIST") classifiers, subject to the constraint that a decision has to be made after detection of the very first photon that passed through an image filter, we show that a machine learning system that is permitted to use quantum interference on the photon's state can substantially outperform any machine learning system that cannot. Specifically, we prove that a "classical" MNIST (respectively "Fashion-MNIST") classifier cannot achieve an accuracy better than 22.96% (respectively 21.38% for "Fashion-MNIST") if it must make a decision after seeing a single photon falling on one of the 28 × 28 image pixels of a detector array. We further demonstrate that a classifier that is permitted to employ quantum interference by optically transforming the photon state prior to detection can achieve a classification accuracy of at least 41.27% for MNIST (respectively 36.14% for "Fashion-MNIST"). We show in detail how to train the corresponding quantum state transformation with TensorFlow and also explain how this example can serve as a teaching tool for the measurement process in quantum mechanics.

1. INTRODUCTION

Both quantum mechanics and machine learning play a major role in modern technology, and the emerging field of AI applications of quantum computing may well enable major breakthroughs across many scientific disciplines. Yet, as the majority of current machine learning practitioners do not have a thorough understanding of quantum mechanics, while the majority of quantum physicists have an equally limited understanding of machine learning, it is interesting to look for "Rosetta Stone" problems where simple and widely understood machine learning ideas meet simple and widely understood quantum mechanics ideas. It is the intent of this article to present a setting in which textbook quantum mechanics sheds new light on a textbook machine learning problem, and vice versa, conceptually somewhat along the lines of Google's TensorFlow Playground (Smilkov et al. (2017)), which was introduced as a teaching device to illustrate key concepts from Deep Learning to a wider audience. Specifically, we consider the question of the maximal achievable accuracy on common one-out-of-many image classification tasks if one must make a decision after the detection of the very first quantum of light (i.e. photon) that passed a filter showing an example image from the test set. In this setting, we do not have a one-to-one correspondence between example images from the training (respectively test) set and classification problems. Instead, every example image defines a probability distribution for the (x, y) detector pixel location on which the first photon passing an image filter lands, the per-pixel probability being the pixel's brightness relative to the accumulated (across all pixels) image brightness. So, from every (28 × 28 pixel) example image, we can sample arbitrarily many photon-detection-event classifier examples, where the features are a pair of integer pixel coordinates, and the label is the digit class.
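The brightness-weighted sampling of photon detection events described above can be sketched in a few lines of NumPy (a minimal illustration; the function name `sample_photon_event` is ours, not from the paper):

```python
import numpy as np

def sample_photon_event(image, rng):
    """Sample the (x, y) pixel at which the first photon is detected.

    Each pixel's detection probability is its brightness divided by the
    total image brightness, as described in the text.
    """
    p = image.astype(np.float64).ravel()
    p = p / p.sum()
    idx = rng.choice(p.size, p=p)
    return tuple(int(c) for c in np.unravel_index(idx, image.shape))

# Toy 2x2 "image" with all brightness in one pixel: the photon must
# land on that pixel.
rng = np.random.default_rng(0)
img = np.array([[0.0, 0.0], [0.0, 5.0]])
print(sample_photon_event(img, rng))  # (1, 1)
```

Repeating this sampling on a 28 × 28 MNIST image yields arbitrarily many single-photon training or test events from one example image.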
On the MNIST handwritten digit dataset (LeCun and Cortes (2010)), any machine learning system that only gets to see a single "photon detected at coordinates (x, y)" event (i.e. the coordinates of the pixel that flashed up are the only input features) is limited in accuracy by the maximum likelihood estimate, since we have: P(Image class C | Photon detected at (x, y)) = Σ_E P(Image class C | Example E) · P(Example E | Photon detected at (x, y)). On photon detection events, generated each by first randomly picking an example image and then randomly picking a brightness-weighted pixel from it, we cannot do any better than predicting the most likely digit class given these input features, the two pixel coordinates. As performance is measured on the test set, no classifier could possibly outperform one that is built to achieve maximal performance on the test set. This optimum is obtained by determining, for each pixel, what the most likely class is, where examples from the test set are weighted by the fraction of total example-image brightness that comes from the pixel in question. Figure 2(b) shows the most likely image class per pixel. (For MNIST, some pixels are dark in every test set example.) No classifier can outperform one that simply looks up the pixel coordinates at which a photon was detected in Figure 2(b) and returns the corresponding class, and this optimal classifier's accuracy is 22.96% for the MNIST dataset, substantially higher than random guessing (10%). Appendix A.2 provides a detailed (but mostly straightforward) optimality proof of this accuracy threshold. We cannot, for example, outperform it by redistributing light intensity between pixels, since any such redistribution could only destroy some of the available useful information, not magically create extra useful information.
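The per-pixel maximum-likelihood bound described above can be computed directly: tally, for each class, each image's fractional brightness at every pixel, and the optimal classifier's expected accuracy is the per-pixel winning class's mass summed over pixels. A NumPy sketch (function and variable names are ours, and the toy data below stands in for the real test set):

```python
import numpy as np

def optimal_pixel_classifier_accuracy(images, labels, num_classes):
    """Expected accuracy of the optimal per-pixel lookup classifier.

    weight[c, x, y] accumulates, for class c, each image's fractional
    brightness at pixel (x, y). The optimal classifier predicts
    argmax_c weight[c, x, y], so its expected accuracy is the sum over
    pixels of the winning class's weight, divided by the image count.
    """
    h, w = images.shape[1:]
    weight = np.zeros((num_classes, h, w))
    for img, label in zip(images, labels):
        weight[label] += img / img.sum()
    return weight.max(axis=0).sum() / len(images)

# Toy example: two classes whose brightness never overlaps, so the
# lookup classifier is perfect.
imgs = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
labels = np.array([0, 1])
print(optimal_pixel_classifier_accuracy(imgs, labels, num_classes=2))  # 1.0
```

Running the same computation over the 10,000 MNIST test images is what yields the 22.96% figure quoted in the text.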
An entirely different situation arises when we allow quantum mechanics to enter the stage: For a single photon passing through a coherently illuminated image filter, with all pixels at the same optical phase on the incoming wave, we can imagine putting some precision optical device between the image filter and the detector array that redistributes not the probabilities (which correspond to light intensity when aggregating over many photons), but the amplitudes that make up the spatial part of the photon wave function. Illuminating such a set-up with many photons would show a hologram-like interference pattern on the detector array. This transformation of the (single-)photon wave function by linear optical elements then has tuneable parameters which we can adjust to improve classifier accuracy. Quantum mechanics tells us that every (lossless) linear optical device can be represented by a linear unitary transform on the photon state: The action of any complex optical device consisting of (potentially very many) components which transforms an N-component photon state (in our case, N = 28² = 784 amplitudes in the spatial part of the photon wave function) can be described by an element of the N²-dimensional unitary matrix Lie group U(N). Vice versa, Reck et al. (1994) describe a constructive algorithm by which any U(N) transformation matrix can be translated back into a network of optical beam splitters and phase shifters.
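One common way to obtain a trainable element of U(N), as the text requires, is to exponentiate a Hermitian matrix built from unconstrained real parameters; the paper trains such a transform with TensorFlow, but the construction can be illustrated in plain NumPy (all names here are ours, a sketch rather than the paper's implementation):

```python
import numpy as np

def unitary_from_params(a, b):
    """Build a unitary U = exp(iH) from unconstrained real matrices a, b.

    H is the Hermitian part of (a + i*b); exponentiating iH through the
    eigendecomposition of H always yields a unitary matrix, which makes
    a and b suitable as free parameters for gradient-based training.
    """
    h = a + 1j * b
    h = 0.5 * (h + h.conj().T)  # Hermitian part
    eigvals, eigvecs = np.linalg.eigh(h)
    return eigvecs @ np.diag(np.exp(1j * eigvals)) @ eigvecs.conj().T

rng = np.random.default_rng(1)
n = 4  # stands in for N = 28 * 28 = 784 in the paper
u = unitary_from_params(rng.normal(size=(n, n)), rng.normal(size=(n, n)))

# Photon amplitudes: square roots of per-pixel intensities, all at the
# same optical phase, as for coherent illumination of the image filter.
psi = np.sqrt(np.full(n, 1.0 / n))
probs = np.abs(u @ psi) ** 2  # per-detector probabilities after the optics
print(np.isclose(probs.sum(), 1.0))  # True
```

Because U is unitary, the transformed amplitudes still describe a normalized single-photon state, so the per-detector probabilities always sum to one; training adjusts a and b so that probability mass concentrates on detectors associated with the correct class.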

1.1. RELATED WORK

Conceptually, exploiting interference to enhance the probability of a quantum experiment producing the sought outcome is the essential idea underlying all quantum computing. The main difference between this problem and modern quantum computing is that the latter tries to perform calculations by manipulating quantum states of multiple "entangled" constituents, typically coupled two-state quantum systems called "qubits," via "quantum gates" that are controlled by parts of the total quantum system's quantum state. Building a many-qubit quantum computer hence requires delicate control over the interactions between constituent qubits. This usually requires eliminating thermal noise by going to millikelvin temperatures. For the problem studied here, the quantum state can be transformed with conventional optics at room temperature: the energy of a green photon is 2.5 eV, way above the typical room-temperature thermal radiation energy of kT ≈ 25 meV. The price to pay is that it is challenging to build a device that allows multiple photons to interact in the way needed to build a many-qubit quantum computer. Nevertheless, Knill, Laflamme, and Milburn (Knill et al. (2001)) devised a protocol to make this feasible in principle, avoiding the need for coherency-preserving nonlinear optics (which may well be impossible to realize experimentally) by clever exploitation of ancillary photon qubits, boson statistics, and the measurement process. In all such applications, the basic idea is to employ coherent multiphoton quantum states to do computations with multiple qubits. In the problem studied here, there is only a single photon, and the only relevant information that gets processed is encoded in the spatial part of its wave function (i.e. polarization is irrelevant), so the current work resembles the "optical simulation of quantum logic" proposed by Cerf et al. (1998), where an N-qubit system is represented by 2^N spatial modes of a single photon.
Related work studied similar "optical simulations of quantum computing" for implementing various algorithms, in particular (small) integer factorization (Clauser and Dowling (1996); Summhammer (1997) ), but to the best of the present authors' knowledge did not consider machine learning problems.

