QUANTUM 3D GRAPH STRUCTURE LEARNING WITH APPLICATIONS TO MOLECULE COMPUTING

Abstract

Graph representation learning has been extensively studied over the last decade, and recent models have started to pay attention to a relatively new area, namely 3D graph learning, which uses 3D spatial positions in addition to node attributes. Despite the progress, the ability to understand the physical meaning of 3D topology information remains a bottleneck for existing models. On the other hand, quantum computing is known to be a promising direction, owing to its theoretically verified supremacy on large-scale graph and combinatorial problems as well as the increasing evidence that physical quantum devices will be available in the near term. In this paper, for the first time to the best of our knowledge, we propose a quantum 3D embedding ansatz that learns the latent representation of 3D structures from the Hilbert space composed of the Bloch spheres of the qubits. Specifically, the 3D Cartesian coordinates of nodes are converted into rotation and torsion angles, which are then encoded into qubits. Moreover, a Parameterized Quantum Circuit (PQC) serves as the trainable layers, and the output of the PQC is adopted as the final node embedding. Experimental results on two downstream tasks, molecular property prediction and 3D molecular geometry generation, demonstrate the effectiveness of our model. Though the results are still restricted by the computational power of classical machines, we show the capability of our model with very few parameters and its potential to execute on a real quantum device.

1. INTRODUCTION

Graph representation learning, and specifically 3D graph representation learning as considered in this paper, has received extensive attention over the last decade. Beyond tasks like node classification and link prediction, it facilitates various downstream applications such as molecular property prediction (Liu et al., 2021) and drug design (Gaudelet et al., 2021). Recently, machine learning approaches have been well developed for learning latent node embeddings on molecules (Schütt et al., 2017; Unke & Meuwly, 2019; Gasteiger et al., 2019; 2021). However, mainstream research of this kind still faces the challenge of better processing the 3D Cartesian coordinates and learning the latent representation of the 3D graph structure. On the other hand, there are also emerging lines of research in quantum computing. State-of-the-art quantum computing hardware is now stepping into the Noisy Intermediate-Scale Quantum (NISQ) era, which makes it possible to implement applications in specific scientific domains in the near term (Preskill, 2018; Arute et al., 2019; Zhong et al., 2020; Huang et al., 2020). The overlap between quantum computing and machine learning, termed quantum machine learning (Biamonte et al., 2017), has emerged as one of the most encouraging areas for quantum computing. Quantum or hybrid paradigms have been carefully designed to pursue quantum supremacy in quantum chemistry problems (Aspuru-Guzik et al., 2005; O'Malley et al., 2016). Existing approaches mainly focus on the quantum simulation of molecular energies, which enables effective prediction of chemical reaction rates. However, these quantum approaches (Romero et al., 2018; Peruzzo et al., 2014; O'Malley et al., 2016; Yung et al., 2014) are still simulating the energies of certain small molecules like H2, LiH, etc. In this paper, we aim to develop quantum machine learning approaches to learn the latent representation of the 3D graph structure of molecules instead of directly simulating the molecular energies with Hamiltonians.
Graph learning may not be as precise as molecular simulation approaches for property prediction, but it can learn from hundreds or thousands of molecules and predict the properties of more complex molecules. Specifically, we first convert the 3D Cartesian coordinates of the atoms into three geometric quantities: distance, rotation angle, and torsion angle. Then we encode the angles and distance, as well as the atom type (a discrete variable), into qubits. A distance threshold is used so that each time a focal atom is picked to learn its embedding, one only needs to consider the neighboring atoms within the threshold. Considering the size of the molecules and of the neighborhood, we require at most ten qubits to learn the representation, which makes our proposed model easy to simulate on a classical processor and capable of running on a NISQ device. Analogous to the hardware-efficient ansatz (Kandala et al., 2017; Huang et al., 2021), we apply a Parameterized Quantum Circuit (PQC) after the encoding stage. The trainable parameters are the angles θ of the rotation gates $R_x$ and $R_y$ in the PQC. The gradient of each parameter θ is calculated by the shifting technique (Mitarai et al., 2018), and the parameters are updated by backpropagation and gradient descent, analogous to classical neural networks. We apply tomography at the end of the circuit, concatenate the real and imaginary parts of the output vector, and take the result as the node embedding. We conduct numerical experiments on the filtered QM9 dataset for both the molecular property prediction task and the molecular geometry generation task. Experimental results show that, compared with classical state-of-the-art baseline models, our quantum 3D embedding model achieves comparable results on small datasets with far fewer network parameters.
We summarize our contributions as follows:

1) To the best of our knowledge, we are the first to use qubits to encode 3D relative positional information, which aims to effectively preserve equivariance and invariance. In fact, using a qubit on a Bloch sphere to encode the rotation and torsion angles between two atoms is more intuitive than using 3D Cartesian coordinates, which is also supported by the recent success of spherical representations not only on molecules but also on point clouds.

2) We use two qubits to represent each atom, and we only consider the focal atom and its neighbors at each iteration. Therefore, the maximum number of qubits in our model is 10. We are thus able to test our model on Qiskit (http://qiskit.org) with the simulator of the IBM-Q quantum cloud service, while guaranteeing that the code can also be seamlessly deployed and run on IBM's NISQ devices.

3) We implement a full-amplitude quantum circuit simulator with the transition unitary for the PQC on a classical processor. It replicates the results of the QASM simulator from IBM Qiskit while running over 20 times faster, which enables us to conduct experiments on more tasks.

4) The numerical experiments on two different well-studied molecular tasks show that our embedding approach is able to extract geometry and neighborhood information with very few parameters (only 64 parameters in the PQC) and achieve relatively good results.

2. PRELIMINARIES AND RELATED WORKS

In this section, we first briefly review basic concepts of quantum computing as well as quantum machine learning. We further present some previous works on quantum graph learning approaches.

2.1. QUANTUM COMPUTING

In quantum computing, the qubit (short for quantum bit) is a key concept, analogous to a classical bit with a binary state. The two basis states of a qubit are $|0\rangle$ and $|1\rangle$, which correspond to the states 0 and 1 of a classical bit, respectively. We refer readers to the textbook (Nielsen & Chuang, 2002) for a comprehensive treatment of quantum information and quantum computing; here we give a compact description of the background for self-containedness. A quantum state is commonly denoted in bra-ket notation. It is also common to form linear combinations of states, called superpositions: $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$. Formally, a quantum system on $n$ qubits is an $n$-fold tensor product Hilbert space $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ with dimension $2^n$. For any $|\psi\rangle \in \mathcal{H}$, the conjugate transpose is $\langle\psi| = |\psi\rangle^{\dagger}$. The inner product $\langle\psi|\psi\rangle = \|\psi\|_2^2$ denotes the square of the 2-norm of $\psi$. The outer product $|\psi\rangle\langle\psi|$ is a rank-2 tensor. The computational basis states are given by $|0\rangle = (1, 0)$ and $|1\rangle = (0, 1)$. Composite basis states are defined by, e.g., $|01\rangle = |0\rangle \otimes |1\rangle = (0, 1, 0, 0)$.

Analogous to a classical computer, a quantum computer is built from a quantum circuit containing wires and elementary quantum gates that carry and manipulate the quantum information. A quantum gate is a unitary operation $U$ on the Hilbert space $\mathcal{H}$. When we simulate a quantum circuit on a classical computer, we can obtain the overall transition unitary by tensoring and multiplying the unitary gate operators together. A projective measurement is described by an observable $M$, a Hermitian operator on the state space of the system being observed. The observable has a spectral decomposition $M = \sum_m m P_m$, where $P_m$ is the projector onto the eigenspace of $M$ with eigenvalue $m$. When measuring the state $|\psi\rangle$, the probability of obtaining result $m$ is given by $p(m) = \langle\psi|P_m|\psi\rangle$.
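As a small illustration of the two operations described above (tensoring gates into a layer unitary and reading out measurement probabilities), the following sketch prepares a two-qubit state in plain Python. The Hadamard gate and the helper names `kron` and `matvec` are chosen here for illustration only; they are not taken from the paper's implementation.

```python
import math

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def matvec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

# Elementary gates: Hadamard, identity, and CNOT (control = first qubit).
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

# |00> = |0> (x) |0>
ket0 = [1, 0]
state = [a * b for a in ket0 for b in ket0]

# Layer 1: H on the first qubit, identity on the second, tensored into a 4x4 unitary.
layer1 = kron(H, I2)
# Layer 2: CNOT. Applying layer 1 and then layer 2 equals multiplying by CNOT @ layer1.
state = matvec(CNOT, matvec(layer1, state))

# Projective measurement in the computational basis: p(m) = |<m|psi>|^2.
probs = [abs(amp) ** 2 for amp in state]
```

Here `probs` assigns probability 1/2 to each of $|00\rangle$ and $|11\rangle$, i.e. the two qubits end up in an entangled state.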
2.2. QUANTUM MACHINE LEARNING

(Cerezo et al., 2021) proposed the concept of Variational Quantum Algorithms (VQA), which leverage quantum advantages to solve machine learning problems on near-term quantum devices. Parameterized Quantum Circuits (PQC) are a concrete implementation of certain VQAs. For each qubit we have the rotation operator $R_x(\theta)$, which rotates the state through an angle $\theta$ (in radians) around the x-axis. A PQC is mainly composed of $R_x(\theta)$, $R_y(\theta)$ and $R_z(\theta)$ gates with $\theta$ as the parameters. The parameters $\theta$ are updated by a classical optimizer to minimize the loss function $L(\theta)$, which evaluates the dissimilarity between the output of the PQC and the target result. The derivative with respect to the i-th parameter $\theta(i)$ can be computed using the shifting technique proposed by (Mitarai et al., 2018). It requires running the whole circuit twice, with $\theta(i)$ shifted to $\theta(i) + \pi/2$ and $\theta(i) - \pi/2$:

$$\frac{\partial L(\theta)}{\partial \theta(i)} = \frac{L(\theta(1), \cdots, \theta(i) + \pi/2, \cdots) - L(\theta(1), \cdots, \theta(i) - \pi/2, \cdots)}{2} \quad (1)$$

Also using gradient backpropagation, classical learning models have been adapted into quantum versions, e.g., QCNN (Cong et al., 2019), QRNN (Bausch, 2020), QGAN (Huang et al., 2021), and QLSTM (Chen et al., 2022). These works show, however, that the quantum counterparts on NISQ devices may not yet be as powerful as the SOTA classical models (which usually have millions of parameters). Involving quantum computing nevertheless remains an interesting experiment for seeking potential supremacy and for probing the connection between the latent space and quantum entanglement.
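The shifting technique can be checked on a toy case. The sketch below is only an illustration under simple assumptions: the "circuit" is a single $R_x(\theta)$ gate acting on $|0\rangle$, and the loss is the expectation value of the Pauli-Z observable, which equals $\cos\theta$ analytically. The function names are hypothetical.

```python
import math

def expectation_z(theta):
    """<Z> after applying R_x(theta) to |0>; analytically equal to cos(theta)."""
    # R_x(theta)|0> = (cos(theta/2), -i sin(theta/2))
    amp0 = complex(math.cos(theta / 2), 0.0)
    amp1 = complex(0.0, -math.sin(theta / 2))
    return abs(amp0) ** 2 - abs(amp1) ** 2

def parameter_shift_grad(loss, thetas, i):
    """Gradient of loss w.r.t. thetas[i] from two shifted evaluations (Eq. 1)."""
    plus = list(thetas)
    plus[i] += math.pi / 2
    minus = list(thetas)
    minus[i] -= math.pi / 2
    return (loss(plus) - loss(minus)) / 2

def toy_loss(thetas):
    """Toy loss: the Z expectation of a one-gate, one-qubit circuit."""
    return expectation_z(thetas[0])

theta = [0.7]
grad = parameter_shift_grad(toy_loss, theta, 0)
# For this toy loss the exact derivative is -sin(theta).
exact = -math.sin(theta[0])
```

For rotation gates of this form the two-evaluation rule is exact rather than a finite-difference approximation, which is why `grad` and `exact` agree to machine precision.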

2.3. UNITARY COUPLED-CLUSTER

One of the most promising areas in which to demonstrate quantum computing supremacy is quantum chemistry. 2007). However, UCC is an unsupervised learning method with no ground truth and can only evolve one molecule at a time, since the circuit is uniquely designed for a certain molecule. There is also evidence that the number of parameters in UCC might still be too large to allow practical calculations for large molecules.

2.4. QUANTUM GRAPH LEARNING

Different from evolving a Hamiltonian and solving the Schrödinger equation with a quantum circuit, quantum graph learning approaches try to learn the latent representation of the vertices and the graph. A hierarchical architecture based on quantum random walks is employed to extract multi-scale properties of the graph (Dernbach et al., 2018). However, it remains unclear how to efficiently construct the diffusion matrix from the quantum states generated by the quantum walkers. The information aggregation is performed by the classical system, which incurs additional cost from the interaction between the quantum and classical environments. (Zhang et al., 2019) and (Ai et al., 2022) suggest exploiting the quantum Hilbert space to rebuild the representation of the graph in the quantum state. But the number of qubits to represent a graph with 

3.1. PROBLEM SETTING AND METHOD OVERVIEW

Problem Setting. In this paper, we aim to develop a quantum machine learning approach for learning node embeddings with node-wise 3D coordinates. We take molecules with 3D graph structures as an example. Let $G$ denote the graph of a certain molecule and $V$ denote the node set of $G$. The number of nodes (in other words, atoms) is $n = |V|$. Each node $v_i \in V$ has an attribute $a_i$, which is the atom type in our setting. Our target is to learn the embedding of each atom and then obtain the final embedding of the molecule. The embeddings are then tested on different molecular tasks (e.g., molecular property prediction and 3D molecular geometry generation).

Method Overview. We develop a quantum machine learning approach to learn embeddings on 3D graphs. The trainable parameters are the angles θ of the rotation gates in the PQC. Specifically, we first encode the 3D coordinates and the atom types into qubits. We use relative coordinates instead of the 3D Cartesian coordinates to ensure both equivariance and invariance. The relative coordinates can be written as a position tuple $(d, \theta, \phi)$, where $d$, $\theta$ and $\phi$ denote the radial distance, the polar angle, and the azimuthal angle, respectively. We set a distance threshold to pick the neighbors that can interact with the focal atom. A PQC is then used to learn the latent variables and entangle the qubits together. We further apply tomography at the end of the PQC and concatenate the real part and the imaginary part. The overall pipeline is shown in Fig. 1.

3.2. THE PROPOSED ATOM2QUBIT

Considering a molecule with $n$ atoms, we treat it mathematically as a graph $G$ with $n$ nodes. Each node $v_i$ has a corresponding attribute $a_i$, which denotes the atom type, and a 3D Cartesian coordinate set $\{x_i, y_i, z_i\}$. Without loss of generality, we first pick $v_i$ as the focal atom and learn the embedding of node $v_i$. The distance between $v_i$ and another node $v_j \in V$ is $d_{ij} = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2 + (z_j - z_i)^2}$. Since not all node pairs in the graph interact, we set a maximum distance threshold $d_{max}$ as a hyperparameter, so that $v_j \in N(v_i)$ if $i \neq j$ and $d_{ij} \le d_{max}$; that is, only the nodes $v_j$ with $d_{ij} \le d_{max}$ are considered neighbors of $v_i$. We then convert the 3D Cartesian coordinates of $v_j \in N(v_i)$ into the position tuple $(d_{ij}, \theta_{ij}, \phi_{ij})$. The definitions of the rotation angle $\theta$ and the torsion angle $\phi$ are shown in Fig. 1 (b). Each node $v_j \in N(v_i)$ can now be uniquely defined by $\{a_j, d_{ij}, \theta_{ij}, \phi_{ij}\}$. There are two different ways to encode classical information into quantum form: amplitude encoding and angle encoding. Amplitude encoding can encode a classical one-hot vector of dimension $n$ with only $\log_2(n)$ qubits, but it is hard to use for continuous variables and requires $O(n)$ time to encode the information. In contrast, angle encoding requires a minimum of $n/3$ qubits to encode $n$ classical values, but it is capable of encoding both discrete and continuous variables. Furthermore, angle encoding is a better fit for the rotation parameters in the circuit. In this paper, we pick angle encoding to encode the information set $\{a_j, d_{ij}, \theta_{ij}, \phi_{ij}\}$ into qubits. For each qubit, we have three rotation operators $R_x$, $R_y$ and $R_z$, so we can theoretically encode three different pieces of information on one qubit.
However, if we consider the qubit on a Bloch sphere, we can uniquely define the rotation track on the Bloch sphere using only two rotation operators. To avoid the decomposition of the third input, we only use the two rotation operators $R_x$ and $R_y$ in this paper ($R_z$ does not change the outputs of our measurement method). Therefore, we need two qubits $|\Psi_1\rangle$ and $|\Psi_2\rangle$ to encode each node $v_j$:

$$|\Psi_1\rangle = U_x(\theta_{ij}) \times U_y(\phi_{ij}) \times |0\rangle \quad (2)$$

$$|\Psi_2\rangle = U_x\left(\frac{d_{ij}}{d_{max}} \times 2\pi\right) \times U_y\left(\frac{a_j}{a_{num}} \times 2\pi\right) \times |0\rangle \quad (3)$$

where $a_{num}$ denotes the number of atom types occurring in the dataset and $a_j$ is an integer ∈ [

$$|\Psi_0\rangle = |\Psi_1\rangle \otimes |\Psi_2\rangle \otimes \cdots \otimes |\Psi_{2n-1}\rangle \otimes |\Psi_{2n}\rangle \quad (4)$$
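The two-qubit encoding of a single atom can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the gate matrices are the standard $R_x$ and $R_y$ rotations, the defaults `d_max=1.77` and `a_num=6` are the values reported in Sec. 4.1, the products in the encoding equations are read right to left (so $U_y$ acts on $|0\rangle$ first), and the example atom values are made up.

```python
import math

def rx(theta):
    """Matrix of the R_x(theta) rotation gate."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(theta):
    """Matrix of the R_y(theta) rotation gate."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply(gate, state):
    """Apply a 2x2 gate to a single-qubit state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def atom_to_qubits(a_j, d_ij, theta_ij, phi_ij, d_max=1.77, a_num=6):
    """Encode one atom as two single-qubit states (angles on the first,
    rescaled distance and atom type on the second)."""
    ket0 = [1.0, 0.0]
    two_pi = 2 * math.pi
    psi1 = apply(rx(theta_ij), apply(ry(phi_ij), ket0))
    psi2 = apply(rx(d_ij / d_max * two_pi),
                 apply(ry(a_j / a_num * two_pi), ket0))
    return psi1, psi2

def tensor(states):
    """Tensor product of several state vectors."""
    out = [1.0]
    for s in states:
        out = [a * b for a in out for b in s]
    return out

# Hypothetical neighbor: type index 1, distance 1.1, theta = pi/2, phi = 0.
psi1, psi2 = atom_to_qubits(1, 1.1, math.pi / 2, 0.0)
psi0 = tensor([psi1, psi2])  # joint 2-qubit state of this atom
```

Tensoring the pairs of all atoms in $v_i \cup N(v_i)$ in the same way gives the initial state of the circuit, so the state vector has $2^{2m}$ amplitudes for $m$ encoded atoms.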

3.3. QUANTUM 3D EMBEDDING ANSATZ

We first discuss the number of qubits needed for our approach on molecule problems. Each time we learn the embedding of node $v_i$, we need to encode the information of $v_i \cup N(v_i)$ into qubits. Therefore, the number of qubits is linear in the size of $N(v_i)$. The interaction between atoms in a molecule is bounded by the bond length between atoms. As the bond length increases, the interaction becomes much weaker, which means we barely have multi-hop message passing in our graph. This makes it possible to run the test on an existing near-term quantum device. Therefore, we choose the hardware-efficient ansatz, which has been demonstrated on a superconducting quantum processor with six fixed-frequency transmon qubits by (Kandala et al., 2017) and on the 56-qubit superconducting quantum processor Zuchongzhi by (Huang et al., 2021). Analogous to classical neural network models, the PQC is constructed from layers, and each layer has an identical arrangement of quantum gates. Fig. 2 illustrates the general framework of the quantum 3D embedding ansatz. The overall unitary is $U(\theta) = \prod_{l=1}^{L} (U_{ent} U_l(\theta))$, where $U_{ent}$ is the entanglement layer and $U_l(\theta)$ is the l-th trainable layer. In particular, the l-th trainable layer is $U_l(\theta) = \bigotimes_{k=1}^{N} (U_y(\theta_y^{(k,l)})) \times \bigotimes_{k=1}^{N} (U_x(\theta_x^{(k,l)}))$, where $U_x$ is the unitary of gate $R_x$ and $\theta_x^{(k,l)}$ is the parameter of $R_x$ at the l-th layer on the k-th qubit. The entanglement layer $U_{ent}$ consists of CNOT gates and entangles all the qubits together, as shown in Fig. 2.
The quantum state $|\Psi_l\rangle$ after $l$ layers is

$$|\Psi_l\rangle = U_{ent} \times U_l \times (U_{ent} \times U_{l-1} \times (\cdots (U_{ent} \times U_1 |\Psi_0\rangle))) \quad (5)$$

$$= U_{ent} \times \bigotimes_{k=1}^{N} (U_y(\theta_y^{(k,l)}) U_x(\theta_x^{(k,l)})) \times (\cdots (U_{ent} \times \bigotimes_{k=1}^{N} (U_y(\theta_y^{(k,1)}) U_x(\theta_x^{(k,1)})) |\Psi_0\rangle)) \quad (6)$$

The quantum state $|\Psi_0\rangle$ is the initial state, which is also the output of the Atom2Qubit stage. With the parameters $\theta_x^{(k,l)}$ and $\theta_y^{(k,l)}$, we can learn the latent representation of each node. Note that the proposed model is a graph representation learning model, so we need to attach downstream tasks to test the efficiency of our model, and the loss function is also obtained from the downstream model. The loss function $L$ is used to optimize the trainable parameters $\theta$, the concatenation of all $\theta_x^{(k,l)}$ and $\theta_y^{(k,l)}$. For our model $M$, it yields:

$$\min_{\theta} L(M_{\theta}(|\Psi_0\rangle)) \quad (7)$$

The parameters $\theta$ are then updated at each iteration by gradient descent, using the gradients from Eq. 1.
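The layered structure in Eqs. 5 and 6 can be sketched with a small state-vector simulation. This is an illustrative sketch and not the authors' simulator: it updates amplitudes gate by gate instead of building the full transition unitary, the helper names are hypothetical, and the entanglement layer is assumed to be a linear chain of CNOT gates between neighboring qubits, since the exact CNOT pattern is not spelled out in the text.

```python
import math

def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply_1q(state, gate, k, n):
    """Apply a 2x2 gate to qubit k of an n-qubit state (qubit 0 = leftmost bit)."""
    out = list(state)
    bit = 1 << (n - 1 - k)
    for idx in range(len(state)):
        if idx & bit == 0:
            a, b = state[idx], state[idx | bit]
            out[idx] = gate[0][0] * a + gate[0][1] * b
            out[idx | bit] = gate[1][0] * a + gate[1][1] * b
    return out

def apply_cnot(state, control, target, n):
    """Swap the target-bit amplitudes wherever the control bit is 1."""
    out = list(state)
    cbit = 1 << (n - 1 - control)
    tbit = 1 << (n - 1 - target)
    for idx in range(len(state)):
        if idx & cbit and not idx & tbit:
            out[idx], out[idx | tbit] = state[idx | tbit], state[idx]
    return out

def pqc(state, params, n):
    """params[l][k] = (theta_x, theta_y) for layer l and qubit k."""
    for layer in params:
        for k, (tx, ty) in enumerate(layer):
            state = apply_1q(state, rx(tx), k, n)  # U_x acts first
            state = apply_1q(state, ry(ty), k, n)  # then U_y
        for k in range(n - 1):                     # assumed linear CNOT chain
            state = apply_cnot(state, k, k + 1, n)
    return state

n_qubits = 2
psi0 = [1.0, 0.0, 0.0, 0.0]                        # |00>
params = [[(0.0, math.pi), (0.0, 0.0)]]            # one layer, 2 x N parameters
out = pqc(psi0, params, n_qubits)
n_params = sum(2 * len(layer) for layer in params)
```

In this example $R_y(\pi)$ flips the first qubit and the CNOT then flips the second, so `out` is (up to rounding) the basis state $|11\rangle$. Each layer contributes $2 \times N$ parameters, which is what `n_params` counts.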

3.4. ATOM EMBEDDING

The quantum circuit described above is an $N$-qubit circuit, and it works in a $2^N$-dimensional Hilbert space. We apply tomography at the end of the circuit, so we obtain a $2^N$-dimensional vector with a complex number $\alpha_i$ for each dimension. A quantum state $|\psi\rangle$ can be written as a combination of the computational basis states,

$$|\psi\rangle = \alpha_1 |0 \cdots 00\rangle + \alpha_2 |0 \cdots 01\rangle + \alpha_3 |0 \cdots 10\rangle + \cdots + \alpha_{2^N} |1 \cdots 11\rangle \quad (8)$$

where $\alpha_i$, $1 \le i \le 2^N$, are the complex coefficients and the vector $(\alpha_1, \alpha_2, \cdots, \alpha_{2^N})$ is the result of the tomography. Each $\alpha_i$ can be written in the form $\alpha_i = \mathrm{Re}(\alpha_i) + \mathrm{Im}(\alpha_i) \cdot i$, where Re denotes the real part, Im denotes the imaginary part, and $i$ is the imaginary unit. We then concatenate the real part and the imaginary part of the tomography result and obtain the node embedding vector $(\mathrm{Re}\,\alpha_1, \mathrm{Re}\,\alpha_2, \cdots, \mathrm{Re}\,\alpha_{2^N}, \mathrm{Im}\,\alpha_1, \mathrm{Im}\,\alpha_2, \cdots, \mathrm{Im}\,\alpha_{2^N})$ with dimension $2^{N+1}$.
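In a classical simulation the amplitudes are directly available, so the concatenation step reduces to a few lines. The sketch below assumes the state is given as a Python list of complex numbers; the function name and the example amplitudes are hypothetical.

```python
def node_embedding(amplitudes):
    """Concatenate the real parts, then the imaginary parts, of a state vector."""
    reals = [complex(a).real for a in amplitudes]
    imags = [complex(a).imag for a in amplitudes]
    return reals + imags

# Hypothetical normalized 2-qubit state (N = 2, so 2**N = 4 amplitudes).
alpha = [0.5, 0.5j, -0.5, -0.5j]
embedding = node_embedding(alpha)
# The embedding has 2 * 2**N = 2**(N + 1) real entries.
```

For this input the embedding is `[0.5, 0.0, -0.5, 0.0, 0.0, 0.5, 0.0, -0.5]`, up to the sign of floating-point zeros.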

4. NUMERICAL EXPERIMENTS

All the experiments are performed on a single machine with 1TB memory, one physical CPU with 28 cores (Intel(R) Xeon(R) W-3175X CPU @ 3.10GHz), and two GPUs (Nvidia Quadro RTX 8000). The source code is written in PyTorch, where we simulate the whole quantum circuit process using the transition unitary. We have also implemented a Qiskit version of our model. On QM9-pred, the average training time per epoch is over 2 hours using the QASM simulator on Qiskit, whereas it takes an average of 310 s on our simulator, which is about 23 times faster. Note that none of our models has been run on quantum hardware yet, but the model and the circuit we propose are easy to adapt to NISQ devices. To test the performance of our embedding model, we perform numerical experiments on two different tasks and compare the results with state-of-the-art classical 3D molecular representation learning models.

Dataset. The benchmark dataset we use is QM9 (Ramakrishnan et al., 2014), which is widely used for predicting various properties of molecules and for 3D molecule generation tasks. It includes quantum chemistry structures and properties of up to 134k stable small organic molecules. These molecules consist of up to 9 heavy atoms (C, O, N, F), not counting hydrogen, and their corresponding 3D molecular geometries are computed by density functional theory (DFT).

4.1. IMPLEMENTATION DETAILS

In Sec. 3.2 we have shown how to convert the information of each atom into rotation angles on qubits. Now we discuss more precisely how to calculate the position tuple $(d, \theta, \phi)$. The 3D coordinates of atom $v_i$ in QM9 are three real numbers $x_i$, $y_i$ and $z_i$. If we pick $v_i$ as the focal atom, then $\forall v_j \in N(v_i)$ we need to calculate the position tuple of atom $v_j$ relative to $v_i$:

$$d_{ij} = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2 + (z_j - z_i)^2} \quad (9)$$

$$\theta_{ij} = \arctan\left(\frac{\sqrt{(x_j - x_i)^2 + (z_j - z_i)^2}}{y_j - y_i}\right) \quad (10)$$

$$\phi_{ij} = \arctan\left(\frac{x_j - x_i}{z_j - z_i}\right) \quad (11)$$

In order to fit the definitions of the rotation angle and the torsion angle, these angles should lie in the domains $\theta_{ij} \in [0, \pi]$ and $\phi_{ij} \in [0, 2\pi)$. We therefore adjust the results of Eq. 10 and Eq. 11:

$$\theta_{ij} \leftarrow \theta_{ij} + \pi, \quad \text{if } y_j - y_i < 0 \quad (12)$$

$$\theta_{ij} \leftarrow 0, \quad \text{if } y_j - y_i = 0 \quad (13)$$

$$\phi_{ij} \leftarrow \phi_{ij} + \pi, \quad \text{if } z_j - z_i < 0 \wedge (x_j - x_i < 0 \vee x_j - x_i > 0) \quad (14)$$

$$\phi_{ij} \leftarrow \phi_{ij} + 2\pi, \quad \text{if } z_j - z_i > 0 \wedge x_j - x_i < 0 \quad (15)$$

Specifically, we set $d_{max} = 1.77$, which is the maximum bond length in the dataset, and we set $a_{num} = 6$, as there are five different types of atoms in the dataset plus a null type.

Now we discuss the transition-unitary-based full-amplitude circuit simulation on a classical processor in more detail. The only quantum gates used in our PQC are $R_x(\theta)$, $R_y(\theta)$ and CNOT. The matrix representations of the single-qubit gates are as follows:

$$R_x(\theta) = \begin{pmatrix} \cos(\frac{\theta}{2}) & -i\sin(\frac{\theta}{2}) \\ -i\sin(\frac{\theta}{2}) & \cos(\frac{\theta}{2}) \end{pmatrix}, \quad R_y(\theta) = \begin{pmatrix} \cos(\frac{\theta}{2}) & -\sin(\frac{\theta}{2}) \\ \sin(\frac{\theta}{2}) & \cos(\frac{\theta}{2}) \end{pmatrix}$$

The two-qubit gate unitary matrix is as follows:

$$CNOT = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

With these matrices of the basic gates, we obtain the unitary of a circuit block. We first divide the block into layers, where each layer has at most one gate on each qubit. We tensor the gate matrices within each layer and thus obtain the unitary of the layer. We then multiply the unitaries of the different layers together, so the final unitary is calculated through torch.tensor() and then torch.matmul().
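The position-tuple computation can be sketched as below. This is a hedged sketch rather than the authors' code: the function names are hypothetical, the zero-$\Delta y$ case follows Eq. 13 as written, and the azimuth is computed with `math.atan2` reduced modulo $2\pi$, which coincides with Eq. 11 and its two adjustment rules whenever both $x_j - x_i$ and $z_j - z_i$ are nonzero. Its behaviour in the remaining edge cases (for example $z_j = z_i$) is an assumption of this sketch.

```python
import math

def position_tuple(focal, neighbor):
    """Relative position (d, theta, phi) of `neighbor` with respect to `focal`.
    Both arguments are (x, y, z) tuples."""
    dx, dy, dz = (n - f for n, f in zip(neighbor, focal))
    d = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dy == 0:
        theta = 0.0                      # Eq. 13 as stated in the text
    else:
        theta = math.atan(math.hypot(dx, dz) / dy)
        if dy < 0:
            theta += math.pi             # Eq. 12
    # atan2 + modulo reproduces the quadrant corrections for phi.
    phi = math.atan2(dx, dz) % (2 * math.pi)
    return d, theta, phi

def neighbors(coords, i, d_max=1.77):
    """Indices j != i whose distance to atom i is at most d_max."""
    return [j for j, c in enumerate(coords)
            if j != i and math.dist(coords[i], c) <= d_max]

# Hypothetical coordinates: atom 0 is the focal atom.
coords = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (-1.0, -1.0, 1.0), (3.0, 0.0, 0.0)]
nbrs = neighbors(coords, 0)
tuples = [position_tuple(coords[0], coords[j]) for j in nbrs]
```

With these made-up coordinates, atoms 1 and 2 fall within `d_max` while atom 3 does not, and the second neighbor illustrates both corrections: its polar angle is shifted by $\pi$ and its azimuth ends up at $7\pi/4$.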
For an $N$-qubit circuit, the overall unitary is $U \in \mathbb{C}^{2^N \times 2^N}$. The maximum number of qubits we need is 10, so the largest unitary we need is in $\mathbb{C}^{1024 \times 1024}$, which is still affordable on a classical processor.

Table 2: Performance comparison between the baselines and our proposed method on QM9-pred in terms of MAE for three properties ($\epsilon_{HOMO}$, $\epsilon_{LUMO}$ and $\Delta\epsilon$) and the std. MAE for all three properties. We use the unit eV for these three energy-related properties.

We first conduct experiments on the task of molecular property prediction to evaluate our embedding model. The downstream model we use is a simple multilayer perceptron predictor, which performs regression on the embeddings from the embedding model.

Setting. We filter QM9 to generate the dataset for our prediction task. Our quantum model suffers from the extremely high time cost of simulating the quantum circuit on classical computers, and it is infeasible for us to run on all 134k molecules in QM9. We keep only the molecules with no more than 10 atoms and randomly pick 500 of them to form our dataset for the prediction task. We denote this dataset as QM9-pred, and its statistics are listed in Table 1. We split the dataset into training/validation/test sets with a ratio of 8:1:1. The training molecules are used to optimize the model parameters. The validation molecules are used to tune the hyperparameters and to perform early stopping, and we then report the results on the test molecules. Among all sixteen properties listed in QM9, we select three important energy-related properties, namely $\epsilon_{HOMO}$, $\epsilon_{LUMO}$ and $\Delta\epsilon$. $\Delta\epsilon$, also known as the HOMO-LUMO gap, is one of the most practically relevant quantum chemical properties of molecules (Bredas, 2014). In line with (Liu et al., 2021), we report the mean absolute error (MAE) for each property as well as the overall mean standardized MAE (std. MAE) for all three properties.

Baselines. To the best of our knowledge, there are no other quantum models that consider representation learning for 3D graphs, so we compare our method with five baselines in the classical domain: the seminal work in this area, SchNet (Schütt et al., 2017), DimeNet++ (Klicpera et al., 2020), SphereNet (Liu et al., 2021), ComENet (Wang et al., 2022) and EGNN (Satorras et al., 2021).

Prediction model. The embeddings obtained from our model are fed to a simple predictor, which is a multilayer perceptron that reduces the embedding from size $2^{N+1}$ to 1. We use stochastic gradient descent (SGD) with the Adam optimizer (Kingma & Ba, 2014) to train our model for a maximum of 100 epochs with a batch size of 32 and a learning rate of 0.01. Meanwhile, as the running time increases dramatically with the number of trainable layers, we set the number of layers in the PQC to four to balance training time and accuracy.

Results. The results of the property prediction task are presented in Table 2.

This study evaluates the performance of our proposed embedding model when adapted to the existing random molecular geometry generation method. To be more specific, the embeddings from our model are used to extract 3D conditional information in the generation process.

Setting. We also use filtered QM9 for evaluation. Different from QM9-pred, we select 806 molecules that contain no more than 10 atoms to form our dataset; 50 of them are used for validation and the remainder are used for training. We call this filtered dataset QM9-gen, and its statistics are presented in Table 1. The generated molecular geometries can be converted to molecular graphs according to the approach proposed in (Gebauer et al., 2019). As for metrics, we use the chemical validity percentage (Validity), defined as the percentage of molecular graphs that obey the chemical valency rules, to evaluate the generation accuracy.
In addition, we adopt Maximum Mean Discrepancy (MMD) (Gretton et al., 2012) distances of bond length distributions to evaluate the 3D structural accuracy of the generated molecular geometries. We calculate the bond length distribution in the generated geometries and in the dataset geometries separately for each type of bond, and then obtain the statistical discrepancy between them with the MMD distance. In line with (Luo & Ji, 2022), we compute the MMD for six types of chemical bonds, as they appear most frequently: hydrogen-carbon single bonds (H-C), hydrogen-nitrogen single bonds (H-N), hydrogen-oxygen single bonds (H-O), carbon-carbon single bonds (C-C), carbon-nitrogen single bonds (C-N), and carbon-oxygen single bonds (C-O).

Baseline. We use G-SphereNet (Luo & Ji, 2022) as the baseline in this molecular geometry generation task. G-SphereNet (also from ICLR 2022) was produced by the same group as SphereNet and uses SphereNet as the embedding model to extract 3D conditional information.

Generation Model. As the generation model, we employ the same generation pipeline as G-SphereNet, which adopts a flexible sequential generation strategy by adding atoms in 3D space one by one based on autoregressive flow models. We use the Adam optimizer to train our model for 100 epochs, with a batch size of 64 and a learning rate of 0.001. We also set the maximum number of atoms that can be generated for each molecule to 13.

Results. We present the performance of our model against G-SphereNet in Table 3. We reach results comparable to the baseline model on QM9-gen. More specifically, our model slightly outperforms the baseline model on the MMD distances for 3 types of bond length, which shows that our method has a strong capability for extracting the 3D conditional information of molecular geometries.
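To make the bond-length metric concrete, the sketch below computes a squared MMD between two samples of bond lengths. The kernel is not specified in the text, so a Gaussian kernel with bandwidth `sigma=0.1` and the simple biased estimator are assumed here purely for illustration; the function names and the bond-length values are made up.

```python
import math

def gaussian_kernel(a, b, sigma=0.1):
    """Gaussian (RBF) kernel between two scalars; the bandwidth is an assumption."""
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))

def mmd_squared(xs, ys, sigma=0.1):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_k(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

# Hypothetical bond lengths of one bond type in the dataset and in generated molecules.
reference = [1.09, 1.10, 1.08, 1.09]
generated_close = [1.09, 1.10, 1.08, 1.09]
generated_far = [1.20, 1.25, 1.22, 1.24]

mmd_same = mmd_squared(reference, generated_close)
mmd_diff = mmd_squared(reference, generated_far)
```

Identical samples give a value of zero, and the value grows as the generated bond-length distribution drifts away from the reference one, which is why a lower MMD indicates better structural accuracy.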

5. CONCLUSION

3D information is important for graphs such as molecules in quantum chemistry, and learning 3D representations of such graphs has attracted increasing attention. Existing classical models face the inherent challenge of understanding the physical meaning of the 3D Cartesian coordinates. To the best of our knowledge, we are the first to use qubits to encode 3D spatial information and to use a Parameterized Quantum Circuit (PQC) to learn the representation of each node as its embedding. The experiments on two well-studied downstream tasks demonstrate the efficiency and capability of our model, as well as its potential to execute on real quantum devices.

Limitation & future works. Our method is limited by the time consumed when simulating quantum circuits. Meanwhile, superconducting NISQ devices are entering the 50+ qubit era (Gong et al., 2021), which gives us the confidence to test our model on one of them. The noise on the gates is not fatal for such shallow circuits, but we will need to adjust the readout procedure of our embedding when testing on a NISQ device. We aim to extend our experiments to 10 thousand molecules and to reach the chemical accuracy of $1.6 \times 10^{-3}$ Hartree.



Figure 1: The quantum 3D embedding scheme. (a) The 3D molecular graph, in which the gray node (in the black circle) is picked as the focal atom and the three white nodes within the distance threshold are its neighbors. (b) We convert the 3D Cartesian coordinates of the atoms into the relative position tuple (d, θ, ϕ). (c) We encode the position tuple as well as the atom type into two qubits for each atom. (d) The PQC for our model. The input layer includes R x and R y on each qubit, which encode the aforementioned data. Trainable layers with parameters θ and entanglement layers are applied alternately, analogous to the layers of classical machine learning models. (e) The task of property prediction. We use the embeddings from the PQC to predict chemical properties and compare them with the labels. (f) The task of 3D molecular geometry generation. We generate a molecule from scratch based on an autoregressive flow model by picking one focal atom and then deciding the relative position.

Figure 2: The circuit for our quantum 3D embedding ansatz. Each layer includes a trainable parameter block U l and an entanglement block U ent . We have N qubits in the circuit, so there are 2 × N parameters in each layer. The entanglement layer is composed of CNOT gates that entangle all N qubits pairwise.

Figure 3: Training loss on the prediction task.

al., 2018; Peruzzo et al., 2014; O'Malley et al., 2016; Yung et al., 2014) are still simulating the energies of certain small molecules like H 2 , LiH, etc.

The entanglement layer U ent consists of CNOT gates and

Table 1: Statistics of datasets.



Table 3: Performance of G-SphereNet and our proposed method on 500 randomly generated molecules for chemical validity percentage and MMD distances of bond length distributions.

