LEARNING HARMONIC MOLECULAR REPRESENTATIONS ON RIEMANNIAN MANIFOLD

Abstract

Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the network expressive power. In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface. HMR offers a multi-resolution representation of molecular geometric and chemical features on 2D Riemannian manifold. We also introduce a harmonic message passing method to realize efficient spectral message passing over the surface manifold for better molecular encoding. Our proposed method shows comparable predictive power to current models in small molecule property prediction, and outperforms the state-of-the-art deep learning models for ligand-binding protein pocket classification and the rigid protein docking challenge, demonstrating its versatility in molecular representation learning.

1. INTRODUCTION

Molecular representation learning is a fundamental step in AI-assisted drug discovery. Obtaining good molecular representations is crucial for the success of downstream applications including protein function prediction (Gligorijević et al., 2021) and molecular matching, e.g., protein-protein docking (Ganea et al., 2021). In general, an ideal molecular representation should integrate both geometric (e.g., 3D conformation) and chemical information (e.g., electrostatic potential). Additionally, such a representation should capture features at various resolutions to accommodate different tasks, e.g., high-level holistic features for molecular property prediction, and fine-grained features for describing whether two proteins can bind together at certain interfaces. Recently, geometric deep learning (GDL) (Bronstein et al., 2017; 2021; Monti et al., 2017) has been widely used in learning molecular representations (Atz et al., 2021; Townshend et al., 2021). GDL captures the necessary information by performing neural message passing (NMP) on common structures such as 2D/3D molecular graphs (Klicpera et al., 2020; Stokes et al., 2020), 3D voxels (Liu et al., 2021), and point clouds (Unke et al., 2021). Specifically, GDL encodes: a) geometric features by modeling atomic positions in the Euclidean space, and b) chemical features by feeding atomic information into the message passing networks. High-level features can then be obtained by aggregating these atom-level features, which has shown promising results empirically. However, we argue that learning molecular representations via NMP in the Euclidean space is not necessarily the optimal solution, as it suffers from several drawbacks.
First, current GDL approaches need to employ equivariant networks (Thomas et al., 2018) to guarantee that the molecular representations transform accordingly upon rotation and translation (Fuchs et al., 2020), which could undermine the network's expressive power (Cohen et al., 2018; Li et al., 2021). Therefore, developing a representation that properly encodes 3D molecular structure while bypassing the equivariance requirement is desirable. Second, current molecular representations in GDL are learned in a bottom-up manner and can hardly provide features at different resolutions for different tasks. Specifically, NMP in Euclidean space typically achieves long-range communication between distant atoms by stacking deep layers or increasing the neighborhood radius. This hinders the effective representation of macromolecules with tens of thousands of atoms (Battiston et al., 2020; Boguna et al., 2021). To remedy this, residue-level graph representations are commonly used for large molecules (Jumper et al., 2021; Gligorijević et al., 2021). Hence, designing efficient multi-resolution message passing mechanisms would be ideal for encoding molecules of distinct sizes.
On the other hand, the molecular surface is a high-level representation of a molecule's shape, which has been widely used to study inter-molecular interactions (Richards, 1977; Shulman-Peleg et al., 2004). Intuitively, the interaction between molecules is commonly described as a "key-lock pair", where both shape complementarity (Li et al., 2013) and chemical interactions (e.g., hydrogen bonds) determine whether the key matches the lock molecule. It has been shown that the molecular surface holds key information about inter-molecular interactions (Gainza et al., 2020), which makes it an ideal candidate for molecular representation learning (Sverrisson et al., 2021; Somnath et al., 2021).
Inspired by the idea of Shape-DNA (Reuter et al., 2006), we propose Harmonic Molecular Representation learning (HMR), which utilizes the Laplace-Beltrami eigenfunctions on the molecular surface (a 2D manifold). Our representation has the following advantages: a) HMR works on a 2D Riemannian manifold instead of in the 3D Euclidean space, so the resulting molecular representation is roto-translation invariant by design; b) HMR represents a molecule in a top-down manner and offers multi-resolution features that accommodate various target molecules (i.e., from small molecules to large proteins), thanks to the smooth nature of the Laplace-Beltrami eigenfunctions (Fig. 1); c) HMR naturally integrates geometric and chemical features: the molecular shape defines the Riemannian manifold (i.e., the underlying domain equipped with a metric), and the atomic configurations determine the associated functions distributed on the manifold (e.g., electrostatics). To demonstrate that HMR is generally applicable to different downstream tasks including molecular property prediction and molecular matching, we propose two specific techniques: (1) manifold harmonic message passing for realizing holistic molecular representations, and (2) learning regional functional correspondence for molecular surface matching. Without loss of generality, we apply the proposed techniques to solve three drug discovery-related problems: QM9 small molecule property regression, ligand-binding protein pocket classification, and rigid protein docking pose prediction. Our proposed method shows comparable performance for small molecule property prediction to NMP-based models, while outperforming the state-of-the-art deep learning models in protein pocket classification and the rigid protein docking challenge.

2. RELATED WORK

Molecular Surface Representation

The molecular surface representation is commonly adopted for tasks involving molecular interfaces (Duhovny et al., 2002), where non-covalent interactions (e.g., hydrophobic interactions) play a decisive role (Sharp, 1994). Non-Euclidean convolutional neural networks (Monti et al., 2017) and point cloud-based learning models (Sverrisson et al., 2022) have been applied to encode the molecular surface for downstream applications, e.g., protein binding site prediction (Mylonas et al., 2021). However, existing methods apply filters with fixed sizes and are highly dependent on the surface mesh quality, which limits the expressive power for molecular shape representation across different spatial scales (Somnath et al., 2021; Isert et al., 2022).

Geometry Processing

Our work was inspired by pioneering work on shape encoding and shape matching in the geometry processing research community (Litman & Bronstein, 2013; Biasotti et al., 2016). The surface of a 3D object is typically discretized into a polygon mesh with vertices and faces. Intrinsic properties of the surface manifold are used to encode the shape (Sun et al., 2009; Bronstein & Kokkinos, 2010). Functional maps (Ovsjanikov et al., 2012; Litany et al., 2017b) have been proposed to establish spectral-space functional correspondence between two manifolds. Recently, deep learning has been applied to learn representative features to facilitate shape recognition and matching (Litany et al., 2017a; Donati et al., 2020; Attaiki et al., 2021).

Spectral Message Passing

Our proposed method decomposes surface functions/features as a linear combination of basis functions (hence the name "harmonic") and realizes message passing by applying various spectral filters. The graph convolutional network (GCN), which operates in the graph Laplacian eigenspace (Kipf & Welling, 2016; Shen et al., 2021), is closely related to our proposed method. The major difference is that a graph is discrete by construction, while our method works with a continuous Riemannian manifold (Coifman & Lafon, 2006). In other words, the underlying manifold and its spectrum remain the same under different surface discretizations, so the spectrum is a robust representation of the surface shape (Coifman et al., 2005). See "Shape-DNA" in Sec. 3.

3. PRELIMINARIES

The goal of this work is to propose a representation learning method using the molecular surface, which could properly encode both geometric and chemical features. Intuitively, the molecular surface defines the shape of a molecule in 3D Euclidean space (i.e., geometry), while chemical features (e.g., hydrophobicity) can be treated as functions distributed on the molecular surface. The surface and these associated functions co-determine the underlying molecular properties, e.g., whether an antibody could bind with an antigen. Therefore, we propose to represent a molecule as a set of (learned) functions/features defined on its surface. Before moving on, we first explain some basic concepts behind our proposed representation learning framework. In Sec. 3.1, we introduce molecular "Shape-DNA" and a set of basis functions inherent to the surface manifold. In Sec. 3.2, we illustrate how to apply harmonic analysis to decompose a function (defined on the molecular surface) into the linear combination of its "Shape-DNA" basis functions, as shown in Fig. 2. These concepts enable us to represent a molecular surface as a 2D Riemannian manifold with associated functions, which form the cornerstone of our proposed framework.

Figure 2: Illustrating manifold harmonic analysis. Left-hand side: the simulated electrostatic potential function on the protein surface. Right-hand side: the linear combination of its Laplace-Beltrami eigenfunctions ($\phi_1, \ldots, \phi_{10}, \ldots$) with corresponding coefficients ($c_{\phi_1}, \ldots, c_{\phi_{10}}, \ldots$). Only a few selected terms are explicitly shown from the infinite sum. Note that these eigenfunctions exhibit different spatial frequencies (resolutions).

3.1. THE SHAPE-DNA

The molecular surface can be viewed as a 2D Riemannian manifold ($\mathcal{M}$), which admits a discrete set of eigenfunctions ($\phi$) that solve

$$\Delta \phi_i = \lambda_i \phi_i, \quad i = 0, 1, \ldots \tag{1}$$

Here, $\Delta$ is the Laplace-Beltrami (LB) operator acting on surface scalar fields, defined as $\Delta f = -\mathrm{div}(\nabla f)$; $\phi_0, \phi_1, \ldots$ are a set of orthonormal eigenfunctions (i.e., $\langle \phi_i, \phi_j \rangle_{\mathcal{M}} = \delta_{ij}$), and $0 = \lambda_0 \le \lambda_1 \le \ldots$ are the corresponding eigenvalues. The set of eigenvalues $\{\lambda_i\}$ of the LB operator is called the "Shape-DNA", and the corresponding eigenfunctions $\{\phi_i\}$ are unique to each shape. In other words, different molecules adopt different LB eigenfunctions. Notably, two different 3D conformations of the same molecule may not have the same LB eigenfunctions, but a pair of chiral molecules (i.e., mirror images of each other) share the same LB spectrum. Fig. 2 shows a few selected eigenfunctions ($\phi_1, \phi_{10}, \phi_{100}$) of a protein surface manifold on the right-hand side. These eigenfunctions are intrinsic properties of the surface manifold, which remain invariant under rigid transformations of the molecule. The eigenvalues $\lambda_i$ reflect the surface Dirichlet energy (defined as $\langle \Delta f, f \rangle_{\mathcal{M}}$), which measures the smoothness of eigenfunction $\phi_i$ over $\mathcal{M}$. The eigenvalues grow approximately linearly with the index $i$, with a slope that is roughly inversely proportional to the surface area (known as Weyl's asymptotic law). See Appendix A for more details about the "Shape-DNA".
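For reference, the slope statement can be made explicit with the standard two-dimensional form of Weyl's law, stated here for a closed surface. The eigenvalue counting function $N(\lambda)$ and the surface area $|\mathcal{M}|$ are symbols introduced only for this note:

```latex
N(\lambda) := \#\{\, i : \lambda_i \le \lambda \,\} \;\sim\; \frac{|\mathcal{M}|}{4\pi}\,\lambda
\qquad\Longrightarrow\qquad
\lambda_i \;\sim\; \frac{4\pi}{|\mathcal{M}|}\, i \quad (i \to \infty).
```

Under this asymptotic relation, doubling the surface area roughly halves the slope of the eigenvalue curve, so a smaller molecule shows a steeper spectrum.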

3.2. BASICS OF MANIFOLD HARMONIC ANALYSIS

Now, we introduce basic manifold harmonic analysis, a merit of using the Riemannian manifold representation. "Harmonic analysis" refers to the representation of functions as the superposition of some basic waves. Specifically in our case, given the molecular surface manifold $\mathcal{M}$ and its LB eigenfunctions $\{\phi_i\}_{i=0}^{\infty}$, any scalar-valued function $f$ that is square-integrable on $\mathcal{M}$ can be decomposed into a generalized Fourier series:

$$f(x) = \sum_{i=0}^{\infty} \langle f, \phi_i \rangle_{\mathcal{M}}\, \phi_i(x). \tag{2}$$

In other words, $f$ can be represented as a linear combination of the LB eigenfunctions. The linear combination coefficient $\langle f, \phi_i \rangle_{\mathcal{M}}$ can be interpreted as the "projection" of $f$ onto the eigenfunction $\phi_i$, which reflects the contribution of this particular eigenfunction to synthesizing function $f$. Interestingly, these eigenfunctions display different spatial frequencies (Fig. 2), i.e., $\phi_1$ varies slowly over the surface, while $\phi_{100}$ oscillates at a much higher frequency. This is analogous to the set of $\sin(kx)$ functions in the 1D case, where high frequency waves with larger $k$ values exhibit more oscillations within the period of length $2\pi$. Hereafter we refer to LB eigenfunctions with (relatively) small/large eigenvalues as the low/high frequency components. Therefore, by manipulating the linear combination coefficients, we can control the contribution of different frequency components to synthesizing function $f$. For instance, with only low frequency components, the synthesized function will have a lower spatial resolution on the surface (i.e., more smoothed out; see Fig. 1, right panel, and Appendix B for technical details), and vice versa. Synthesizing a new function $f$ with some selected frequency components is commonly known as wave filtering in signal processing, which will be applied in our representation learning framework to realize multi-resolution encoding of the molecular surface.
We refer the readers to some excellent review papers for more details about geometry processing (Bronstein et al., 2017; Rosenberg, 1997).
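As a minimal sketch of the decomposition in Eq. 2 and of wave filtering, the Python snippet below swaps the molecular surface for a toy domain: a closed ring of points whose graph Laplacian plays the role of the LB operator. This substitution is an assumption made purely for illustration, in the spirit of the sin(kx) analogy, and the helper names `ring_laplacian`, `project`, and `synthesize` are hypothetical rather than part of HMR.

```python
import numpy as np

def ring_laplacian(n):
    """Positive semidefinite Laplacian of n points on a closed ring (a 1D stand-in for the LB operator)."""
    lap = 2.0 * np.eye(n)
    idx = np.arange(n)
    lap[idx, (idx + 1) % n] = -1.0
    lap[idx, (idx - 1) % n] = -1.0
    return lap

def project(f, phi):
    """Coefficients <f, phi_i> of f in the orthonormal eigenbasis stored column-wise in phi."""
    return phi.T @ f

def synthesize(coeffs, phi, keep=None):
    """Sum coeffs[i] * phi_i; if `keep` is given, use only the `keep` lowest-frequency terms."""
    if keep is None:
        keep = len(coeffs)
    return phi[:, :keep] @ coeffs[:keep]

n = 64
eigvals, phi = np.linalg.eigh(ring_laplacian(n))  # eigenvalues in ascending order
x = 2.0 * np.pi * np.arange(n) / n
f = np.sin(x) + 0.3 * np.sin(12.0 * x)            # slow wave plus a fast ripple
coeffs = project(f, phi)
f_full = synthesize(coeffs, phi)                  # all components: recovers f
f_low = synthesize(coeffs, phi, keep=5)           # low-pass: only the slow wave survives
```

Keeping all coefficients reproduces the input, whereas keeping only the five lowest-frequency terms removes the fast ripple and leaves the slow wave. On a real surface mesh the inner product would additionally be weighted by vertex areas.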

4. METHODOLOGY

The aforementioned geometry processing methods are well suited to molecular surface representation. However, a clear distinction between the 3D objects used in geometry processing research and molecular systems is that molecules are not simply shapes: the atomic structures beneath the surface govern the molecular functionality. In other words, both geometry (i.e., shape) and chemistry have to be considered for molecular encoding. It has been shown that the molecular surface displays chemical and geometric patterns which fingerprint a protein's mode of interactions with other biomolecules (Gainza et al., 2020). Therefore, we formulate both geometric and chemical properties of a molecule as functions distributed on its surface manifold. Then comes the question: how do we properly learn these geometric and chemical features on the molecular surface? One viable solution is to emulate the message passing framework commonly used in GNNs, whose goal is to propagate information between distant surface regions to encode surface features at different scales. To that end, we present two methods under the HMR framework. In Sec. 4.1, we introduce manifold harmonic message passing, which makes use of harmonic analysis techniques to allow efficient multi-range communication between regions on the molecular surface regardless of its size. This enables HMR to encode information within a molecule and be applied to molecule-level prediction tasks. In Sec. 4.2, we propose an HMR-powered rigid protein docking pipeline. We use this docking challenge to demonstrate the potential of our proposed representation learning method for modeling interactions between large biomolecules.

4.1. LEARNING HARMONIC MOLECULAR REPRESENTATIONS

Given the 3D atomic structure of a molecule, HMR returns (1) a discretized molecular surface manifold (i.e., a triangle mesh) with $N$ vertices $\{x_1, \ldots, x_N\} \subset \mathbb{R}^3$ and the corresponding faces; (2) a set of $n$ per-vertex features, $F \in \mathbb{R}^{N \times n}$, which can be viewed as $n$ learned surface functions. The discretized surface manifold as well as these features represent the geometry and chemistry of the underlying molecule, and can be used for various downstream prediction tasks.

Surface Preparation

We use MSMS (Ewing & Hermisson, 2010) to compute the molecular solvent-excluded surface as a triangle mesh with $N$ vertices. Then, we compute the first $k$ LB eigenfunctions $\{\phi_i\}_{i=0}^{k-1}$ with ascending eigenvalues as described in Reuter et al. (2009), and stack them into an array $\Phi \in \mathbb{R}^{N \times k}$, where each column stores an eigenfunction. Geometric features can be readily calculated given the surface mesh. We compute the per-vertex mean curvature, Gaussian curvature, and the Heat Kernel Signatures as described in Sun et al. (2009). The local chemical environment is captured using a simple multilayer perceptron (MLP). For each vertex, we encode its neighboring atoms within a predefined radius (e.g., 6 Å) through the MLP, then sum over the neighbors to obtain its chemical embedding. We use another MLP to combine the per-vertex initial features: $F_{inp} \leftarrow \mathrm{MLP}(\mathrm{concat}(F_{geom}, F_{chem}))$, $F_{inp} \in \mathbb{R}^{N \times n}$. These $n$ features reflect the local geometric and chemical environment of each surface vertex, and will be used as input to the harmonic message passing module. See more implementation details in Appendix C. The output of the surface preparation module includes (1) the molecular surface triangle mesh, (2) the surface Laplace-Beltrami eigenfunctions $\Phi$, and (3) the per-vertex features $F_{inp}$.

Harmonic Message Passing

Our proposed harmonic message passing mechanism is closely related to the heat diffusion process on an arbitrary surface. Joseph Fourier developed spectral analysis methods to solve the heat equation $\partial f/\partial t + \Delta f = 0$, where $f$ is some heat distributed on the surface. This concise partial differential equation describes how a heat distribution $f$ evolves over time, and its solution can be expressed using the heat operator $\exp(-\Delta t)$, i.e., $f(t) = \exp(-\Delta t) f_0$ for an initial heat distribution $f_0$ at $t = 0$. Intuitively, heat will flow from hot regions to cool regions on the surface.
As time approaches infinity, the heat distribution $f$ will converge to a constant value (i.e., the global average temperature on the surface), assuming that total energy is conserved. In fact, heat diffusion can be thought of as a message passing process, where surface regions with different temperatures communicate with each other and propagate the initial heat distribution deterministically. The heat exchange rate is dependent on the difference in temperature (determined by the LB operator), while the message passing distance is determined by the heat diffusion time $t$. Following this idea, we generalize the heat diffusion process by proposing a function propagation operator $P$ with a neural network-learned frequency filter $F_\theta(\lambda)$:

$$P f = \sum_i F_\theta(\lambda_i)\, \langle f, \phi_i \rangle_{\mathcal{M}}\, \phi_i, \tag{3}$$

$$F_\theta(\lambda) = \exp\!\left(-\frac{(\lambda - \mu)^2}{\sigma^2}\right) \cdot \exp(-\lambda t), \quad \text{where } \theta = (\mu, \sigma, t). \tag{4}$$

As shown in Eq. 3, the input function $f$ is first expanded as a linear combination of the LB eigenfunctions with coefficients $\langle f, \phi_i \rangle_{\mathcal{M}}$. We then learn a spectral-space frequency filter $F_\theta(\lambda)$, which is a function of the corresponding eigenvalue $\lambda$. Here we use the term "frequency" loosely to refer to the LB eigenvalues. The filter $F_\theta(\lambda)$ consists of two components (Eq. 4): a Gaussian frequency filter (with parameters $\mu$ and $\sigma$), and the heat operator part $e^{-\lambda t}$ (with parameter $t$). The matrix representation of the function propagation operator $P$ is plotted in Fig. 3.

[Figure: panels labeled All-frequency, High-frequency, and Low-frequency]

For each input function $f$ within $F_{inp}$, the neural network learns a unique set of parameters $(\mu, \sigma, t)$ through backpropagation. In Fig. 4, we showcase a few examples of surface functions obtained by applying different frequency filters to an initial delta function. The Gaussian frequency filter allows the network to propagate the input function along some selected eigenfunctions with eigenvalues close to $\mu$, while the number of selected eigenfunctions is determined by the filter width $\sigma$. Owing to the multi-resolution nature of the LB eigenfunctions with different spatial frequencies, this filter helps capture surface functions at different resolutions. The heat operator part governs the communication distance. With longer propagation time, function $f$ will be more averaged out towards the global mean, leading to a smoothed function. In addition, the heat operator is by definition a low-pass filter, where components with larger eigenvalues decay faster. This makes eigenfunctions with large eigenvalues contribute less significantly during message passing. Therefore, the combination of the Gaussian frequency filter and heat operator could help the network focus on some higher frequency components (Aubry et al., 2011). As demonstrated, HMR is able to represent a variety of surface functions through harmonic message passing. The output of this module has the same size as the input, $F_{mp} \in \mathbb{R}^{N \times n}$, where each channel contains the propagated version of its input function. These features represent the neighboring geometric and chemical environment across multiple spatial scales, and can be used for surface property-related tasks, e.g., protein binding site prediction (see Sec. 4.2). In addition, molecule-level representations can be obtained through global pooling (see Sec. 5.1).
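The propagation operator of Eq. 3 and the filter of Eq. 4 can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the domain is a toy ring with unit vertex masses rather than a molecular surface, the parameters (µ, σ, t) are fixed by hand instead of being learned, and the function names `frequency_filter` and `propagate` are hypothetical.

```python
import numpy as np

def frequency_filter(lam, mu, sigma, t):
    """F_theta(lambda): Gaussian band around mu (width sigma) times the heat decay exp(-lambda * t)."""
    return np.exp(-((lam - mu) ** 2) / sigma ** 2) * np.exp(-lam * t)

def propagate(f, lam, phi, mass, mu, sigma, t):
    """P f = sum_i F_theta(lambda_i) <f, phi_i>_M phi_i, with <f, g>_M = sum_v mass[v] f[v] g[v]."""
    coeffs = phi.T @ (mass * f)                      # expand f in the eigenbasis
    return phi @ (frequency_filter(lam, mu, sigma, t) * coeffs)

# Toy domain (assumption): a ring of n points with unit vertex masses.
n = 64
idx = np.arange(n)
lap = 2.0 * np.eye(n)
lap[idx, (idx + 1) % n] = -1.0
lap[idx, (idx - 1) % n] = -1.0
lam, phi = np.linalg.eigh(lap)
mass = np.ones(n)

delta = np.zeros(n)
delta[0] = 1.0                                       # initial delta function at vertex 0

# Very wide Gaussian (large sigma): the filter reduces to pure heat diffusion.
short_heat = propagate(delta, lam, phi, mass, mu=0.0, sigma=1e6, t=1.0)
long_heat = propagate(delta, lam, phi, mass, mu=0.0, sigma=1e6, t=1e4)
# Narrow Gaussian around mu = 2 with t = 0: a band-pass that suppresses the constant component.
band = propagate(delta, lam, phi, mass, mu=2.0, sigma=0.2, t=0.0)
```

With a short diffusion time the delta function spreads locally while its total is preserved; with a very long time it flattens to the global mean 1/n. The band-pass setting removes the constant (zero-eigenvalue) component entirely, which is the kind of frequency selection the Gaussian part of the filter provides.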

4.2. LEARNING SURFACE CORRESPONDENCE FOR RIGID PROTEIN DOCKING

In this section, we demonstrate how to learn surface correspondence for molecular matching. Specifically, we introduce a surface-based rigid protein docking workflow (Fig. 5) powered by HMR. Rigid protein docking is a significant problem in structural biology, whose goal is to predict the pose of the protein complex based on the structures of the ligand and receptor proteins. It has been shown that protein-protein interfaces exhibit similar geometric and chemical patterns (Gainza et al., 2020). In other words, two proteins may interact if part of their surfaces display similar shapes and chemical functions (i.e., functional correspondence). This is similar to solving a puzzle, where both the shape and the pattern of the missing piece have to match in order to complete the puzzle. Following this idea, we propose to realize rigid protein docking in two consecutive steps: (1) given two protein surfaces, predict the region where binding might occur (i.e., binding site prediction, locating the missing piece); (2) establish functional correspondence between the ligand/receptor binding surfaces, and convert it to real-space vertex-to-vertex correspondence (shape/pattern matching). Rigid docking can then be achieved by aligning the corresponding binding site surface vertices.

Binding Site Prediction

Given the ligand and receptor protein surface meshes, we first predict the regions where they interact, which is a per-vertex binary classification problem. We iteratively apply HMR and cross-attention layers (Fig. 5) to encode the surfaces with intra- and inter-surface communications. Next, we use the learned features on each vertex to classify whether it belongs to the binding interface. Detailed descriptions of this module are available in Appendix D.
The output of the binding site prediction module includes: a) the receptor (ligand) surface mesh of the binding interface, which is a submanifold of the entire protein surface, represented as $\mathcal{M}_R$ ($\mathcal{M}_L$) with $N_R$ ($N_L$) vertices, and b) neural network-learned features, or so-called surface functions, $F_{\mathcal{M}_R}$ and $F_{\mathcal{M}_L}$ distributed on the receptor and ligand binding interfaces, respectively.

Rigid Docking with Functional Maps

We know that protein interfaces $\mathcal{M}_R$ and $\mathcal{M}_L$ exhibit similar geometry and also adopt a set of $n$ corresponding functions $F_{\mathcal{M}_R} \in \mathbb{R}^{N_R \times n}$ and $F_{\mathcal{M}_L} \in \mathbb{R}^{N_L \times n}$ (Fig. 5). Intuitively, if we could somehow align this set of corresponding functions, then we would have found a way to align the protein binding interfaces. To that end, we employ functional maps to "align" $F_{\mathcal{M}_R}$ and $F_{\mathcal{M}_L}$ in the spectral domain. Specifically, given the truncated LB eigenfunctions $\Phi_{\mathcal{M}_R} \in \mathbb{R}^{N_R \times k}$ and $\Phi_{\mathcal{M}_L} \in \mathbb{R}^{N_L \times k}$ of the receptor and ligand interface manifolds, we respectively compute the spectral representation of the learned functions as $A = (\Phi_{\mathcal{M}_R})^{+} F_{\mathcal{M}_R}$ and $B = (\Phi_{\mathcal{M}_L})^{+} F_{\mathcal{M}_L}$, where $^{+}$ denotes the Moore-Penrose pseudo-inverse, and $A, B \in \mathbb{R}^{k \times n}$. The functional correspondence ($C$) can be obtained by minimizing

$$C = \arg\min_{C \in \mathbb{R}^{k \times k}} \| C A - B \|_F,$$

where $\| \cdot \|_F$ denotes the Frobenius norm. Once the functional mapping (i.e., the $C$ matrix) is recovered through numerical optimization, vertex-to-vertex correspondence between the receptor and ligand surfaces can be established by mapping indicator functions of vertices on $\mathcal{M}_R$ to those on $\mathcal{M}_L$. In practice, we adopted a slightly more complicated functional maps approach, which is illustrated in Appendix E. Finally, we perform rigid docking by aligning the proteins according to the vertex-to-vertex surface correspondence using the Kabsch algorithm (Kabsch, 1976).

We employ HMR to perform property regression tasks on the QM9 dataset (Ramakrishnan et al., 2014). We use the same data split and atomic features as Satorras et al. (2021).
We compare our results with both invariant and equivariant networks as shown in Table 1, including SchNet (Schütt et al., 2018), NMP (Gilmer et al., 2017), Cormorant (Anderson et al., 2019), SE(3)-Transformer (Fuchs et al., 2020), and SEGNN (Brandstetter et al., 2021). See Appendix F for the complete table and experimental setup. Interestingly, although HMR completely discards the bonding information and only performs message passing over the molecular surface, it still shows comparable performance in predicting these molecular properties.
As shown in Fig. 6, the balanced accuracy of our model is consistently better than that reported in Gainza et al. (2020), suggesting that the proposed HMR framework can more effectively encode protein surface information through harmonic message passing over the molecular surface Riemannian manifold.

In addition, we draw a similar conclusion that both geometric and chemical information of the binding pockets are important in predicting the type of its binding molecules. See Appendix G for implementation details and more result analysis.

5.3. RIGID PROTEIN DOCKING

Finally, we evaluate HMR on a more challenging task: rigid protein docking.

Experimental setup

HMR is trained on a modified version of the Database of Interacting Protein Structures (DIPS) (Townshend et al., 2019) and evaluated on the gold-standard Docking Benchmark 5.5 (DB5.5) (Guest et al., 2021). We compare our model with the state-of-the-art GNN-based deep learning model EQUIDOCK (Ganea et al., 2021) and two traditional docking methods, ATTRACT (de Vries et al., 2015) and HDOCK (Yan et al., 2020). To evaluate docking performance, we compute the Complex and Interface root-mean-square deviation (RMSD) following Ganea et al. (2021), and calculate DockQ following Basu & Wallner (2016). We also report the success rate, indicating whether the result achieves "Acceptable" or higher according to Lensink & Wodak (2013).

Results

Model performance is summarized in Table 2 and Fig. 7a. HMR (Top 1) outperforms the GNN-based EQUIDOCK model under all metrics. Notably, HMR achieves a much higher success rate, indicating that more test cases have results close to the ground truth complex structure. We note that traditional methods still lead in docking performance, but at greater computational cost (Table H.1). Further experiments show that the harmonic message passing mechanism learns to propagate information at different scales (Fig. H.1) and is critical to the effectiveness of the model (Table H.2). Our feature ablation studies show that chemical information is particularly important in the rigid protein docking task (Table H.3). Compared to EQUIDOCK, the novel framework of HMR and the optimized training dataset jointly contribute to the higher performance (Table H.4). We further examine the learned features at the predicted binding interfaces. As shown in Fig. 7b,c, surface functions on the two binding sites are highly correlated, confirming the good alignment achieved using functional maps.
Cases with higher docking quality show stronger interface feature correlations, suggesting that HMR is able to learn complex surface interaction patterns, which also supports the claims in Gainza et al. (2020) about the significant role of the protein surface. HMR predicts multiple binding sites for some proteins (either due to certain protein symmetries or model uncertainty). Therefore, we also assess the model performance by including candidate poses from the top 3 binding site pairs, ranked by the mean probability predicted by the binding site classifier. The best scores from the top 3 poses show that the performance of HMR is competitive with ATTRACT. See Appendix H for more detailed analysis on the rigid protein docking experiment.
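For readers who want a concrete picture of the alignment stage of the docking pipeline (Sec. 4.2), the following NumPy sketch covers the basic least-squares functional map and the Kabsch superposition. It is a simplified illustration on synthetic data, not the approach of Appendix E: the conversion from the map C to vertex-to-vertex correspondence is omitted, and the names `spectral_coefficients`, `fit_functional_map`, and `kabsch` are hypothetical.

```python
import numpy as np

def spectral_coefficients(phi, feats):
    """Spectral representation Phi^+ F of per-vertex features (Phi: N x k, F: N x n)."""
    return np.linalg.pinv(phi) @ feats

def fit_functional_map(a, b):
    """Least-squares C minimizing ||C A - B||_F, solved as A^T C^T = B^T."""
    c_t, *_ = np.linalg.lstsq(a.T, b.T, rcond=None)
    return c_t.T

def kabsch(p, q):
    """Rotation r and translation t that best map points p onto q (rows are matched 3D points)."""
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_mean).T @ (q - q_mean)               # 3 x 3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))          # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, q_mean - r @ p_mean

rng = np.random.default_rng(0)
k, n_feat = 6, 16

# Synthetic spectral coefficients related by a known map c_true.
a = rng.normal(size=(k, n_feat))
c_true = rng.normal(size=(k, k))
b = c_true @ a
c_est = fit_functional_map(a, b)

# Features that lie in the span of an orthonormal basis are recovered exactly by the pseudo-inverse.
basis, _ = np.linalg.qr(rng.normal(size=(20, k)))
a_rec = spectral_coefficients(basis, basis @ a)

# Synthetic matched vertices related by a known rigid motion.
pts = rng.normal(size=(10, 3))
q_mat, _ = np.linalg.qr(rng.normal(size=(3, 3)))
rot_true = q_mat * np.sign(np.linalg.det(q_mat))    # force a proper rotation (det = +1)
shift_true = np.array([1.0, -2.0, 0.5])
moved = pts @ rot_true.T + shift_true
rot_est, shift_est = kabsch(pts, moved)
```

On this noise-free data both the functional map and the rigid motion are recovered up to numerical precision. With learned features and imperfect correspondences the same steps give least-squares estimates instead.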

6. CONCLUSIONS AND OUTLOOK

We presented HMR, a powerful surface manifold-based molecular representation learning framework. By integrating geometric and chemical properties as functions distributed on the molecular surface manifold, and applying harmonic message passing in the spectral domain, we achieve multi-resolution molecular representations. HMR shows promising performance in molecular property prediction, protein pocket classification, and molecular matching tasks. Our work highlights an important aspect of the molecular "structure-activity" relationship, namely the "shape-activity" relationship. We think this is particularly significant for large biomolecules, where the surface shape and chemical patterns determine some fundamental biological activities, such as protein-protein interactions. The HMR framework could serve as a complementary method to GNN/NMP-based models in solving challenges for complex biological systems, with its own advantages and shortcomings. One of the foreseeable challenges in further developing the HMR framework is the efficient computation of the "Shape-DNA" (i.e., solving a second-order partial differential equation) in order to incorporate protein dynamics. A change of either the protein backbone or some side chains near the surface may alter the entire "Shape-DNA", which then needs to be recomputed upon protein conformational change. This is particularly important for large biomolecules where the surface shape undergoes significant changes (e.g., the Complementarity-Determining Regions of an antibody upon binding with an antigen, or the allosteric site of some enzymes). To that end, we call for more research attention to surface manifold-based molecular representation learning.

[...] $\langle f_1, f_2 \rangle_{\mathcal{M}} = \int_{\mathcal{M}} f_1(x) f_2(x)\, d\mu(x)$, where the area element $d\mu$ is induced by the Riemannian metric. We denote by $L^2(\mathcal{M}) = \{ f : \mathcal{M} \to \mathbb{R} \mid \langle f, f \rangle_{\mathcal{M}} < \infty \}$ the space of square-integrable functions on $\mathcal{M}$.
On a Riemannian manifold $\mathcal{M}$, we can generalize the usual Euclidean gradient $\nabla f$ and the positive semidefinite Laplace operator $\Delta f = -\mathrm{div}(\nabla f)$ to the intrinsic gradient $\nabla_{\mathcal{M}} f$ and the Laplace-Beltrami (LB) operator $\Delta_{\mathcal{M}} f$, respectively (Petersen, 2006). The LB operator admits a discrete set of eigenfunctions that solve $\Delta_{\mathcal{M}} \phi_i(x) = \lambda_i \phi_i(x)$, $x \in \mathcal{M}$. [...] values of a pair of ligand and receptor protein surfaces (PDB ID: 3V6Z). The protein with the smaller surface area exhibits a larger slope, as described by Weyl's asymptotic law. Fig. A.1b presents a few eigenfunctions with corresponding eigenvalues for the ligand and receptor molecules. Since different eigenfunctions have their unique spatial resolutions, in our experiments we compute all eigenfunctions with eigenvalues smaller than a predefined value (determined empirically) to guarantee that molecules of different sizes have the same highest spatial resolution in their eigenfunctions.

B IMPLEMENTING MANIFOLD HARMONIC ANALYSIS

Laplace-Beltrami Eigendecomposition Given a Riemannian manifold $\mathcal{M}$ and its Laplace-Beltrami operator $\Delta_{\mathcal{M}}$, we consider the Laplacian eigenvalue problem $\Delta_{\mathcal{M}} f = \lambda f$ with homogeneous Neumann boundary conditions. To realize discrete calculations, we approximate the manifold with a triangle mesh consisting of $N$ vertices $\{x_i\}_{i=1}^{N} \subset \mathcal{M}$ and the corresponding faces. We then solve the discretized eigenvalue problem using a linear finite element method (FEM) (Reuter et al., 2009). The discretized equation we obtain is the following generalized eigenvalue problem

$$A_{\cot} f = \lambda B f, \qquad f := (f(x_i))_{i=1}^{N}, \tag{B.1}$$

where $A_{\cot}$ is the stiffness matrix with cotangent weights,

$$A_{\cot}(i,j) := \begin{cases} \frac{1}{2}(\cot\alpha_{ij} + \cot\beta_{ij}) & \text{if } (i,j) \text{ is an edge} \\ -\sum_{k \in N(i)} A_{\cot}(i,k) & \text{if } i = j \\ 0 & \text{otherwise,} \end{cases}$$

and $B$ is an $N \times N$ sparse mass matrix which is associated with the weight of each surface vertex,

$$B(i,j) := \begin{cases} \frac{1}{12}(|t_1| + |t_2|) & \text{if } (i,j) \text{ is an edge} \\ \frac{1}{6}\sum_{t_k \in T(i)} |t_k| & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}$$

Here $\alpha_{ij}$ and $\beta_{ij}$ are the two angles opposite to the edge $(i,j)$, and $N(i)$ denotes the vertices that are adjacent to vertex $i$. The set $T(i)$ contains all the triangles that have $i$ as a vertex, and $|t_k|$ is the area of the triangle $t_k$. We also use $t_1$ and $t_2$ to denote the two triangles that share the edge $(i,j)$. Such a generalized symmetric eigenvalue problem can be solved with commonly used numerical packages (e.g., scipy). We can find non-negative eigenvalues $\Lambda$ and eigenvectors $Z$ such that

$$Z^\top A_{\cot} Z = \Lambda, \qquad Z^\top B Z = I.$$

Here $I$ is the identity matrix, $\Lambda := \mathrm{diag}(\lambda_0, \lambda_1, \ldots)$ is the diagonal matrix of the eigenvalues, and the matrix $Z \in \mathbb{R}^{N \times N}$ has the eigenvectors of Eq. B.1 as its column vectors. Note that, unlike a conventional orthonormal basis (i.e., $Q^\top Q = I$), here $Z$ forms an orthonormal basis w.r.t. the mass matrix $B$, that is, $Z^\top B Z = I$. In practice, we do not need to store the entire $N \times N$ eigenvector matrix $Z$.
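As an illustration of Eq. B.1, the following is a minimal NumPy sketch, not the implementation used in our experiments. It assembles dense stiffness and mass matrices for a toy octahedron mesh and solves the generalized problem through a Cholesky factorization of $B$. The function names `fem_matrices` and `generalized_eigh` are hypothetical. The sketch assumes the sign convention in which off-diagonal stiffness entries are negative, so that the matrix is positive semidefinite and the eigenvalues are non-negative. For real protein meshes one would use sparse matrices and an iterative solver such as `scipy.sparse.linalg.eigsh` with its `M` argument.

```python
import numpy as np

def fem_matrices(verts, faces):
    """Dense linear-FEM stiffness (cotangent) and mass matrices of a closed triangle mesh."""
    n = len(verts)
    A = np.zeros((n, n))
    B = np.zeros((n, n))
    for tri in faces:
        p0, p1, p2 = (verts[v] for v in tri)
        area = 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0))
        for c in range(3):
            i, j, o = tri[c], tri[(c + 1) % 3], tri[(c + 2) % 3]
            # Cotangent of the angle at vertex o, which is opposite to edge (i, j).
            u, v = verts[i] - verts[o], verts[j] - verts[o]
            cot = np.dot(u, v) / np.linalg.norm(np.cross(u, v))
            # Assumed sign convention: negative off-diagonals, so A is positive semidefinite.
            A[i, j] -= 0.5 * cot
            A[j, i] -= 0.5 * cot
            A[i, i] += 0.5 * cot
            A[j, j] += 0.5 * cot
            # Mass matrix: |t|/12 per shared edge, |t|/6 per vertex of each triangle.
            B[i, j] += area / 12.0
            B[j, i] += area / 12.0
            B[i, i] += area / 6.0
    return A, B

def generalized_eigh(A, B, k):
    """First k eigenpairs of A z = lam * B z, scaled so that Z^T B Z = I."""
    L = np.linalg.cholesky(B)            # B = L L^T
    L_inv = np.linalg.inv(L)
    lam, Y = np.linalg.eigh(L_inv @ A @ L_inv.T)
    Z = L_inv.T @ Y                      # map back to the original variables
    return lam[:k], Z[:, :k]

# Toy mesh: a regular octahedron (6 vertices, 8 faces).
verts = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
         (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]

A, B = fem_matrices(verts, faces)
lam, Z = generalized_eigh(A, B, k=4)
```

The first eigenvalue is numerically zero and belongs to the constant function, and the returned columns satisfy both identities stated above up to floating-point error.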
Just as in a Fourier series expansion, a truncated basis with finitely many terms can be used to approximate the original signal. The number of basis functions that we keep determines the resolution of the synthesized signal, where typically the high-frequency components are truncated. In our case, we empirically determine the number of eigenvectors to keep for different molecular systems, which is also task-dependent.

Resolution Tuning with Harmonic Analysis We now explain how to realize surface resolution tuning under our representation framework (e.g., how to make Fig. 1). Given a molecular surface triangle mesh, we first compute its Laplace-Beltrami eigendecomposition as described above. Under the discrete setting, we refer to the eigenfunctions as eigenvectors. We obtain a set of truncated Laplace-Beltrami eigenvectors $Z \in \mathbb{R}^{N \times k}$ ($k$ is the number of eigenvectors we keep), the associated $k$ eigenvalues $\{\lambda_i\}_{i=0}^{k-1}$, and the sparse mass matrix $B \in \mathbb{R}^{N \times N}$. Let $f \in \mathbb{R}^{N}$ be the initial function of interest (stored as an $N$-dimensional array), which is distributed on the underlying molecular surface (e.g., the electrostatic potential, a scalar value at each surface vertex). The spectral representation of the function $f$ can be calculated as

$$f_{\mathrm{spec}} = Z^\top B f, \qquad f_{\mathrm{spec}} \in \mathbb{R}^{k},$$

which is similar to a discrete Fourier transform. To project the function back to real space (inverse Fourier transform), we simply compute

$$f' = Z f_{\mathrm{spec}}, \qquad f' \in \mathbb{R}^{N}.$$

Resolution tuning is achieved by controlling the number of basis functions we use (i.e., tuning $k$) in spectral space. As shown in Fig. 1, the resolution of the electrostatic potential function can be tuned by changing the number of Laplace-Beltrami eigenfunctions.

However, what about the resolution of the shape itself? In Fig. 1, we see that the surface smoothness can also be tuned. It is important to realize that a manifold is an abstract concept, which does not necessarily have a particular realization in Euclidean space. The surfaces that we visualize in Fig. 1 are realizations of the underlying manifold in Euclidean space, where each surface vertex is associated with some Cartesian coordinates. These coordinates are extrinsic properties of the manifold, and can thus be treated in the same way as other surface functions (e.g., the electrostatic potential). Therefore, in order to reconstruct the molecular surface with a lower spatial resolution, we can simply calculate the smoothed Cartesian coordinates

$$x' = Z x_{\mathrm{spec}} = Z Z^\top B x,$$

and do the same for $y$ and $z$ to obtain the smoothed surface coordinates $(x', y', z')$. In short, we realize resolution tuning with a series of (sparse) matrix multiplications that bring the surface functions back and forth between the real space and the generalized Fourier space.
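The projection and reconstruction steps can be sketched as follows. This is a minimal NumPy illustration under simplifying assumptions: a ring-graph Laplacian stands in for the cotangent stiffness matrix, the mass matrix is diagonal (lumped) and stored as a vector `b_diag`, and the names `b_orthonormal_basis` and `low_pass` are hypothetical.

```python
import numpy as np

def b_orthonormal_basis(A, b_diag):
    """Eigenpairs of A z = lam * B z for diagonal B, scaled so that Z^T B Z = I."""
    s = 1.0 / np.sqrt(b_diag)
    lam, Y = np.linalg.eigh(s[:, None] * A * s[None, :])   # B^{-1/2} A B^{-1/2}
    return lam, s[:, None] * Y                             # Z = B^{-1/2} Y

def low_pass(f, Z, b_diag, k):
    """Keep the first k spectral coefficients of f, which is (N,) or (N, channels)."""
    Zk = Z[:, :k]
    f_spec = Zk.T @ (b_diag * f.T).T    # f_spec = Z^T B f
    return Zk @ f_spec                  # f' = Z f_spec

# Stand-in "surface": a noisy closed curve with N vertices (assumption for illustration).
N = 64
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
xyz = np.stack([np.cos(theta), np.sin(theta), np.zeros(N)], axis=1)
xyz += 0.05 * rng.standard_normal((N, 3))

# Ring-graph Laplacian and positive vertex weights playing the roles of A_cot and B.
A = 2.0 * np.eye(N) - np.roll(np.eye(N), 1, axis=0) - np.roll(np.eye(N), -1, axis=0)
b_diag = rng.uniform(0.5, 1.5, size=N)

lam, Z = b_orthonormal_basis(A, b_diag)

def b_error(k):
    """Squared reconstruction error of the coordinates, measured in the B inner product."""
    r = xyz - low_pass(xyz, Z, b_diag, k)
    return float((b_diag[:, None] * r ** 2).sum())

errors = [b_error(k) for k in (4, 16, 64)]
```

Because the truncated basis spans nested subspaces that are orthogonal with respect to $B$, the error measured in the $B$ inner product cannot increase as $k$ grows, and it vanishes when all $N$ eigenvectors are kept.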

C SURFACE PREPARATION

The raw input to the HMR framework is simply the 3D atomic structure (e.g., xyz files for small molecules, or PDB files for proteins). In other words, we only need to know where the atoms are and their atomic species. For proteins with only heavy atoms (since hydrogen atoms are almost invisible to X-ray diffraction), we use the reduce software (Word et al., 1999) (or alternatively PDB2PQR (Dolinsky et al., 2004)) to add hydrogen atoms. Next, we employ MSMS (Ewing & Hermisson, 2010) to calculate the solvent-excluded surface of the molecule (with probe radius 1.5 Å, sampling density 3.0 for small molecules and 1.0 for proteins) as a triangle mesh. We use PyMesh (Zhou, 2019) to further refine the surface mesh in order to reduce the number of vertices and fix poorly meshed areas. Degenerate vertices or disconnected surfaces would lead to numerical issues when solving the generalized eigenvalue problem in the next step, and thus should be fixed beforehand.

We then compute the truncated Laplace-Beltrami eigenvectors, eigenvalues, and the mass matrix as described in Appendix B. Initial geometric features can be directly calculated given the surface triangle mesh, where we use the libigl (Jacobson & Panozzo, 2017) package to calculate the mean and Gaussian curvatures, and compute the Heat Kernel Signatures as described in Sun et al. (2009). These geometric features capture shape-related properties of the molecular surface, and are stored as a scalar-type array $F_{\mathrm{geom}} \in \mathbb{R}^{N \times p}$, where $p$ is the number of initial geometric features (a user-defined variable).

Published as a conference paper at ICLR 2023

Chemical features are projected from atoms to their neighboring surface vertices. We first obtain an initial descriptor vector $u$ for each atom (e.g., atomic number, charge, etc.). Then, for each surface vertex $x_i$, we find its $\nu$ nearest neighbor atoms centered at $\{a_{i_1}, \ldots, a_{i_\nu}\}$ with features $\{u_{i_1}, \ldots, u_{i_\nu}\}$. We apply a multilayer perceptron (MLP) to the vector $[u_{i_j}, 1/\|x_i - a_{i_j}\|]$ for each neighboring atom $j = 1, \ldots, \nu$, then compute the average over the neighbors to obtain the chemical feature vector $F_{\mathrm{chem}} \in \mathbb{R}^{N \times q}$, where $q$ is a user-defined variable indicating the dimension of the initial chemical features. In short, the initial chemical features of each surface vertex are determined by its neighboring atomic species and their distances, through a mapping learned by an MLP.

We use another MLP to combine the per-vertex initial features, $F_{\mathrm{inp}} \leftarrow \mathrm{MLP}(\mathrm{concat}(F_{\mathrm{geom}}, F_{\mathrm{chem}}))$, $F_{\mathrm{inp}} \in \mathbb{R}^{N \times n}$. These $n$ features reflect the local geometric and chemical environment of each surface vertex, and are used as the input to the harmonic message passing module. The output of the surface preparation module includes (1) the truncated Laplace-Beltrami eigenvectors $Z \in \mathbb{R}^{N \times k}$, the corresponding eigenvalues $\{\lambda_i\}_{i=0}^{k-1}$, and the sparse mass matrix $B \in \mathbb{R}^{N \times N}$, and (2) the per-vertex scalar features $F_{\mathrm{inp}} \in \mathbb{R}^{N \times n}$.
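The atom-to-vertex projection of chemical features can be sketched in a few lines of NumPy. This is an illustration rather than our training code: the two-layer ReLU network, the layer sizes, the random weights, and the names `mlp` and `chemical_features` are all assumptions made for the example.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer perceptron with ReLU (architecture assumed for illustration)."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def chemical_features(verts, atom_xyz, atom_feat, params, nu=4):
    """Average an MLP of [u, 1/distance] over the nu nearest atoms of each vertex."""
    # Pairwise vertex-atom distances, shape (N, M).
    d = np.linalg.norm(verts[:, None, :] - atom_xyz[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :nu]                  # nearest atoms, (N, nu)
    d_nn = np.take_along_axis(d, idx, axis=1)            # their distances, (N, nu)
    u_nn = atom_feat[idx]                                # their descriptors, (N, nu, c)
    x = np.concatenate([u_nn, 1.0 / d_nn[..., None]], axis=-1)   # (N, nu, c + 1)
    return mlp(x, *params).mean(axis=1)                  # (N, q)

rng = np.random.default_rng(0)
N, M, c, hidden, q = 10, 6, 5, 8, 3
verts = rng.standard_normal((N, 3))            # surface vertices
atom_xyz = rng.standard_normal((M, 3))         # atom centers
atom_feat = rng.standard_normal((M, c))        # per-atom descriptors u
params = (rng.standard_normal((c + 1, hidden)), np.zeros(hidden),
          rng.standard_normal((hidden, q)), np.zeros(q))

F_chem = chemical_features(verts, atom_xyz, atom_feat, params)

# Reordering the atoms leaves the features unchanged, since the result is a neighbor average.
perm = rng.permutation(M)
F_chem_perm = chemical_features(verts, atom_xyz[perm], atom_feat[perm], params)
```

Averaging over the neighbors makes the per-vertex feature independent of the order in which atoms are listed, which is the property checked by the last two lines.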

D THE BINDING SITE PREDICTION MODULE

First, we define the binding site as the protein surface region that is within 3 Å of its counterpart surface, and obtain the set of all corresponding surface points $P$ as the nearest neighbor vertices between the ground truth protein interfaces. The binding site prediction module stacks two feature propagation blocks, each consisting of three HMR layers (introduced in Sec. 4.1) and a cross attention layer. Given the propagated receptor features $F$ and ligand features $G$, the cross attention layer enables communication between proteins:

$$F' = \mathrm{softmax}\!\left(\frac{(F W_Q)(G W_K)^\top}{\sqrt{d_k}}\right)(G W_V), \qquad G' = \mathrm{softmax}\!\left(\frac{(G W_Q)(F W_K)^\top}{\sqrt{d_k}}\right)(F W_V),$$

where $F \in \mathbb{R}^{N_R \times d_k}$ and $G \in \mathbb{R}^{N_L \times d_k}$ denote the propagated features on the receptor and ligand protein surfaces, $d_k$ denotes the feature dimension, and $W_Q$, $W_K$, and $W_V$ are the parameter matrices for the query, key, and value in the attention computation, respectively.

The loss function consists of two components. The first is a binary cross entropy loss, which encourages the model to correctly predict the binding site:

$$L_{\mathrm{bce}}(i) = -\left[y_i \log x_i + (1 - y_i) \log(1 - x_i)\right],$$

where $y_i$ and $x_i$ are the label and the predicted probability of whether vertex $i$ belongs to the binding site. The second term is a PointInfoNCE loss (Xie et al., 2020), a contrastive matching loss that minimizes the distance between the features of corresponding surface points and maximizes the distance between non-corresponding point features:

$$L_{\mathrm{match}} = -\sum_{(i,j) \in P} \log \frac{\exp(f_i \cdot g_j / \tau)}{\sum_{(\cdot,k) \in P} \exp(f_i \cdot g_k / \tau)},$$

where $P$ is the set of all corresponding surface points and $\tau$ is the temperature factor (a hyperparameter). Here $f_i$ and $g_j$ are the neural network-learned feature vectors at points $i$ and $j$, which belong to the receptor and ligand surfaces, respectively. The total loss is the weighted sum of the two loss terms, $L = L_{\mathrm{bce}} + \lambda L_{\mathrm{match}}$, where we empirically set $\lambda$ to 0.1 in our docking experiments.
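The cross attention layer and the two loss terms can be written compactly in NumPy. The sketch below is illustrative only: the function names are hypothetical, the weights are random, and reducing the per-vertex binary cross entropy by a mean is an assumption, since the reduction is not specified above.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(F, G, Wq, Wk, Wv):
    """F' = softmax((F Wq)(G Wk)^T / sqrt(d_k)) (G Wv); swap F and G to obtain G'."""
    d_k = F.shape[1]
    attn = softmax((F @ Wq) @ (G @ Wk).T / np.sqrt(d_k), axis=1)   # (N_R, N_L)
    return attn @ (G @ Wv)

def bce(y, x, eps=1e-12):
    """Per-vertex binary cross entropy for labels y and predicted probabilities x."""
    x = np.clip(x, eps, 1.0 - eps)
    return -(y * np.log(x) + (1.0 - y) * np.log(1.0 - x))

def point_info_nce(f, g, pairs, tau=0.1):
    """Contrastive matching loss summed over the corresponding pairs (i, j) in P."""
    i, j = np.asarray(pairs).T
    logits = f[i] @ g[j].T / tau        # row p, column r: f_{i_p} . g_{j_r} / tau
    prob = softmax(logits, axis=1)
    return float(-np.log(np.diag(prob)).sum())

rng = np.random.default_rng(0)
d_k, N_R, N_L = 8, 5, 7
F, G = rng.standard_normal((N_R, d_k)), rng.standard_normal((N_L, d_k))
Wq, Wk, Wv = (rng.standard_normal((d_k, d_k)) for _ in range(3))
F_new = cross_attention(F, G, Wq, Wk, Wv)       # receptor attends to ligand
G_new = cross_attention(G, F, Wq, Wk, Wv)       # ligand attends to receptor

# Toy features where point i on one surface matches point i on the other.
f = g = np.eye(3)
loss_matched = point_info_nce(f, g, [(0, 0), (1, 1), (2, 2)])
loss_shuffled = point_info_nce(f, g, [(0, 1), (1, 2), (2, 0)])

y = np.array([1.0, 0.0, 1.0])
x = np.array([0.9, 0.2, 0.6])
total = bce(y, x).mean() + 0.1 * loss_matched   # mean reduction is an assumption
```

With the toy features, the correctly paired points give a loss close to zero, while the shuffled pairing is heavily penalized, which is the behavior the matching term is meant to enforce.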
Model Architecture and Performance The HMR-based classification model contains 6 HMR propagation layers followed by a global average pooling to aggregate information from all surface points of a pocket. A simple 2-layer MLP is used to classify pockets into seven ligand-binding classes. HMR is trained to minimize the cross-entropy loss for 400 epochs (approx. 20,000 iterations with a batch size of 32), and the checkpoint with the best balanced accuracy score on the validation set is selected. To compare with MaSIF-ligand, we calculated the balanced accuracy for multi-class classification (Fig. 6). The per-class performance of our full model (geometric + chemical features) is shown as a confusion matrix (Fig. G.1).

Let $Z^* \in \mathbb{R}^{3 \times (n+m)}$ and $Z \in \mathbb{R}^{3 \times (n+m)}$ be the α-carbon coordinates of the ground truth and predicted protein complexes, respectively, where $m$ and $n$ are the numbers of α-carbons in the receptor and ligand protein. After superimposing the complex structures using the Kabsch algorithm, Complex RMSD is calculated as $\sqrt{\frac{1}{n+m} \|Z^* - Z\|_F^2}$. Similarly, Interface RMSD is calculated using the same procedure but with the α-carbon coordinates of interface residues (< 8 Å to the other protein's residues). A smaller RMSD value means the predicted structure is closer to the ground truth structure. In addition, we evaluate the overall quality of docking using DockQ (Basu & Wallner, 2016) and the success rate of achieving "Acceptable" or higher according to Méndez et al. (2003; 2005).
Both metrics are based on three standardized criteria used by the Critical Assessment of PRedicted Interactions (CAPRI): $L_{\mathrm{rms}}$ is the ligand (the smaller protein) RMSD calculated based on backbone atoms, after superimposing the receptor's backbone atoms; $I_{\mathrm{rms}}$ is the backbone RMSD of interface residues, after superimposing the interface residues (residues with any atom < 10 Å from atoms in the other protein); and $f_{\mathrm{nat}}$ is the recall in recovering residue-residue contacts between the proteins, where two residues are "in contact" if any pair of atoms from the two residues has distance < 5 Å. DockQ is a continuous score between 0 and 1 (the higher the better), derived from $L_{\mathrm{rms}}$, $I_{\mathrm{rms}}$, and $f_{\mathrm{nat}}$. A docking result is considered a "success" if it is ranked "Acceptable" or higher according to CAPRI's criteria, that is,

$$\left(f_{\mathrm{nat}} \ge 0.1 \wedge (L_{\mathrm{rms}} \le 10.0 \vee I_{\mathrm{rms}} \le 4.0)\right) \vee f_{\mathrm{nat}} \ge 0.3.$$

Both the DockQ score and the CAPRI ranking are calculated using the DockQ package (https://github.com/bjornwallner/DockQ/).
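The RMSD computation and the success criterion can be sketched as follows. This is a minimal NumPy illustration, not the DockQ package: coordinates are stored as $(n, 3)$ arrays rather than $3 \times (n+m)$ matrices, the names `kabsch_rmsd` and `is_acceptable` are hypothetical, and `is_acceptable` simply transcribes the logical expression stated above.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between (n, 3) coordinate arrays after optimal rigid superposition of P onto Q."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)     # remove translations
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)                 # SVD of the 3x3 covariance
    d = np.sign(np.linalg.det(Vt.T @ U.T))              # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T             # optimal rotation
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def is_acceptable(f_nat, l_rms, i_rms):
    """Success criterion as written above: Acceptable or higher."""
    return (f_nat >= 0.1 and (l_rms <= 10.0 or i_rms <= 4.0)) or f_nat >= 0.3

# A rigidly moved copy of a structure has zero RMSD after superposition.
rng = np.random.default_rng(0)
P = rng.standard_normal((20, 3))
angle = 0.7
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([3.0, -1.0, 2.0])
rmsd_rigid = kabsch_rmsd(P, Q)

# Perturbing the coordinates yields a strictly positive RMSD.
rmsd_noisy = kabsch_rmsd(P, Q + 0.5 * rng.standard_normal(Q.shape))
```

Because the superposition removes any rigid motion, only genuine structural differences contribute to the reported RMSD.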

H.3 RESOURCE CONSUMPTION

Dataset We report the rigid protein docking resource consumption using the Docking Benchmark 5.5 (DB5.5) dataset (Guest et al., 2021), which contains 253 pairs of representative protein complex structures. This is the gold-standard test set for evaluating protein docking model performance. We preprocessed the DB5.5 dataset in parallel on 64 CPUs. Adding hydrogen atoms to the PDB data took 34 seconds (using reduce), computing the solvent-excluded molecular surface with MSMS took 6 seconds, triangle mesh refinement by PyMesh took 3.4 minutes, and computing the Laplace-Beltrami eigenfunctions took 18 minutes with scipy (eigsh). The average number of vertices per protein (i.e., ligand or receptor) is 3,380, with 215 calculated Laplace-Beltrami eigenfunctions (maximum eigenvalue capped at 0.3, which is determined empirically). The total storage space for the processed dataset is 2 GB (about 8 MB per protein, much larger than for QM9 molecules). For our training set with 11,781 protein complexes, the storage space is 101 GB.

Inference The HMR inference time in predicting DB5.5 protein complexes is shown in Table H.1, in comparison to EQUIDOCK, HDOCK, and ATTRACT. For HMR, data preprocessing took 80% of the inference time, functional maps (docking pose prediction) took 19%, and binding site prediction took only 1%.



For instance, an isosurface of its electron density field, or the solvent-accessible/excluded surface, etc.

Two shapes may share the same LB eigendecomposition (isometries). The uniqueness of the eigendecomposition up to isometries is associated with the question "Can One Hear the Shape of a Drum?" (Kac, 1966).

Solving the generalized eigenvalue problem should be done with fewer parallel processes, so that more CPU resources are allocated to each solver for better efficiency.



Figure 1: Multi-resolution molecular surface representation. Showing the electrostatic potential (blue regions being negatively charged, PDB ID: 3V6F) at different resolutions under our representation. See Appendix B for technical details about tuning resolution.

Figure 3: HMR workflow. Given a molecular surface mesh with N vertices, we compute the first k Laplace-Beltrami eigenfunctions (column-wise stacked into an array Φ, and Φ + denotes its Moore-Penrose pseudo-inverse, see Appendix B for discrete calculations) with ascending eigenvalues, and extract n initial surface features F inp through MLP. Then, we apply neural network-learned spectral filters to propagate the features over the surface to achieve message passing. Note that each feature channel has a unique Gaussian frequency filter and propagation time t. Relevant tensor sizes are indicated in parentheses. Multiple message passing blocks can be stacked for better representations.

Figure 4: Examples of versatile message passing outcomes using different frequency filter settings.

Figure 5: The rigid protein docking pipeline. Given the surface of the ligand and receptor proteins, we apply multiple HMR and cross attention layers to allow communication within and between the surfaces. The learned surface representations are then used to predict the binding interfaces and establish functional correspondence using functional maps. Rigid docking is achieved by converting the functional correspondence to rigid transformations which align the predicted interfaces.

Figure 7: Rigid docking performance analysis. a. Distribution of Complex RMSD and DockQ scores for poses predicted using HMR (ours, Top 1), EQUIDOCK, HDOCK, and ATTRACT. b. Correlations between learned functions on the predicted ligand-receptor interfaces. For each test case, we calculate the correlation as the Pearson's r between the learned function values on ligand interface vertices and their nearest receptor vertices (corresponding vertices), averaged over 128 hidden channels. c. A showcase of functional correspondence between interfaces for a well-docked case.

Figure A.1: a. Weyl's asymptotic law, where the protein with larger surface area exhibits a slower eigenvalue growth rate. b. The 10th and 200th eigenfunction of the ligand and receptor surface manifold, respectively. Eigenfunctions with similar eigenvalues exhibit similar spatial frequencies regardless of the surface area, since they have similar smoothness in the sense of Dirichlet energy.

Figure H.2: HMR learning curve for the rigid protein docking task. Model with the best validation average precision (AP) score in binding site prediction is selected for testing on DB5.5.

Rigid Prediction Results on Docking Benchmark 5.5

Table F.1: Model performance on the QM9 dataset, reporting the Mean Absolute Error (MAE).

Resource Consumption Computing the molecular surface and its Laplace-Beltrami eigenfunctions is computationally intensive. We performed data preprocessing in parallel on 64 CPUs, where molecular surface computation took 198 seconds, surface mesh refinement took 59 minutes, and solving for the eigenfunctions took 4.5 hours. With an average of 439 mesh vertices and 47 eigenfunctions, the entire QM9 dataset consumes 12.1 GB of disk space (approximately 100 kB per molecule). The production model contains 356,609 learnable parameters, most of which are linear transformation coefficients in MLPs. However, manifold harmonic message passing involves matrix multiplications that bring features back and forth between the real space and the generalized Fourier space, and these are performed serially (instead of in batches). This is because the numbers of surface vertices and Laplace-Beltrami eigenfunctions differ across molecules. Therefore, we only perform batch operations on the MLPs, but not on the message passing layers. We trained our model on NVIDIA A100 GPUs with 80 GB memory with a batch size of 32, which on average takes 240 seconds to train a single epoch (99,862 molecules), and 16 seconds for inference on the test set (13,069 molecules). For each molecule we keep its $N \times n$ feature matrix $F$ ($N$ vertices and $n$ features), the $N \times k$ eigenvectors ($Z$ matrix), and its inverse matrix (the $Z^\top B$ matrix, see Appendix B), which do not consume much GPU memory.

Dataset The dataset is obtained from Gainza et al. (2020), with 1,438 non-redundant protein structures that each correspond to a list of bound ligands with their atom coordinates. We first generate protein surface meshes as described in Appendix C.
To extract ligand binding pockets, we identify pocket vertices on the surface mesh that are within 4 Å of any atom of the ligand and extract the largest connected component of pocket vertices as the binding pocket. Binding pockets that contain < 100 vertices are removed, and the LB eigenfunctions of the remaining pocket surfaces are calculated. Three protein complexes failed during protein surface generation and 63 protein complexes failed during binding pocket extraction due to disconnected surfaces or too few vertices after surface refinement.
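The pocket extraction step can be sketched as a distance filter followed by a breadth-first search over mesh edges. The code below is an illustration under stated assumptions, not our preprocessing script: the function name `extract_pocket` is hypothetical, connectivity is taken from triangle edges between candidate vertices, and the toy mesh and cutoff in the demo are chosen only to make the behavior easy to inspect.

```python
import numpy as np
from collections import deque

def extract_pocket(verts, faces, ligand_xyz, cutoff=4.0, min_vertices=100):
    """Largest connected set of vertices within `cutoff` of any ligand atom, or None."""
    d = np.linalg.norm(verts[:, None, :] - ligand_xyz[None, :, :], axis=-1).min(axis=1)
    near = d <= cutoff
    # Adjacency restricted to candidate vertices, built from triangle edges.
    adj = {int(v): set() for v in np.flatnonzero(near)}
    for face in faces:
        a, b, c = (int(v) for v in face)
        for u, v in ((a, b), (b, c), (c, a)):
            if near[u] and near[v]:
                adj[u].add(v)
                adj[v].add(u)
    # Breadth-first search to find the largest connected component.
    best, seen = [], set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u] - seen:
                seen.add(v)
                queue.append(v)
        if len(comp) > len(best):
            best = comp
    return sorted(best) if len(best) >= min_vertices else None

# Toy mesh: a triangulated strip with vertices (i, 0, 0) and (i, 1, 0), i = 0..9.
verts = np.array([[i, r, 0.0] for i in range(10) for r in (0, 1)], dtype=float)
faces = [(2 * i, 2 * i + 2, 2 * i + 1) for i in range(9)] + \
        [(2 * i + 1, 2 * i + 2, 2 * i + 3) for i in range(9)]
ligand = np.array([[0.0, 0.5, 1.0], [8.0, 0.5, 1.0]])    # two well-separated "atoms"

pocket = extract_pocket(verts, faces, ligand, cutoff=1.6, min_vertices=1)
rejected = extract_pocket(verts, faces, ligand, cutoff=1.6, min_vertices=100)
```

In the toy example the two ligand atoms select two disconnected patches of four and six vertices; only the larger patch is returned, and it is discarded when the minimum size is set to 100.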

Table H.1: Rigid protein docking inference time on DB5.5 dataset (per protein complex docking time averaged over 253 cases).

less sensitive to resolution than the rigid docking module (powered by functional maps). This means the model could still infer where the binding site is at lower resolution, yet the quality of the learned functional correspondence decreases, leading to worse docking power.

Table H.6: HMR rigid protein docking performance at different spatial resolutions (quantified by the largest eigenvalue of surface manifold). AUC (area under the receiver operating characteristic curve) and AP (average precision) are metrics of the binding site prediction module.

7: Hyperparameter choices of HMR and the training phase settings

ACKNOWLEDGEMENTS

The authors thank Dr. Hang Li and Dr. Quanquan Gu for their insightful comments. Hao Zhou is supported by Vanke Special Fund for Public Health and Health Discipline Development, Tsinghua University (NO.20221080053), Guoqiang Research Institute General Project, Tsinghua University (No. 2021GQG1012).

REPRODUCIBILITY STATEMENT

The code and data are available at https://github.com/GeomMolDesign/HMR. QM9 raw dataset is provided at https://springernature.figshare.com/ndownloader/files/3195389. The dataset for the ligand-binding pocket classification is provided at https://zenodo.org/record/2625420 and the split used by MaSIF is at https://github.com/LPDI-EPFL/masif/tree/master/data/masif_ligand/lists. DIPS dataset can be downloaded from the following website https://github.com/BioinfoMachineLearning/DIPS-Plus. EQUIDOCK model and checkpoints can be downloaded from https://github.com/octavian-ganea/equidock_public. ATTRACT can be downloaded from https://github.com/sjdv1982/attract. HDOCK is implemented using its local version HDOCKlite, which can be downloaded from http://huanglab.phys.hust.edu.cn/software/hdocklite/. DockQ can be downloaded from https://github.com/bjornwallner/DockQ/.

funding

* This work was conducted during internship at ByteDance Research.

A THE MOLECULAR SHAPE-DNA

Riemannian Manifold A manifold is a space that is locally flat but not necessarily globally flat. Formally speaking, a $d$-dimensional manifold $\mathcal{M}$ is a topological space where each point $p \in \mathcal{M}$ has a neighborhood that is homeomorphic to a $d$-dimensional Euclidean space (Lee, 2013), which is equivalent to the tangent space at $p$ and is denoted by $T_p\mathcal{M}$.

We can further assign a positive definite inner product $g : T_p\mathcal{M} \times T_p\mathcal{M} \to \mathbb{R}$ on every tangent space, and this inner product is called a Riemannian metric. A manifold equipped with a Riemannian metric is called a Riemannian manifold. Intuitively, the Riemannian metric provides a measurement of the velocity when a particle moves on the manifold, and many other quantities can therefore be defined. For example, for any tangent vector $X_p \in T_p\mathcal{M}$, the quantity $|X_p| := \sqrt{g(X_p, X_p)}$ can be interpreted as the traveling speed of a particle when passing through $p$. Hence, the traveling distance along a curve (i.e., the length of a curve) can be defined as the integral of $|X_p|$ along the curve. On a Riemannian manifold, quantities that can be expressed in terms of the Riemannian metric are called intrinsic (e.g., geodesic distance).

When a manifold is realized in Euclidean space, a natural Riemannian metric can be induced from the ambient Euclidean space. We always refer to this induced metric when talking about a Riemannian manifold in the following appendices.

The Laplace-Beltrami Operator We denote a real-valued scalar function on the manifold $\mathcal{M}$ by $f$. Given two functions $f_1, f_2$ on the manifold, we can define the inner product $\langle f_1, f_2 \rangle_{\mathcal{M}} =$

E DETAILS ON FUNCTIONAL MAPS

In this section, we present the details of functional maps used in Sec. 4.2.

Let us be given two manifolds $\mathcal{M}$ and $\mathcal{N}$. The aim of functional maps is to find a bijective mapping $T : \mathcal{M} \to \mathcal{N}$ to align these two manifolds. Unlike traditional methods that try to recover the point-to-point correspondence directly, functional maps lift the mapping $T$ to a correspondence between the functional spaces on the two manifolds. Formally, let $L^2(\cdot)$ be the functional space of square-integrable functions on a manifold. We infer the functional correspondence $T_F : L^2(\mathcal{M}) \to L^2(\mathcal{N})$, which is induced by the mapping $T$ and is defined by

To compute the functional correspondence, we need to utilize the Laplace-Beltrami basis on the manifolds. In fact, such a functional correspondence has a concise expression in the spectral domain: given the respective truncated LB eigenfunctions $\{\phi^{\mathcal{M}}_j\}_{j \ge 0}^{k_1}$ on $\mathcal{M}$ and $\{\phi^{\mathcal{N}}_i\}_{i \ge 0}^{k_2}$ on $\mathcal{N}$, the functional correspondence $T_F$ can be (approximately) represented as a change of basis matrix:

Now given a set of $q$ corresponding functions $\{f_1, \ldots, f_q\} \subset L^2(\mathcal{M})$ and $\{g_1, \ldots, g_q\} \subset L^2(\mathcal{N})$, we denote their spectral representations by the coefficients $A = (a_{ij})_{k_1 \times q}$, where $a_{ij} = \langle \phi^{\mathcal{M}}_i, f_j \rangle_{\mathcal{M}}$, and $B = (b_{ij})_{k_2 \times q}$, where $b_{ij} = \langle \phi^{\mathcal{N}}_i, g_j \rangle_{\mathcal{N}}$. The matrix $C$ can be obtained by solving the following quadratic minimization problem:

where $\|\cdot\|_F$ denotes the Frobenius norm, and the first term on the right hand side is the change of basis constraint. Two more regularization terms are introduced into the formula with tunable weights $\alpha, \beta > 0$. The term $\|C \cdot \delta_{\mathcal{M}} - \delta_{\mathcal{N}} \cdot C\|_F$ enforces the isometry of the two manifolds, where the matrices $\delta_{\mathcal{M}}$ and $\delta_{\mathcal{N}}$ are the spectral representations of the LB operators. The matrices $\Lambda^i_{\mathcal{M}}$ and $\Lambda^i_{\mathcal{N}}$ are called the orientation operators (Ren et al., 2018) and are defined by the functions $f_i$ and $g_i$, respectively.
The commutator with the orientation operator incorporates extrinsic properties into the formulation and enforces the orientation of functional maps (i.e., which side of the 2D surface is pointing "outward").

Once the functional mapping (i.e., the $C$ matrix) is recovered, the point-to-point correspondence $T$ between manifolds $\mathcal{M}$ and $\mathcal{N}$ can be obtained by mapping indicator functions of vertices on $\mathcal{M}$ to the corresponding functions on $\mathcal{N}$, because $T_F \delta_m = \delta_{T(m)}$ for any vertex $m \in \mathcal{M}$ (Ovsjanikov et al., 2012).
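To make the role of the change-of-basis constraint concrete, the sketch below fits a functional map from spectral coefficients using that term alone, i.e., by minimizing $\|CA - B\|_F$ with ordinary least squares. It deliberately omits the isometry and orientation regularizers described above, uses synthetic coefficients, and the name `fit_functional_map` is hypothetical.

```python
import numpy as np

def fit_functional_map(A, B):
    """Least-squares C (k2 x k1) minimizing ||C A - B||_F, without regularization."""
    # C A = B is equivalent to A^T C^T = B^T, which lstsq solves column by column.
    C_t, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return C_t.T

rng = np.random.default_rng(0)
k1, k2, q = 5, 4, 12                       # basis sizes and number of function pairs
C_true = rng.standard_normal((k2, k1))     # ground-truth map (synthetic)
A = rng.standard_normal((k1, q))           # spectral coefficients of f_1..f_q on M
B = C_true @ A                             # coefficients of the corresponding g_1..g_q on N

C = fit_functional_map(A, B)
residual = float(np.linalg.norm(C @ A - B))
```

With at least as many linearly independent corresponding functions as basis functions on the first manifold ($q \ge k_1$), the map is determined uniquely by this term; the regularizers matter when the learned correspondences are noisy or fewer in number.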

F DETAILS ON QM9 PROPERTY REGRESSION

Dataset For the QM9 molecular property regression task, we align our input data with EGNN (Satorras et al., 2021) (with a total of 130,831 instances). 181 molecules failed during molecular surface extraction (MSMS failed to generate a reasonable surface mesh for some molecules). We follow the same data split as EGNN, leading to 130,650 instances (99,862 for training, 17,719 for validation, and 13,069 for test). Only the atomic numbers and atomic positions are used as the initial molecular information; no extra handcrafted features are involved. We compute the molecular surface as described in Appendix C, with an average of 439 vertices and 47 eigenfunctions per molecular surface.

Model Architecture and Performance Different from other graph-based models, we do not explicitly encode inter-atomic distance or chemical bonding information. Instead, we feed the molecular surface manifold as the model input, which contains local chemical information (Appendix C). Surface geometric features (i.e., curvature and the Heat Kernel Signatures) are not included, since we find that these features do not contribute to predicting molecular properties. We stack 6 layers of HMR, followed by a global average pooling layer to aggregate information from all surface points to make a final property prediction. Batch normalization is applied to all MLPs. The mean absolute prediction errors of all 12 properties are shown in Table F.1, with results averaged over three independent runs with different random seeds.

rcsb.org/). While DIPS is designed to represent the interactions in protein complexes including homomeric proteins (i.e., protein complexes composed of identical proteins), we are particularly interested in interactions between different proteins in biological processes (e.g., antibody-antigen, enzyme-inhibitor, enzyme-substrates, etc.).
To better resemble such interactions, we apply a modified pipeline based on DIPS, referred to as DIPS-Het.

Specifically, we include the PDB entries in DIPS as well as newly deposited entries from the RCSB PDB database that (1) are determined using diffraction-based methods or electron microscopy, (2) have < 3.5 Å resolution, (3) are not hybrid protein complexes (e.g., protein-DNA complexes are excluded), and (4) have < 5 protein chains. Different from the original DIPS pipeline, we a) further remove the PDB entries with only homomeric interfaces (i.e., interactions between identical proteins), and b) adopt a one-vs-rest strategy that uses each one of the protein chains in the complex as the "ligand protein" and the remaining protein chains as the "receptor protein" to form ligand-receptor pairs.

Similar to Ganea et al. (2021), we remove the cases in which any protein chain shares a protein sequence cluster (at 30% sequence similarity) with DB5.5.

H.5 ANALYSIS OF DIFFERENT MESSAGE PASSING MECHANISMS

We study the effectiveness of the proposed Harmonic Message Passing mechanism by comparing it with two baselines: a "Graph Message Passing" model, in which Harmonic Message Passing is replaced by spatial graph message passing using a Graph Attention Network (Veličković et al., 2018), and a "No Message Passing" model, in which only a multilayer perceptron (MLP) is used to encode per-vertex features. Experimental results show that the Harmonic Message Passing model outperforms the Graph Message Passing and No Message Passing models, suggesting that it propagates information on protein surface meshes more effectively.

For a fair comparison, we further compare HMR with EQUIDOCK on the 25 cases in DB5.5 selected as a test set in Ganea et al. (2021). We report the EQUIDOCK model fine-tuned on DB5.5, as originally reported in Ganea et al. (2021). As shown in Table H.5, HMR still outperforms EQUIDOCK, despite not being fine-tuned on DB5.5.

We perform the rigid docking experiment using HMR at different resolutions. Specifically, we cap the largest eigenvalue of the ligand and receptor protein surface manifolds used in the model. Since eigenfunctions with larger eigenvalues exhibit higher spatial resolutions, restricting the eigenvalues effectively constrains the model resolution. The results are shown in Table H.6. In general, we observe worse model performance with fewer eigenfunctions for harmonic message passing and functional maps. Interestingly, the metrics of the binding site prediction module (i.e., AUC and AP)

