A 2-PARAMETER PERSISTENCE LAYER FOR LEARNING

Abstract

1-parameter persistent homology, a cornerstone of Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to strengthen the representation power of deep learning models such as Graph Neural Networks (GNNs). To enrich the representations of topological features, we propose to study 2-parameter persistence modules induced by bi-filtration functions. To incorporate these representations into machine learning models, we introduce a novel vectorization of 2-parameter persistence modules called Generalized Rank Invariant Landscape (GRIL). We show that this vector representation is 1-Lipschitz (stable) and differentiable with respect to the underlying filtration functions, and that it can be easily integrated into machine learning models to augment them with encoded topological features. We present an algorithm to compute the vectorization and its gradients. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of 1-parameter and 2-parameter persistence modules.

1. INTRODUCTION

Machine learning models such as Graph Neural Networks (GNNs) (Gori et al., 2005; Scarselli et al., 2009; Kipf & Welling, 2017; Xu et al., 2019) are well-known, successful tools from the geometric deep learning community. The representation power of such models can be augmented by infusing topological information in the form of a vector representation of the persistent homology of the space underlying the data. Many recent works have successfully integrated topological information with machine learning models (Carrière et al., 2020; Kim et al., 2020; Gabrielsson et al., 2020; Hofer et al., 2020; Horn et al., 2022; Swenson et al., 2020; Bouritsas et al., 2022; Corbet et al., 2019; Carrière & Blumberg, 2020; Vipond, 2020). Most of these works use 1-parameter persistent homology as the topological information. However, (Corbet et al., 2019; Vipond, 2020; Carrière & Blumberg, 2020) use vector representations of 2-parameter persistence modules. In (Carrière & Blumberg, 2020) and (Corbet et al., 2019), these representations are based on slices of 2-parameter persistence modules along lines, which were first studied and computed by (Lesnick & Wright, 2015). In (Vipond, 2020), the author generalizes the notion of 1-parameter persistence landscapes (Bubenik, 2015). In this paper we propose a novel vector representation for 2-parameter persistence modules, Generalized Rank Invariant Landscape (GRIL), which encodes richer information than fibered barcodes alone. Its building blocks are based on the idea of the generalized rank invariant (Kim & Mémoli, 2021; Dey et al., 2022), and its construction generalizes the persistence landscape (Bubenik, 2015; Vipond, 2020). We show that GRIL is 1-Lipschitz and differentiable with respect to the filtration function $f$, which allows us to build a differentiable topological layer, PERSGRIL, in a machine learning pipeline.
We demonstrate its use on synthetic datasets and standard graph datasets. To the best of our knowledge, this is the first work of its kind in terms of using 2-parameter persistence modules directly in machine learning models.

Persistent homology is a useful tool for characterizing the shape of data. Rooted in algebraic topology and algorithms, it has spawned the flourishing area of Topological Data Analysis (TDA). Classical persistent homology, also known as the 1-parameter persistence module, has attracted plenty of attention in both theory (Edelsbrunner & Harer, 2010; Oudot, 2015; Carlsson & Vejdemo-Johansson, 2021; Dey & Wang, 2022; Hofer et al., 2017; Li et al., 2022; Mémoli et al., 2022) and applications (Yan et al., 2021; Zhao et al., 2020; Yang et al., 2021b;a; Banerjee et al., 2020; Wu et al., 2020; Wang et al., 2020; Chen et al., 2021; Hu et al., 2021; Yan et al., 2022). The standard pipeline of the 1-parameter persistence module is as follows. Given a domain of interest $X$ (e.g. a topological space, point cloud data, a graph, or a simplicial complex) with a scalar function $f: X \to \mathbb{R}$, one filters the domain $X$ by the sublevel sets $X_\alpha \triangleq \{x \in X \mid f(x) \le \alpha\}$ as the threshold $\alpha \in \mathbb{R}$ increases continuously. The collection $\{X_\alpha\}$, called a filtration, forms an increasing sequence of subspaces $\emptyset = X_{-\infty} \subseteq X_{\alpha_1} \subseteq \cdots \subseteq X_{+\infty} = X$. Along the filtration, topological features appear, persist, and disappear over some intervals. We consider the $p$-th homology groups $H_p(\cdot)$ (over a field, see (Hatcher, 2000)) of the subspaces in this filtration, which results in a sequence of vector spaces. These vector spaces are connected by inclusion-induced linear maps, forming the algebraic structure $0 = H_p(X_{-\infty}) \to H_p(X_{\alpha_1}) \to \cdots \to H_p(X_{+\infty})$.
This algebraic structure, known as the 1-parameter persistence module induced by $f$ and denoted $M^f$, can be uniquely decomposed into a collection of atomic modules called interval modules, which completely characterizes the topological features with respect to three behaviors: appearance, persistence, and disappearance of all $p$-dimensional cycles. This unique decomposition of a 1-parameter persistence module is commonly summarized as a persistence diagram (Edelsbrunner et al., 2002) or barcode (Zomorodian & Carlsson, 2005). Figure 1 (left) shows a filtration of a simplicial complex, which induces a 1-parameter persistence module, and its decomposition into bars. Some problems in practice may demand tracking topological information in a filtration that is not necessarily linearly ordered. For example, in (Adcock et al., 2014), 2-parameter persistence modules are shown to be better than 1-parameter persistence for classifying hepatic lesions. In (Keller et al., 2018), a virtual screening system based on 2-parameter persistence modules is shown to be effective for searching for new candidate drugs. In such applications, instead of studying a sequential filtration induced by a scalar function, one may study a grid-filtration induced by an $\mathbb{R}^2$-valued bi-filtration function $f: X \to \mathbb{R}^2$, with $\mathbb{R}^2$ equipped with the partial order $u \le v \iff u_1 \le v_1,\ u_2 \le v_2$; see Figure 1 (right) for an example of a 2-parameter filtration. Following a pipeline similar to that of the 1-parameter persistence module, one obtains a collection of vector spaces $\{M^f_u\}_{u \in \mathbb{R}^2}$ indexed by vectors $u = (u_1, u_2) \in \mathbb{R}^2$, together with linear maps $\{M^f(u \le v): M^f_u \to M^f_v\}$ for all comparable $u \le v$. The entire structure $M^f$, in analogy to the 1-parameter case, is called the 2-parameter persistence module induced by $f$. Unlike in the 1-parameter case, the algebraic structure of 2-parameter persistence modules is much more complicated.
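In degree 0, the 1-parameter pipeline described above can be made concrete with a union-find structure: sweep the simplices of a graph in order of $f$, and whenever an edge merges two components, the younger one (larger birth value) dies, which is the elder rule. The following is a minimal sketch, assuming a graph with vertex values and the lower-star rule that an edge enters at the maximum of its endpoint values; the function name `h0_barcode` is hypothetical and not part of the paper's code.

```python
import math


def h0_barcode(vertex_values, edges):
    """Degree-0 barcode of the sublevel-set filtration of a graph.

    vertex_values: f(v) for vertices v = 0..n-1.
    edges: (u, v) pairs; each edge enters at max(f(u), f(v)) (lower-star rule).
    Returns sorted (birth, death) pairs; classes that never die get death = inf.
    Zero-length bars are dropped.
    """
    n = len(vertex_values)
    parent = list(range(n))
    birth = list(vertex_values)  # birth time of the component rooted at each vertex

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def entry(e):
        return max(vertex_values[e[0]], vertex_values[e[1]])

    bars = []
    for u, v in sorted(edges, key=entry):
        t = entry((u, v))
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle: an H_1 event, not an H_0 one
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # elder rule: keep the older component as the root
        if birth[rv] < t:
            bars.append((birth[rv], t))  # the younger component dies at time t
        parent[rv] = ru
    roots = {find(x) for x in range(n)}
    bars.extend((birth[r], math.inf) for r in roots)
    return sorted(bars)
```

For the path graph 0 - 1 - 2 with values 0, 2, 1, the component born at vertex 2 merges into the older one at time 2, so `h0_barcode([0.0, 2.0, 1.0], [(0, 1), (1, 2)])` returns `[(0.0, inf), (1.0, 2.0)]`.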
There is no complete discrete invariant, like persistence diagrams or barcodes, for 2-parameter persistence modules (Carlsson & Zomorodian, 2009). A good non-complete invariant for 2-parameter persistence modules should distinguish as many non-isomorphic persistence modules as possible. At the same time, it should be stable with respect to small perturbations of the filtration functions, which yields the continuity and differentiability properties that matter for machine learning models. Therefore, building a good summary of 2-parameter persistence modules that is also applicable to machine learning models is an important problem.
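A classical example of a non-complete invariant is the rank invariant, which records the rank of every structure map $M^f(u \le v)$. For degree-0 homology of a bi-filtered graph this rank is easy to compute: writing $X_w$ for the subcomplex of simplices with bi-filtration value $\le w$, the rank of $H_0(X_u) \to H_0(X_v)$ equals the number of connected components of $X_v$ that contain at least one vertex of $X_u$. The sketch below assumes vertex and edge values in $\mathbb{R}^2$, with every edge value dominating the values of its endpoints; the helper names are hypothetical, and the code illustrates the invariant only, not the paper's COMPUTERANK.

```python
def leq(a, b):
    """Partial order on R^2: a <= b iff a_1 <= b_1 and a_2 <= b_2."""
    return a[0] <= b[0] and a[1] <= b[1]


def h0_rank(vertex_vals, edge_vals, u, v):
    """Rank of the map H_0(X_u) -> H_0(X_v) of a bi-filtered graph, for u <= v.

    vertex_vals: dict vertex -> (x, y); edge_vals: dict (a, b) -> (x, y).
    Each edge value must dominate the values of its endpoints.
    X_w consists of all vertices and edges whose value is <= w.
    """
    if not leq(u, v):
        raise ValueError("the structure map M(u <= v) requires u <= v")
    parent = {x: x for x, val in vertex_vals.items() if leq(val, v)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union-find over the edges of X_v gives the connected components of X_v.
    for (a, b), val in edge_vals.items():
        if leq(val, v):
            parent[find(a)] = find(b)
    # The image of H_0(X_u) is spanned by the components of X_v that contain
    # a vertex of X_u; distinct such components give independent classes.
    return len({find(x) for x, val in vertex_vals.items() if leq(val, u)})
```

For instance, with vertices at (0, 0), (1, 0), (0, 1) and edges {0, 1} and {0, 2} entering at (1, 1) and (2, 2), the map from u = (1, 1) to v = (1, 1) has rank 2, while the map from u = (1, 0) to v = (1, 1) has rank 1.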

2. 2-PARAMETER PERSISTENCE LANDSCAPE

From the perspective of representation learning, a persistence module can be viewed as a special representation of a discrete topological space, such as point cloud data or a graph embedding, which captures geometric and topological information. A 1-parameter persistence module captures information about topological features that persist across different scales. Here, we consider a bi-filtration, which leads to a 2-parameter persistence module. To better utilize the richer information captured by 2-parameter persistence modules, we propose GRIL (Generalized Rank Invariant Landscape), a stable and differentiable vectorized representation of a 2-parameter persistence module. Let $M = M^f$ be a 2-parameter persistence module induced by a filtration function $f$. We say a connected subset $I \subseteq \mathbb{R}^2$ is an interval if $\forall u \le v \le w,\ (u \in I \text{ and } w \in I) \implies v \in I$. The restriction of $M$ to an interval $I$, denoted $M|_I$, is the collection of vector spaces $\{M_u \mid u \in I\}$ along with the linear maps $\{M(u \le v) \mid u \le v,\ u, v \in I\}$. One can define the generalized rank $\mathrm{rk}^M(I) \triangleq \operatorname{rank}\big[\varprojlim M|_I \to \varinjlim M|_I\big]$, where $\varprojlim M|_I \to \varinjlim M|_I$ is the unique linear map from the limit of $M|_I$ to the colimit of $M|_I$. When $I = [u, v] \triangleq \{w \in \mathbb{R}^2 \mid u \le w \le v\}$ is a rectangle in $\mathbb{R}^2$, $\varprojlim M|_I = M_u$ and $\varinjlim M|_I = M_v$, so $\mathrm{rk}^M(I)$ equals the usual rank of the linear map $M(u \le v)$. We refer the reader to (MacLane, 1971) for the definitions of limit and colimit in category theory. The basic idea of GRIL is to compute a collection of generalized ranks $\{\mathrm{rk}^M(I)\}_{I \in \mathcal{W}}$ over some covering set $\mathcal{W}$ of $\mathbb{R}^2$, which is called a generalized rank invariant (Kim & Mémoli, 2021) of $M$ over $\mathcal{W}$. We choose $\mathcal{W}$ to be a set of worms, defined as follows: $\mathcal{W} \triangleq \{\boxed{p}^{\ell}_{\delta} \mid \delta > 0,\ \ell \ge 1,\ p \in \mathbb{R}^2\}$, where $\boxed{p}^{\ell}_{\delta} \triangleq \{q \mid \exists \alpha \in \mathbb{R},\ |\alpha| \le (\ell - 1)\delta : \|q - p - (\alpha, -\alpha)\|_\infty \le \delta\}$. We call $p$ the center point and $\delta$ the width of the $\ell$-worm $\boxed{p}^{\ell}_{\delta}$.
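Membership in an $\ell$-worm reduces to asking whether some admissible shift $\alpha$ exists. Writing $d = q - p$, the condition $\|q - p - (\alpha, -\alpha)\|_\infty \le \delta$ confines $\alpha$ to $[d_1 - \delta, d_1 + \delta] \cap [-d_2 - \delta, -d_2 + \delta]$, and the worm adds $|\alpha| \le (\ell - 1)\delta$, so $q$ lies in the worm exactly when these three intervals intersect. A minimal sketch; `in_worm` is a hypothetical helper, not code from the paper.

```python
def in_worm(q, p, ell, delta):
    """Return True if q lies in the ell-worm of width delta centred at p.

    The worm is {q : exists alpha with |alpha| <= (ell - 1) * delta and
                 ||q - p - (alpha, -alpha)||_inf <= delta}.
    """
    d1, d2 = q[0] - p[0], q[1] - p[1]
    reach = (ell - 1) * delta
    # |d1 - alpha| <= delta   ->  alpha in [d1 - delta, d1 + delta]
    # |d2 + alpha| <= delta   ->  alpha in [-d2 - delta, -d2 + delta]
    # |alpha| <= reach        ->  alpha in [-reach, reach]
    lo = max(d1 - delta, -d2 - delta, -reach)
    hi = min(d1 + delta, -d2 + delta, reach)
    return lo <= hi
```

With $\ell = 1$ this is the $\infty$-norm square of half-side $\delta$; with $\ell = 2$ and $\delta = 0.5$ around the origin, the worm reaches the off-diagonal corner $(1, -1)$ but not the diagonal point $(1, 1)$.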
As a special case, when $\ell = 1$, $\boxed{p}^{1}_{\delta} = \boxed{p}_{\delta} \triangleq \{q : \|p - q\|_\infty \le \delta\}$ is a $\delta$-square with side $2\delta$ centered at $p$. In general, for any $\ell \ge 1$ and $\delta > 0$, the $\ell$-worm $\boxed{p}^{\ell}_{\delta}$ is the union of all $\delta$-squares $\boxed{q}_{\delta}$ centered at points $q$ on the off-diagonal line segment $p + \alpha(1, -1)$ with $|\alpha| \le (\ell - 1)\delta$. Therefore, we can equivalently write $\boxed{p}^{\ell}_{\delta} = \bigcup_{|\alpha| \le (\ell - 1)\delta} \boxed{p + (\alpha, -\alpha)}_{\delta}$.

Definition 2.1 (Generalized Rank Invariant Landscape). For a persistence module $M$, the Generalized Rank Invariant Landscape (GRIL) of $M$ is the function $\lambda^M: \mathbb{R}^2 \times \mathbb{N}^+ \times \mathbb{N}^+ \to \mathbb{R}$ defined as $\lambda^M(p, k, \ell) \triangleq \sup\{\delta \ge 0 : \mathrm{rk}^M(\boxed{p}^{\ell}_{\delta}) \ge k\}$.

Proposition 2.1. GRIL is equivalent to the generalized rank invariant on $\mathcal{W}$. Here, equivalence means that each can be bijectively reconstructed from the other (proof in Appendix B).

In practice, we choose center points $p$ from some finite subset $\mathcal{P} \subset \mathbb{R}^2$, e.g. a finite uniform grid in $\mathbb{R}^2$, and consider $k \le K$, $\ell \le L$ for some $K, L \in \mathbb{N}^+$. Then $\lambda^M$ can be viewed as a vector of dimension $|\mathcal{P}| \times K \times L$. See Figure 3 for an illustration of the overall pipeline of our construction of $\lambda^M$, starting from a filtration function on a simplicial complex. Figure 4 shows the discriminating power of GRIL: it can differentiate between shapes that are topologically non-isomorphic.

Figure 3: The construction starts from a simplicial complex with a bi-filtration function, as shown on the top right. The simplicial complex consists of two vertices connected by one edge. Based on the bi-filtration, a simplicial bi-filtration can be defined, as shown on the top left. On the bottom left, a 2-parameter persistence module is induced from this simplicial bi-filtration. Checking the dimensions of the vector spaces at all points of the plane, they are 1-dimensional on the red, blue and light purple regions and 2-dimensional on the L-shaped dark purple region. Finally, on this 2-parameter persistence module, we compute $\lambda^{M^f}(p, k, \ell)$ for all tuples $(p, k, \ell) \in \mathcal{P} \times \{1, \dots, K\} \times \{1, \dots, L\}$ to obtain our GRIL vector representation.
By Definition 2.1, the value $\lambda^{M^f}(p, k, \ell)$ is the supremum of the widths of $\ell$-worms centered at $p$ on which the generalized rank is at least $k$. On the bottom right, the interval in red is the maximal 2-worm for $\lambda^{M^f}(p, k = 1, \ell = 2)$. The green interval is the maximal 2-worm for $\lambda^{M^f}(q, k = 2, \ell = 2)$. The yellow square is the maximal 1-worm for $\lambda^{M^f}(r, k = 1, \ell = 1)$, and the blue interval is the maximal 3-worm for $\lambda^{M^f}(r, k = 1, \ell = 3)$.

Stability and Differentiability of GRIL. An important property of GRIL is its stability, which makes it robust to small perturbations of the input bi-filtration while retaining the ability to characterize topology. We show that GRIL is 1-Lipschitz (stable) with respect to the input filtrations.

Proposition 2.2. Given two filtration functions $f, f': X \to \mathbb{R}^2$, $\|\lambda^{M^f} - \lambda^{M^{f'}}\|_\infty \le \|f - f'\|_\infty$ (proof in Appendix B).

Remark 2.1. When $X$ is a finite space (e.g. a finite simplicial complex (see Definition A.1) or a point cloud) with $|X| = n$, any $f: X \to \mathbb{R}^2$ can be represented as a vector in $\mathbb{R}^{2n}$. We now define PERSGRIL.

Definition 2.2 (PERSGRIL). For a finite space $X$ with $|X| = n$ and fixed $k, \ell, p$, PERSGRIL is the function $\Lambda^{k,\ell}_p: \mathbb{R}^{2n} \to \mathbb{R}$ given by $\Lambda^{k,\ell}_p(f) = \lambda^{M^f}(p, k, \ell)$.

Proposition 2.3. PERSGRIL is Lipschitz continuous with respect to the bi-filtration functions on finite spaces (proof in Appendix B).

Figure 4: GRIL as a topological discriminator. Each row shows a point cloud $P$, its density-Rips bi-filtration, and GRIL value heatmaps for 1-dimensional homology with generalized ranks $k = 1$ and $k = 2$, named $\lambda_1$ and $\lambda_2$ respectively. The first Betti number ($\beta_1$) of a circle is 1, which is reflected in $\lambda_1$ being non-zero. $\beta_1$ of two circles is 2, which is reflected in both $\lambda_1$ and $\lambda_2$ being non-zero. Similarly, $\beta_1$ of a circle and a disk together is 1, which is reflected in $\lambda_1$ being non-zero but $\lambda_2$ being zero for this point cloud.
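On a finite set of candidate widths, Definition 2.1 and the reconstruction behind Proposition 2.1 (spelled out in Appendix B) reduce to two short loops. The sketch assumes, for a fixed center $p$ and worm length $\ell$, a list `ranks` of generalized ranks $\mathrm{rk}^M(\boxed{p}^{\ell}_{\delta})$ at increasing widths `widths`; these ranks are non-increasing because wider worms contain narrower ones. The discretization and the function names are illustrative assumptions, not the paper's implementation.

```python
def gril_values(widths, ranks, K):
    """lambda(k) = sup{delta : rank(delta) >= k} for k = 1..K on a finite width grid.

    widths: increasing list of worm widths delta (all > 0).
    ranks:  generalized ranks at those widths, assumed non-increasing.
    Returns [lambda(1), ..., lambda(K)], using 0.0 when no width qualifies.
    """
    out = []
    for k in range(1, K + 1):
        feasible = [w for w, r in zip(widths, ranks) if r >= k]
        out.append(max(feasible) if feasible else 0.0)
    return out


def rank_from_gril(lams, delta):
    """Reconstruct rank(delta) = max{k : lambda(k) >= delta}, or 0 if no k qualifies."""
    ks = [k for k, lam in enumerate(lams, start=1) if lam >= delta]
    return max(ks) if ks else 0
```

With widths 0.1, 0.2, 0.3, 0.4 and ranks 3, 2, 2, 0, the landscape values for k = 1, 2, 3 are 0.3, 0.3 and 0.1, and `rank_from_gril` recovers the four ranks from them, provided K is at least the largest rank.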
Since PERSGRIL is Lipschitz continuous, Rademacher's theorem (Evans & Gariepy, 2015) implies that it is differentiable almost everywhere.

Corollary 2.4. PERSGRIL is differentiable almost everywhere.

The differentiability of PERSGRIL in Corollary 2.4 refers to the existence of all directional derivatives. However, a steepest direction, which serves as the "gradient" of PERSGRIL, might not be unique. We propose an algorithm to efficiently compute one specific steepest direction based on the following theorem.

Theorem 2.5. Consider the space of all filtration functions $\{f: X \to \mathbb{R}^2\}$ on a finite space $X$ with $|X| = n$, which is equivalent to $\mathbb{R}^{2n}$. For fixed $k, \ell, p$, there exists a measure-zero subset $Z \subseteq \mathbb{R}^{2n}$ such that for any $f \in \mathbb{R}^{2n} \setminus Z$ satisfying the generic condition $f(x)_1 \ne f(y)_1$ and $f(x)_2 \ne f(y)_2$ for all $x \ne y \in X$, there exists an assignment $s: X \to \{\pm 1, 0, \pm \ell\}^2$ such that $\nabla_s \Lambda^{k,\ell}_p(f) \triangleq \lim_{\alpha \to 0} \frac{\Lambda^{k,\ell}_p(f + \alpha s) - \Lambda^{k,\ell}_p(f)}{\alpha \|s\|_\infty} = \max_{g} \nabla_g \Lambda^{k,\ell}_p(f)$, where the maximum ranges over all non-zero directions $g: X \to \mathbb{R}^2$.

The proof of Theorem 2.5 in Appendix B also shows how to find the assignment $s$ together with the corresponding set of support simplices. This result lets us update the simplicial filtration with such an assignment $s$; see the description of enhancing topological features in Section 4.
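The directional derivative in Theorem 2.5 is normalized by $\|s\|_\infty$. When checking a candidate assignment $s$ numerically, a one-sided finite difference with a small step $\alpha$ gives an estimate of $\nabla_s \Lambda^{k,\ell}_p(f)$. The following is a minimal NumPy sketch, assuming $\Lambda$ is available as a Python callable on $n \times 2$ arrays of filtration values; it is a sanity check only, not the paper's gradient algorithm, which derives $s$ from the support simplices.

```python
import numpy as np


def directional_derivative(Lam, f, s, alpha=1e-6):
    """One-sided finite-difference estimate of
    (Lam(f + alpha * s) - Lam(f)) / (alpha * ||s||_inf) for small alpha > 0.

    Lam: callable taking an (n, 2) array of bi-filtration values to a float.
    f, s: (n, 2) arrays, the current filtration and a direction.
    """
    f = np.asarray(f, dtype=float)
    s = np.asarray(s, dtype=float)
    norm = np.max(np.abs(s))  # ||s||_inf over all vertices and both coordinates
    if norm == 0:
        raise ValueError("direction s must be non-zero")
    return (Lam(f + alpha * s) - Lam(f)) / (alpha * norm)
```

Since Proposition 2.2 bounds $|\Lambda^{k,\ell}_p(f + \alpha s) - \Lambda^{k,\ell}_p(f)|$ by $\alpha \|s\|_\infty$, such estimates for PERSGRIL should lie in $[-1, 1]$ up to floating-point error, which makes this a cheap consistency test for an implementation.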

3. ALGORITHM

We present an algorithm to compute GRIL in this section. The high-level idea is as follows. Given a bi-filtration function $f: X \to \mathbb{R}^2$, for each $(p, k, \ell) \in \mathcal{P} \times \{1, \dots, K\} \times \{1, \dots, L\}$ we need to compute $\lambda^{M^f}(p, k, \ell) = \sup\{\delta \ge 0 : \mathrm{rk}^{M^f}(\boxed{p}^{\ell}_{\delta}) \ge k\}$, that is, the maximum width over worms on which the generalized rank is at least $k$. We find this width by binary search. We compute the generalized rank $\mathrm{rk}^{M^f}(\boxed{p}^{\ell}_{\delta})$ with the algorithm proposed in (Dey et al., 2022), which uses zigzag persistence on a boundary path. This zigzag persistence is computed efficiently by a recent algorithm proposed in (Dey & Hou, 2022). We denote the subroutine that computes the generalized rank over a worm by COMPUTERANK in Algorithm 1 below: COMPUTERANK$(f, I)$ takes as input a bi-filtration function $f$ and an interval $I$, and outputs the generalized rank over that interval. To use the algorithm of (Dey et al., 2022), the worms need to have their boundaries aligned with a grid structure defined on the range of $f$. Thus, we normalize $f$ to take values in $[0, 1] \times [0, 1]$, define a grid structure on $[0, 1] \times [0, 1]$, and discretize the worms. Let $\mathrm{Grid} = \{(\frac{m}{M}, \frac{n}{M}) \mid m, n \in \{0, 1, \dots, M\}\}$ for some $M \in \mathbb{Z}^+$, and denote the grid resolution by $\rho \triangleq 1/M$. We uniformly sample the center points $\mathcal{P} \subseteq \mathrm{Grid}$ of the worms from this grid, and consider the discrete worms $\widehat{\boxed{p}}^{\ell}_{\delta} \triangleq \bigcup_{q = p + (\alpha, -\alpha) \in \mathrm{Grid},\ |\alpha| \le (\ell - 1)\delta} \boxed{q}_{\delta}$ for all $p \in \mathcal{P}$. See Figure 2 for an illustration of discrete worms. All discrete worms are intervals whose boundaries are aligned with the Grid. We apply the procedure COMPUTERANK$(f, \widehat{\boxed{p}}^{\ell}_{\delta})$ to compute $\mathrm{rk}^{M^f}(\widehat{\boxed{p}}^{\ell}_{\delta})$. Let $\hat{\lambda}^{M^f}(p, k, \ell) = \sup\{\delta \ge 0 : \mathrm{rk}^{M^f}(\widehat{\boxed{p}}^{\ell}_{\delta}) \ge k\}$. One can observe that $|\hat{\lambda}^{M^f}(p, k, \ell) - \lambda^{M^f}(p, k, \ell)| \le \rho$. Therefore, we compute $\hat{\lambda}$ as an approximation of $\lambda$, with the approximation gap controlled by the grid resolution $\rho$. The pseudo-code is given in Algorithm 1.
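Working in integer grid units avoids floating-point drift when enumerating discrete worms. The sketch assumes the center is $p = (m/M, n/M)$ and the width is $\delta = d/M$ for integers $m, n, d$, so the admissible shifts $\alpha$ are multiples of $\rho = 1/M$; it lists the grid centers $q$ whose $\delta$-squares make up the discrete worm. The function name is hypothetical, and centers falling outside the grid are dropped, following the condition $q \in \mathrm{Grid}$.

```python
def discrete_worm_centers(m, n, ell, d, M):
    """Grid indices (i, j) of the centers q = p + (a, -a)/M of a discrete ell-worm.

    p = (m/M, n/M) is the worm center and delta = d/M its width.
    The integer shift a ranges over |a| <= (ell - 1) * d; centers outside
    {0, ..., M}^2 are dropped so that every q lies in Grid.
    """
    reach = (ell - 1) * d
    centers = []
    for a in range(-reach, reach + 1):
        i, j = m + a, n - a
        if 0 <= i <= M and 0 <= j <= M:
            centers.append((i, j))
    return centers
```

Each returned pair $(i, j)$ stands for the square of half-side $d/M$ around $(i/M, j/M)$; the discrete worm is the union of these squares, so for $\ell = 1$ the list contains only the center itself.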
The algorithm is described in detail in Appendix C.

Algorithm 1 COMPUTEGRIL
Input: f: bi-filtration function; ℓ ≥ 1; k ≥ 1; p ∈ P ⊆ Grid; ρ: grid resolution
Output: λ̂(p, k, ℓ): GRIL value at the center point p for fixed k and ℓ
  λ̂ ← 0; d_min ← ρ; d_max ← 1
  while d_min ≤ d_max do
    d ← (d_min + d_max)/2
    I ← discrete ℓ-worm of width d centered at p
    r ← COMPUTERANK(f, I)
    if r ≥ k then
      λ̂ ← d            ▷ width d is feasible: record it and search wider worms
      d_min ← d + ρ
    else
      d_max ← d − ρ     ▷ rank below k: search narrower worms
  return λ̂

Time complexity. Assuming a grid with $t$ nodes and a bi-filtration of a complex with $n$ simplices on it, each probe in the binary search takes $O((t + n)^{\omega})$ time, where $\omega < 2.37286$ is the matrix multiplication exponent (Alman & Williams, 2021). This is because each probe generates a zigzag filtration of length $O(t)$ with $O(n)$ simplices. Therefore, the binary search takes $O((t + n)^{\omega} \log t)$ time, giving a total time complexity of $O(t\,(t + n)^{\omega} \log t)$ over $O(t)$ worms.
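A runnable rendering of Algorithm 1 in Python follows. To keep every probed width on the grid, it binary-searches over integer multiples of $\rho = 1/M$ instead of taking real midpoints, and it receives the generalized rank as a callback `rank_at(delta)`, a hypothetical stand-in for COMPUTERANK applied to the discrete worm of width `delta`. Binary search is valid here because the rank is non-increasing in the width.

```python
def compute_gril(rank_at, k, M):
    """Largest width delta = d / M (1 <= d <= M) with rank_at(delta) >= k, else 0.0.

    rank_at: callable width -> generalized rank, assumed non-increasing in width.
    k: target rank (k >= 1).  M: grid size, so the resolution is rho = 1 / M.
    """
    lo, hi = 1, M  # candidate widths in units of rho
    best = 0       # stays 0 if even the narrowest worm has rank < k
    while lo <= hi:
        mid = (lo + hi) // 2
        if rank_at(mid / M) >= k:
            best = mid     # feasible: record it and try wider worms
            lo = mid + 1
        else:
            hi = mid - 1   # infeasible: try narrower worms
    return best / M
```

Each call issues $O(\log M)$ rank queries, which matches the $\log t$ factor in the complexity analysis since the grid has $t = (M + 1)^2$ nodes.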

4. EXPERIMENTS

We create a differentiable topological layer based on GRIL, named PERSGRIL, in line with Definition 2.2. In essence, PERSGRIL takes a bi-filtration function as input and outputs the GRIL values of the persistence module generated by that filtration function.

Experiment with HourGlass dataset. We test our model on a synthetic dataset (HourGlass) that entails a binary graph classification problem over a collection of attributed undirected graphs. This synthetic dataset is designed to show that some attributed graphs can be easily classified using 2-parameter persistence modules but not with 1-parameter persistence modules or commonly used GNN models. Each graph $G$ from either class is composed of two circulant subgraphs $G_1, G_2$ connected by some cross edges. The node attributes are order indices generated by one of two different traversals $T_1, T_2$, and the class label indicates which traversal was used. Therefore, the classification task is: given an attributed graph $G$, predict which traversal was used to generate $G$. See Figure 5 (left) for an example of two attributed graphs with the same graph structure but node attributes generated by two different traversals. More details can be found in Appendix D.1. We denote by HourGlass[a, b] the dataset of graphs whose circulant subgraphs each have a number of nodes in the range $[a, b]$. We generate three datasets of different sizes: HourGlass[10, 20], HourGlass[21, 30], and HourGlass[31, 40]. Each dataset contains roughly 400 graphs. We evenly split HourGlass[21, 30] into balanced training and test sets, on which we compare PERSGRIL with several commonly used models from the literature: Graph Convolutional Networks (GCN) (Kipf & Welling, 2017), Graph Isomorphism Networks (GIN) (Xu et al., 2019), and a vector representation of 1-parameter persistent homology called persistence image (PersImg) (Adams et al., 2017).
All GNN models contain 3 aggregation layers. All models use a 3-layer multilayer perceptron (MLP) as the classifier. More details about the model and training settings can be found in Appendix D.1. We also test these trained models on HourGlass[10, 20] and HourGlass[31, 40] to check whether they generalize to smaller and larger graphs. The experiment results are shown in Table 1. This dataset is easily classified by our model based on 2-parameter persistence modules, with good generalization performance, whereas it is harder for a 1-parameter persistence method like PersImg or for some GNN models.

Graph experiments. We perform a series of graph classification experiments to test the proposed model on the standard datasets PROTEINS, DHFR, COX2 and MUTAG (Morris et al., 2020). A quantitative summary of these datasets is given in Appendix D.2. On these datasets, we compare the performance of GRIL with multiparameter persistence landscapes (MP-L) (Vipond, 2020), multiparameter persistence images (MP-I) (Carrière & Blumberg, 2020), multiparameter persistence kernels (MP-K) (Corbet et al., 2019) and PersLay (Carrière et al., 2020). In (Carrière & Blumberg, 2020), the authors use the heat kernel signature (HKS) and Ricci curvature on the graphs to form a bi-filtration. We use the same bi-filtration and report the result in the column Gril HKS-RC. Since the graphs in all of these datasets have node attributes, we also form a Density-Alpha bi-filtration on the node features and compare the performance of GRIL on this bi-filtration (Gril D-Alpha) with other methods. The Density-Alpha bi-filtration uses the Distance-to-Measure function as the filtration function in one coordinate and an Alpha complex filtration in the other coordinate. We use a simple 1-layer MLP as the classifier in order to test the discriminating power of the GRIL features.
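For the Density-Alpha bi-filtration, the empirical distance-to-measure (DTM) of a point $x$ with respect to a point cloud is commonly computed as the square root of the mean squared distance from $x$ to its $k$ nearest sample points. The following is a minimal NumPy sketch under that standard definition; the number of neighbors `k` (equivalently, the mass parameter) is a free choice here, since this section does not state the value used in the experiments.

```python
import numpy as np


def distance_to_measure(points, queries, k):
    """Empirical DTM: sqrt of the mean squared distance to the k nearest points.

    points: (N, D) sample; queries: (Q, D) evaluation points; 1 <= k <= N.
    Returns an array of shape (Q,).
    """
    points = np.asarray(points, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # Pairwise squared distances, shape (Q, N).
    sq = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    # np.partition puts the k smallest entries of each row in its first k slots.
    nearest = np.partition(sq, k - 1, axis=1)[:, :k]
    return np.sqrt(nearest.mean(axis=1))
```

Points in dense regions receive small DTM values and isolated points large ones, so sublevel sets of the DTM admit dense regions first; for the sample {0, 1, 2, 10} on the line with k = 2, the DTM is about 0.71 at 0 and about 5.66 at 10.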
We can see from Table 2 that GRIL with a 1-layer MLP outperforms multiparameter persistence images, multiparameter persistence kernels and multiparameter persistence landscapes on PROTEINS, COX2 and MUTAG, although it does not perform as well on DHFR. On PROTEINS, COX2 and MUTAG, GRIL performs comparably to PersLay (Carrière et al., 2020). PersLay also uses a 1-layer MLP as the classifier; however, it uses spectral features of the graph along with the 1-parameter persistence diagrams of the filtration given by the heat kernel signature. The reported results are after ten-fold cross-validation. [...] (Carrière & Blumberg, 2020), and those in the PersLay column are as reported in (Carrière et al., 2020). We have compared the performance of this model with different values of $k$ in $\lambda(p, k, \ell)$; the results are reported in Table 5 in Appendix D. We report the computation times for these datasets in Table 6 in Appendix D. In Table 4, we show the performance of GRIL with different grid resolutions.

Differentiability of PERSGRIL: A proof of concept. In the previous experiments we showed how PERSGRIL can be used to obtain topological signatures from graphs to facilitate a specific downstream task, in our case graph classification. In that application, we build PERSGRIL on a static filtration function: we compute the topological features once and use them as input to a classifier. In this experiment, we demonstrate, as a proof of concept, how PERSGRIL can be easily integrated into a differentiable framework like standard neural network architectures, with the theoretical foundation laid in Section 2 (specifically Theorem 2.5). We show this by rearranging the positions of input points, that is, by encouraging the formation of clusters and holes through suitable loss functions. As shown in Figure 6, the input to PERSGRIL is a set of points sampled non-uniformly from two circles.
Recall that GRIL is defined over a 2-parameter persistence module induced by some filtration function $f = (f_x, f_y)$. For every vertex $v$, we assign $f_x(v) = 1 - \exp\big(-\frac{1}{\alpha}\sum_{i=1}^{\alpha} d(v, v_i)\big)$, where $v_i$ denotes the $i$-th nearest neighbor of $v$ and $d(v, v_i)$ denotes the distance between $v$ and $v_i$. For our experiments we fix $\alpha = 5$. We set $f_y(v) = 0$. We compute the alpha complex filtration (Edelsbrunner & Harer, 2010) of the points, and for each edge $e := (u, v)$ we assign $f_x(e) = \max(f_x(u), f_x(v))$ and $f_y(e) = 1 - \exp(-d(u, v))$. To obtain a valid bi-filtration function on the simplicial complex, we extend the bi-filtration function from 1-simplices to 2-simplices, i.e. triangles. We pass $f$ as input to PERSGRIL, implemented in PYTORCH (Paszke et al., 2019), which computes the GRIL values. PERSGRIL uniformly samples $n$ center points from a grid on $[0, 1]^2$. Since the GRIL values can be computed independently for each $k$ and each center point, we parallelize the computation. In the forward pass, we compute the GRIL values $\lambda(p, k, \ell)$ for generalized ranks $k = 1, 2$, worm size $\ell = 2$ and homology of dimension 1, while varying $p$ over all sampled center points. After obtaining the GRIL values, we compute the assignment $s$ according to Theorem 2.5. During the backward pass, we use this assignment to compute the derivative of PERSGRIL with respect to the filtration function and update the filtration accordingly. We obtain $n$ values of $\lambda(\cdot, 1, 2)$, one per center point, and treat them as a vector denoted $\lambda_1$. Similarly, $\lambda_2$ denotes the vector formed by the values $\lambda(\cdot, 2, 2)$. We minimize the loss $\mathcal{L} = -(\|\lambda_1\|_2^2 + \|\lambda_2\|_2^2)$.

Figure 6: The rearrangement of points according to the loss function, which in our case increases the norms of the $\lambda_1$ and $\lambda_2$ vectors. We start with two circles containing some noisy points inside, and observe that the points rearrange to form two circles, because doing so increases the norms of the $\lambda_1$ and $\lambda_2$ vectors.

5. CONCLUSION

[...] for computing zigzag persistence. We propose PERSGRIL, a differentiable topological layer that can be used as a topological feature extractor in a differentiable manner. As a topological feature extractor, PERSGRIL can outperform Graph Convolutional Networks (GCNs) and Graph Isomorphism Networks (GINs) on some synthetic datasets, and it outperforms existing multiparameter persistence methods on some graph benchmark datasets. Further, we give a proof of concept of the differentiability of PERSGRIL by rearranging a point cloud to enhance its topological features. We believe that the additional topological information that a 2-parameter persistence module encodes, compared to a 1-parameter persistence module, can be leveraged to learn better representations. Further directions of research include using PERSGRIL with GNNs for filtration learning to learn more powerful representations. We hope that this work motivates further research in this direction.

A BACKGROUND AND DEFINITIONS

Here, we give detailed definitions of the concepts used in the paper. We begin by defining a simplicial complex.

Definition A.1 (Simplicial Complex). An abstract simplicial complex is a pair $(V, \Sigma)$ where $V$ is a finite set and $\Sigma$ is a collection of non-empty subsets of $V$ such that if $\sigma \in \Sigma$ and $\emptyset \ne \tau \subseteq \sigma$, then $\tau \in \Sigma$. A topological space $|(V, \Sigma)|$ can be associated with the simplicial complex: using a bijection $t: V \to \{1, 2, \dots, |V|\}$, it is defined as the subspace of $\mathbb{R}^{|V|}$ formed by the union $\bigcup_{\sigma \in \Sigma} h(\sigma)$, where $h(\sigma)$ denotes the convex hull of the set $\{e_{t(s)}\}_{s \in \sigma}$ and $e_i$ denotes the $i$-th standard basis vector of $\mathbb{R}^{|V|}$.

We now define a zigzag filtration and the zigzag persistence module induced by it.

Definition A.2. A zigzag filtration is a sequence of simplicial complexes in which both insertions and deletions of simplices are allowed, which we indicate with double arrows: $X_0 \leftrightarrow X_1 \leftrightarrow \cdots \leftrightarrow X_n = X$. Applying the homology functor to such a filtration, we get a zigzag persistence module, that is, a sequence of vector spaces connected by forward or backward linear maps: $H_*(X_0) \leftrightarrow H_*(X_1) \leftrightarrow \cdots \leftrightarrow H_*(X_n)$.

Next, we define a 2-parameter filtration over $\mathbb{R}^2$ and the 2-parameter persistence module induced by it.

Definition A.3 (2-parameter simplicial filtration over $\mathbb{R}^2$). A 2-parameter simplicial filtration, also called a bi-filtration, over $\mathbb{R}^2$ is a collection of simplicial complexes $\{X_u\}_{u \in \mathbb{R}^2}$ with inclusion maps $X_u \hookrightarrow X_v$ for $u \le v$, that is, whenever $u_1 \le v_1$ and $u_2 \le v_2$, where $u = (u_1, u_2)$ and $v = (v_1, v_2)$.

Definition A.4 (2-parameter Persistence Module). Given a bi-filtration $\{X_u\}_{u \in \mathbb{R}^2}$, by taking the homology of the simplicial complexes in the bi-filtration over the finite field $\mathbb{Z}_2$, we get a collection of vector spaces $\{M_u \mid u \in \mathbb{R}^2\}$ along with a collection of linear maps $\{M_{u \to v}: M_u \to M_v \mid u \le v\}$.
Each inclusion map in the bi-filtration induces a linear map between the corresponding homology vector spaces. Having defined 2-parameter filtrations and 2-parameter persistence modules, we now define the notion of an interval in $\mathbb{R}^2$, using the standard partial order on $\mathbb{R}^2$: $u \le v$ if $u_1 \le v_1$ and $u_2 \le v_2$ for $u = (u_1, u_2)$ and $v = (v_1, v_2)$.

Definition A.5. An interval in $\mathbb{R}^2$ is a subset $\emptyset \ne I \subseteq \mathbb{R}^2$ that satisfies the following:
1. If $p, q \in I$ and $p \le r \le q$, then $r \in I$;
2. If $p, q \in I$, then there exists a finite sequence $p = p_0, p_1, \dots, p_m = q$ of points in $I$ such that every two consecutive points $p_i, p_{i+1}$ are comparable in the partial order, for $i \in \{0, \dots, m - 1\}$.

We now give the formal definition of the generalized rank invariant over intervals in $\mathbb{R}^2$; more generally, the generalized rank invariant can be defined over any locally finite connected poset.

Definition A.6 (Generalized Rank (Kim & Mémoli, 2021)). Given a 2-parameter persistence module $M$ and an interval $I \subseteq \mathbb{R}^2$, the generalized rank of $M$ restricted to $I$, $\mathrm{rk}^M(I)$, is defined as $\mathrm{rk}^M(I) \triangleq \operatorname{rank}\big[\varprojlim M|_I \to \varinjlim M|_I\big]$.

Here $\varprojlim M|_I$ and $\varinjlim M|_I$ denote the limit and colimit of the functor $M$ restricted to $I$. We refer the reader to (MacLane, 1971) for the definitions of limit and colimit in category theory.

For a collection of intervals $\mathcal{I}$, the collection $\mathrm{rk}^M_{\mathcal{I}} \triangleq \{\mathrm{rk}^M(I) \mid I \in \mathcal{I}\}$ is called the generalized rank invariant of $M$ over $\mathcal{I}$. We can define a metric on the space of persistence modules based on their generalized rank invariants over all intervals in $\mathbb{R}^2$.

Definition A.7 (Erosion Distance (Patel, 2018; Kim & Mémoli, 2021)). Let $\mathrm{Int}(\mathbb{R}^2)$ be the collection of all intervals in $\mathbb{R}^2$, and let $M$ and $N$ be two persistence modules. The erosion distance is defined as $d_E(M, N) \triangleq \inf\{\varepsilon \ge 0 : \forall I \in \mathrm{Int}(\mathbb{R}^2),\ \mathrm{rk}^M(I) \ge \mathrm{rk}^N(I^{+\varepsilon}) \text{ and } \mathrm{rk}^N(I) \ge \mathrm{rk}^M(I^{+\varepsilon})\}$. Here $I^{+\varepsilon}$ denotes the $\varepsilon$-extension of the interval $I$ (Definition B.6).
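Definitions A.1 and A.3 translate into two checks on a finite complex carrying a bi-filtration function: the complex must be closed under taking faces, and every face must enter no later than the simplices containing it, that is, $f(\tau) \le f(\sigma)$ in the partial order whenever $\tau \subseteq \sigma$. The sketch also shows one natural way, assumed here rather than taken from the paper, to extend values from edges to triangles as done in the proof-of-concept experiment: the coordinate-wise maximum over a triangle's edges, which is the smallest value consistent with the order. All names are illustrative.

```python
from itertools import combinations


def faces(simplex):
    """All non-empty proper faces of a simplex given as a frozenset of vertices."""
    return [frozenset(c) for r in range(1, len(simplex))
            for c in combinations(sorted(simplex), r)]


def is_simplicial_complex(simplices):
    """Definition A.1: every non-empty proper subset of a simplex is a simplex."""
    S = set(simplices)
    return all(tau in S for sigma in S for tau in faces(sigma))


def is_bifiltration(values):
    """Definition A.3 on a finite complex: tau subset of sigma implies f(tau) <= f(sigma).

    values: dict frozenset -> (x, y), defined on every simplex of a complex.
    """
    return all(values[tau][0] <= values[sigma][0] and values[tau][1] <= values[sigma][1]
               for sigma in values for tau in faces(sigma))


def extend_to_triangles(values, triangles):
    """Assign each triangle the coordinate-wise maximum of its three edge values."""
    out = dict(values)
    for tri in triangles:
        edges = [frozenset(e) for e in combinations(sorted(tri), 2)]
        out[frozenset(tri)] = (max(out[e][0] for e in edges),
                               max(out[e][1] for e in edges))
    return out
```

Because each edge value already dominates its endpoints, the coordinate-wise maximum over the edges also dominates every vertex of the triangle, so the extended function passes `is_bifiltration` whenever the input did.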

B STABILITY AND DIFFERENTIABILITY: PROOFS

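In this section, we provide the proofs of the stability and differentiability of GRIL. We begin by defining some metrics on the space of persistence modules based on GRIL.

Definition B.1. Given two persistence modules $M$ and $N$, a morphism $f: M \to N$ is a collection of linear maps $\{f_u: M_u \to N_u\}_{u \in \mathbb{R}^2}$ such that $f_v \circ M_{u \to v} = N_{u \to v} \circ f_u$ for all $u \le v$.

Definition B.[...] $d_L(M, N) \triangleq \|\lambda^M - \lambda^N\|_\infty$.

We now look at a property of GRIL that will help in proving stability.

Definition B.6. Given any interval $I$ and $\varepsilon \ge 0$, let $I^{+\varepsilon}$ be the $\varepsilon$-extension of $I$, defined as
$I^{+\varepsilon} \triangleq \bigcup_{p \in I} \boxed{p}_{\varepsilon}$, (2)
where $\boxed{p}_{\varepsilon} \triangleq \{q : \|p - q\|_\infty \le \varepsilon\}$ is the $\infty$-norm $\varepsilon$-neighbourhood of $p$.

Proposition B.1. $(\boxed{p}^{\ell}_{\delta})^{+\varepsilon} \subseteq \boxed{p}^{\ell}_{\delta + \varepsilon}$.

To better analyze the stability of GRIL, we define a distance similar in flavour to the erosion distance, restricted to the collection of all worms.

Notation B.7. Denote the collection of all worms as $\mathcal{W} \triangleq \{\boxed{p}^{\ell}_{\delta} \mid \delta > 0,\ \ell \in \mathbb{N}^+,\ p \in \mathbb{R}^2\}$.

Definition B.8. For $\mathcal{W}$ as above, define the distance $d^{\mathcal{W}}_E$ as
$d^{\mathcal{W}}_E(M, N) \triangleq \inf\{\varepsilon \mid \forall \boxed{p}^{\ell}_{\delta} \in \mathcal{W},\ \mathrm{rk}^M(\boxed{p}^{\ell}_{\delta}) \ge \mathrm{rk}^N(\boxed{p}^{\ell}_{\delta + \varepsilon}) \text{ and } \mathrm{rk}^N(\boxed{p}^{\ell}_{\delta}) \ge \mathrm{rk}^M(\boxed{p}^{\ell}_{\delta + \varepsilon})\}$. (3)

Proposition B.2. $d_L = d^{\mathcal{W}}_E \le d_E$, where $d_E$ is the erosion distance.

Proof. The inequality $d^{\mathcal{W}}_E \le d_E$ follows from Proposition B.1 together with the monotonicity of the generalized rank ($\mathrm{rk}^M(J) \le \mathrm{rk}^M(I)$ whenever $I \subseteq J$ are intervals): if $\varepsilon$ satisfies the condition in the definition of $d_E$, then $\mathrm{rk}^N(\boxed{p}^{\ell}_{\delta + \varepsilon}) \le \mathrm{rk}^N((\boxed{p}^{\ell}_{\delta})^{+\varepsilon}) \le \mathrm{rk}^M(\boxed{p}^{\ell}_{\delta})$, and symmetrically with $M$ and $N$ exchanged.

To show $d_L \le d^{\mathcal{W}}_E$, let $M, N$ be two persistence modules and assume $d^{\mathcal{W}}_E(M, N) = \epsilon$. For fixed $p, k, \ell$, let $\lambda^M(p, k, \ell) = \delta_1$ and $\lambda^N(p, k, \ell) = \delta_2$. Without loss of generality, assume $\delta_2 \ge \delta_1$; we want to show that $\delta_2 - \delta_1 \le \epsilon$. By the definition of $d^{\mathcal{W}}_E$, for any $\alpha > 0$ we have $k > \mathrm{rk}^M(\boxed{p}^{\ell}_{\delta_1 + \alpha}) \ge \mathrm{rk}^N(\boxed{p}^{\ell}_{\delta_1 + \epsilon + \alpha})$, so $\delta_1 + \epsilon + \alpha \ge \delta_2$, that is, $\epsilon + \alpha \ge \delta_2 - \delta_1$. Letting $\alpha \to 0$ gives $\delta_2 - \delta_1 \le \epsilon$.

Conversely, to show $d^{\mathcal{W}}_E \le d_L$, assume $d_L(M, N) = \delta$ and fix a worm $\boxed{p}^{\ell}_{\epsilon}$. If $k = \mathrm{rk}^N(\boxed{p}^{\ell}_{\epsilon + \delta})$, then $\lambda^N(p, k, \ell) \ge \epsilon + \delta$. By the assumption $d_L(M, N) = \delta$, we know that $\lambda^M(p, k, \ell) \ge \epsilon$, which implies $\mathrm{rk}^M(\boxed{p}^{\ell}_{\epsilon}) \ge k = \mathrm{rk}^N(\boxed{p}^{\ell}_{\epsilon + \delta})$. The same argument with $M$ and $N$ exchanged completes the proof.

Proposition 2.1. GRIL is equivalent to the generalized rank invariant on $\mathcal{W}$.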
Here, equivalence means that each can be reconstructed from the other via a bijection.

Proof. Constructing GRIL from the generalized rank invariant on $\mathcal{W}$ is immediate from the definition of GRIL. In the other direction, for any $p, \delta, \ell$, the generalized rank $\mathrm{rk}_M^{\mathcal{W}}(\boxed{p}^{\ell}_{\delta})$ can be reconstructed from GRIL as
$$\mathrm{rk}_M^{\mathcal{W}}(\boxed{p}^{\ell}_{\delta}) = \max\{k \mid \lambda(p, k, \ell) \ge \delta\}.$$
It is not hard to check that this construction, combined with the construction of the persistence landscape, gives a bijective mapping between (generalized) rank invariants over $\mathcal{W}$ and GRILs.

By the stability of the erosion distance, we immediately obtain the stability of GRIL:

Proposition 2.2. For two filtration functions $f, f' : X \to \mathbb{R}^2$, $\|\lambda^{M_f} - \lambda^{M_{f'}}\|_\infty \le \|f - f'\|_\infty$.

Proof. Let $M_f$ and $M_{f'}$ be the persistence modules induced by $f$ and $f'$, respectively. Then we have the following chain of inequalities:
$$\|\lambda^{M_f} - \lambda^{M_{f'}}\|_\infty = d_L(M_f, M_{f'}) \le d_E(M_f, M_{f'}) \le d_I(M_f, M_{f'}) \le \|f - f'\|_\infty,$$
where the first inequality is Proposition B.2 and $d_I(M_f, M_{f'})$ is the interleaving distance. The second-to-last inequality was shown in (Kim & Mémoli, 2021).

Recall that when $X$ is a finite space (e.g., a finite simplicial complex or a point cloud) with $|X| = n$, any $f : X \to \mathbb{R}^2$ can be considered as an $n \times 2$ matrix, which can be linearized into a vector in $\mathbb{R}^{2n}$. Let us denote that vector by $v_f$.

Proposition 2.3. PERSGRIL is Lipschitz continuous with respect to bi-filtration functions on finite spaces.

Proof. Given filtration functions $f, f'$ and their corresponding vector representations $v_f, v_{f'} \in \mathbb{R}^{2n}$, it is easy to see that $\|f - f'\|_\infty \le 2\|v_f - v_{f'}\|_\infty \le 2\|v_f - v_{f'}\|$. Combining this with the chain of inequalities in the previous proposition, we get that PERSGRIL is Lipschitz continuous with respect to the underlying filtration functions.

Theorem 2.5. Consider the space of all filtration functions $\{f : X \to \mathbb{R}^2\}$ on a finite space $X$ with $|X| = n$, which is equivalent to $\mathbb{R}^{2n}$.
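For fixed $k, \ell, p$, there exists a measure-zero subset $Z \subseteq \mathbb{R}^{2n}$ such that for any $f \in \mathbb{R}^{2n} \setminus Z$ satisfying the generic condition
$$\forall x \neq y \in X:\quad f(x)_1 \neq f(y)_1,\quad f(x)_2 \neq f(y)_2,$$
there exists an assignment $s : X \to \{\pm 1, 0, \pm\ell\}^2$ such that
$$\nabla_s \Lambda^{k,\ell}_p(f) \triangleq \lim_{\alpha \to 0} \frac{\Lambda^{k,\ell}_p(f + \alpha s) - \Lambda^{k,\ell}_p(f)}{\alpha \|s\|_\infty} = \max_{g} \nabla_g \Lambda^{k,\ell}_p(f),$$
where the maximum ranges over directions $g \in \mathbb{R}^{2n}$.

Proof. By Corollary 2.4, we know there exists some measure-zero set $\mathcal{R} \subset \mathbb{R}^{2n}$ such that PERSGRIL is differentiable in $R \triangleq \mathbb{R}^{2n} \setminus \mathcal{R}$. Let $M = M_f$ be a 2-parameter persistence module induced from some generic filtration function $f \in R$, and let $I = \boxed{p}^{\ell}_{d}$ be an $\ell$-worm in $\mathbb{R}^2$ centered at some point $p$. Let $\partial(I)$ be the boundary of $I$ excluding the rightmost vertical edge and the bottommost horizontal edge (see Figure 7 for an illustration). It is shown in (Dey et al., 2022) that a so-called zigzag persistence module can be defined over $\partial(I)$ by restricting $M$ to $\partial(I)$ (in practice, it is enough to take a zigzag path approximating the smooth off-diagonal boundary), and that the number of full bars of this module equals $\mathrm{rk}_M(I)$. Let $I' = \boxed{p}^{\ell}_{d'}$ be another $\ell$-worm centered at $p$ for some $d' \neq d$. One can observe that, if the zigzag filtrations on $\partial(I)$ and $\partial(I')$ have the same order of insertions and deletions of simplices, then $M|_{\partial(I)}$ and $M|_{\partial(I')}$ have the same number of full bars, which means $\mathrm{rk}_M(I) = \mathrm{rk}_M(I')$. Now let $d = \lambda^M(p, k, \ell)$, $I = \boxed{p}^{\ell}_{d}$, $I^- = \boxed{p}^{\ell}_{d-\epsilon}$ and $I^+ = \boxed{p}^{\ell}_{d+\epsilon}$ for some small enough $\epsilon > 0$. By the definition of $\lambda^M$, we know that $\mathrm{rk}_M(I^-) \ge k$ and $\mathrm{rk}_M(I^+) < k$, which means that the zigzag filtrations change on some simplices while moving from $\partial(I^-)$ to $\partial(I^+)$: either the collection of simplices changes or the order of simplices changes. The former case corresponds to simplices whose $x$- or $y$-coordinate is aligned with some vertical or horizontal edge of $\partial(I)$. The latter case corresponds to pairs of simplices $(\sigma, \tau)$ such that
$$f(\sigma) \vee f(\tau) \triangleq (\max(f(\sigma)_1, f(\tau)_1), \max(f(\sigma)_2, f(\tau)_2))$$
lies on some off-diagonal edge of $\partial(I)$. By the generic condition on $f$, we can locate those simplices as a set $S$, whose elements we call support simplices. The assignment function $s$ is defined on each $\sigma \in S$ by assigning $s(\sigma) = \pm 1$ or $\pm\ell$ (in one coordinate), consistent with the direction in which the edge moves from $\partial(I)$ to $\partial(I^+)$. We divide the boundary into four edges: the bottom (off-diagonal) edge $e_b$, the top (horizontal) edge $e_t$, the left (vertical) edge $e_l$ and the right (off-diagonal) edge $e_r$, and discuss the assignment values case by case:
1. $s(\sigma) = (0, +\ell)$ if $\sigma$ has the same $y$-coordinate as $e_t$;
2. $s(\sigma) = (-\ell, 0)$ if $\sigma$ has the same $x$-coordinate as $e_l$;
3. $s(\sigma) = (0, -1)$ and $s(\tau) = (-1, 0)$ if $f(\sigma) \vee f(\tau)$ is on $e_b$ and $f(\sigma)_1 \le f(\tau)_1$;
4. $s(\sigma) = (0, +1)$ and $s(\tau) = (+1, 0)$ if $f(\sigma) \vee f(\tau)$ is on $e_r$ and $f(\sigma)_1 \le f(\tau)_1$.
See Figure 7 for an illustration.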
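The four cases can be encoded directly. The sketch below is a minimal illustration under an explicit assumption about worm geometry that the text does not spell out: with $u = q - p$, the $\ell$-worm of width $d$ is taken to be the region $|u_1| \le \ell d$, $|u_2| \le \ell d$, $|u_1 + u_2| \le 2d$, so that $e_t$ lies at height $p_2 + \ell d$, $e_l$ at abscissa $p_1 - \ell d$, and $e_b$, $e_r$ lie on the lines $u_1 + u_2 = -2d$ and $u_1 + u_2 = 2d$. The helpers `worm_edges`, `single_assignment` and `pair_assignment` are hypothetical names; they only compare coordinates with edge levels and do not check that a point falls within the finite extent of an edge. Integer coordinates keep the equality tests exact.

```python
def worm_edges(p, d, ell):
    """Levels of the four boundary pieces of the ell-worm of width d centered at p.

    Assumed geometry: with u = q - p, |u1| <= ell*d, |u2| <= ell*d, |u1 + u2| <= 2*d.
    """
    return {
        "e_t": p[1] + ell * d,        # top edge: a y-coordinate
        "e_l": p[0] - ell * d,        # left edge: an x-coordinate
        "e_b": p[0] + p[1] - 2 * d,   # bottom off-diagonal edge: a value of x + y
        "e_r": p[0] + p[1] + 2 * d,   # right off-diagonal edge: a value of x + y
    }


def single_assignment(p, d, ell, f_sigma):
    """Cases 1 and 2: a simplex aligned with the top or the left edge."""
    edges = worm_edges(p, d, ell)
    if f_sigma[1] == edges["e_t"]:
        return (0, ell)               # case 1: move up with speed ell
    if f_sigma[0] == edges["e_l"]:
        return (-ell, 0)              # case 2: move left with speed ell
    return None


def pair_assignment(p, d, ell, f_sigma, f_tau):
    """Cases 3 and 4: a pair whose join f(sigma) v f(tau) lies on an off-diagonal edge."""
    if f_sigma[0] > f_tau[0]:
        raise ValueError("cases 3 and 4 assume f(sigma)_1 <= f(tau)_1")
    join = (max(f_sigma[0], f_tau[0]), max(f_sigma[1], f_tau[1]))
    edges = worm_edges(p, d, ell)
    if join[0] + join[1] == edges["e_b"]:
        return (0, -1), (-1, 0)       # case 3: move the join down-left
    if join[0] + join[1] == edges["e_r"]:
        return (0, 1), (1, 0)         # case 4: move the join up-right
    return None


if __name__ == "__main__":
    p, d, ell = (0, 0), 2, 2
    print(single_assignment(p, d, ell, (-1, 4)))        # (0, 2): on e_t, case 1
    print(single_assignment(p, d, ell, (-4, 1)))        # (-2, 0): on e_l, case 2
    print(pair_assignment(p, d, ell, (0, 3), (1, 0)))   # ((0, 1), (1, 0)): join (1, 3) on e_r
```

With $\ell = 2$, cases 1 and 2 return $(0, 2)$ and $(-2, 0)$, the same values shown for $s(\sigma_1)$ and $s(\sigma_2)$ in Figure 7.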
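We assume that $f$ satisfies the condition that the support simplices in $S$ either all belong to cases 1 and 2 or all belong to cases 3 and 4, but not a combination of them. It is not hard to see that the collection of $f$ for which this condition does not hold is a measure-zero set in $\mathbb{R}^{2n}$; let us denote it by $F$. Then $Z = F \cup \mathcal{R}$ is a measure-zero set in $\mathbb{R}^{2n}$ consisting of the $f$'s that do not satisfy the condition and the points where PERSGRIL is not differentiable. Now we check that, for such a generic $f \notin Z$, the directional derivative $\nabla_s \lambda(f)$ is indeed a maximal directional derivative.

For cases 3 and 4, the stability property in Proposition 2.2 implies that, for any $\alpha > 0$ and any direction vector $g \in \mathbb{R}^{2n}$ with $\|g\|_\infty = 1$, we have $\lambda(f + \alpha g) - \lambda(f) \le \alpha$. It is also not hard to check that $\lambda(f + \alpha s) - \lambda(f) = \alpha$ for $\alpha > 0$ small enough, since the zigzag persistence of $M_{f+\alpha s}|_J$ with $J = \boxed{p}^{\ell}_{d+\alpha}$ has the same collection and order of simplices as $M_f|_I$ with $I = \boxed{p}^{\ell}_{d}$, which means they have the same rank. Therefore, for all $\|g\|_\infty = 1$,
$$\lambda(f + \alpha g) - \lambda(f) \le \lambda(f + \alpha s) - \lambda(f) \implies \nabla_g \Lambda(f) \le \nabla_s \Lambda(f).$$

For case 1 (case 2 is similar), the support simplex lies on edge $e_t$. For any direction vector $g \in \mathbb{R}^{2n}$ and $\alpha > 0$ small enough, let $\Delta d = \Lambda(f + \alpha g) - \Lambda(f)$, and let $\Delta y_{e_t}$ be the difference between the $y$-coordinates of $e_t$ for $\boxed{p}^{\ell}_{d}$ and for $\boxed{p}^{\ell}_{d+\Delta d}$. To change $\Lambda(f)$ by $\Delta d$, one has to move edge $e_t$ by at least $\Delta y_{e_t}$, which correspondingly requires changing the $y$-coordinate of $f(\sigma)$ by $\Delta y_{e_t}$. Hence the directional derivative $\nabla_g \Lambda(f)$ is bounded from above by the ratio $\Delta d / \Delta y_{e_t} = 1/\ell = \nabla_s \Lambda(f)$. The case $\alpha < 0$ is symmetric. In summary, $\nabla_s \lambda(f)$ indeed maximizes the directional derivative at $f$.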
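The two rank/landscape facts proved in this appendix, $d_L = d^{\mathcal{W}}_E$ (Proposition B.2) and the reconstruction $\mathrm{rk}(\boxed{p}^{\ell}_{\delta}) = \max\{k \mid \lambda(p, k, \ell) \ge \delta\}$ (Proposition 2.1), can be sanity-checked numerically for one fixed center $p$ and worm size $\ell$. In this minimal sketch (all names hypothetical), a module is represented only by its ranks sampled at widths $\delta_i = (i+1)\cdot\mathtt{step}$; these are assumed non-increasing in $\delta$ (wider worms contain narrower ones) and equal to 0 beyond the sampled range, and $\varepsilon$ is restricted to multiples of `step`.

```python
import random


def landscape_value(ranks, widths, k):
    """lambda(p, k, l) at one fixed (p, l): the largest sampled width with rank >= k, else 0."""
    return max((w for w, r in zip(widths, ranks) if r >= k), default=0)


def rank_from_landscape(lams, width):
    """Proposition 2.1: rk(p^l_delta) = max{k : lambda(p, k, l) >= delta}, else 0 (lams[0] is k = 1)."""
    return max((k for k, lam in enumerate(lams, start=1) if lam >= width), default=0)


def d_landscape(rank_m, rank_n, widths):
    """d_L restricted to one (p, l): max over k of |lambda_M(k) - lambda_N(k)|."""
    k_max = max(rank_m + rank_n, default=0)
    return max((abs(landscape_value(rank_m, widths, k) - landscape_value(rank_n, widths, k))
                for k in range(1, k_max + 1)), default=0)


def d_worm_erosion(rank_m, rank_n, step):
    """d^W_E restricted to one (p, l), with eps ranging over multiples of `step`."""
    n = len(rank_m)

    def rank_at(ranks, i):
        return ranks[i] if i < n else 0  # worms wider than the sampled range have rank 0

    for j in range(n + 1):
        if all(rank_m[i] >= rank_at(rank_n, i + j) and rank_n[i] >= rank_at(rank_m, i + j)
               for i in range(n)):
            return j * step
    raise AssertionError("unreachable: j = n always satisfies both inequalities")


def random_rank_profile(n, max_rank, rng):
    """A non-increasing integer rank function, as rk(p^l_delta) is for growing delta."""
    return sorted((rng.randint(0, max_rank) for _ in range(n)), reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    step, n = 1, 20
    widths = [(i + 1) * step for i in range(n)]  # sampled widths delta_i > 0
    for _ in range(1000):
        rm, rn = random_rank_profile(n, 5, rng), random_rank_profile(n, 5, rng)
        # Proposition B.2 on this family: d_L equals d^W_E.
        assert d_landscape(rm, rn, widths) == d_worm_erosion(rm, rn, step)
        # Proposition 2.1: the ranks are recovered from the landscape values.
        lams = [landscape_value(rm, widths, k) for k in range(1, 6)]
        assert [rank_from_landscape(lams, w) for w in widths] == rm
    print("all checks passed")
```

On this discretized family the two distances agree exactly, because $\mathrm{rk}(\delta_i) \ge k$ holds precisely when $\delta_i \le \lambda(k)$; the round trip in the last assertion only needs the list of $\lambda$ values to cover every rank that occurs.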

C ALGORITHM

Here, we describe the algorithm in detail. In practice, we are usually presented with a piecewise linear (PL) approximation $\tilde{f}$ of an $\mathbb{R}^2$-valued function $f$ on a discretized domain such as a finite simplicial complex; the PL approximation $\tilde{f}$ is itself $\mathbb{R}^2$-valued. Discretizing the parameter space $\mathbb{R}^2$ by a grid, we consider a lower-star bi-filtration of the simplicial complex. Analogous to the 1-parameter case, a lower-star bi-filtration is obtained by assigning to every simplex, in each of the two coordinates, the maximum of the values over all of its vertices. With appropriate scaling, these (finite) values can be mapped to a subset of points of a uniform finite grid over $[0, 1] \times [0, 1]$. Because of the maximization over vertices, any two simplices $\sigma \subseteq \tau$ satisfy $\tilde{f}(\sigma) \le \tilde{f}(\tau)$ in the partial order of $\mathbb{R}^2$. Partially ordering the simplices according to these values provides a bi-filtration over the grid $[0, 1] \times [0, 1]$.

[Figure 7 graphic: worms $I$ and $I'$ with boundaries $\partial(I)$ and $\partial(I')$, center $p$, support simplices $\sigma_1, \ldots, \sigma_4$, $\ell = 2$; $s(\sigma_1) = (0, 2)$, $s(\sigma_2) = (-2, 0)$.]

Choosing center points for worms. Let us denote the chosen grid by $\mathrm{Grid} = \{(\frac{m}{M}, \frac{n}{M}) \mid m, n \in \{0, 1, \ldots, M\}\}$ for some $M \in \mathbb{Z}^+$, and denote the grid resolution by $\rho \triangleq 1/M$. We sample a uniform subgrid $P \subseteq \mathrm{Grid}$ as the collection of center points of the worms used to build GRIL.

Discretized $\ell$-worms. We saw the definition of the $\ell$-worm in the previous section. However, since in practice we work with a discrete grid rather than $\mathbb{R}^2$, we use discretized $\ell$-worms as an approximation; the approximation gap is determined by the grid resolution $\rho$. A discretized $\ell$-worm centered at $p$ with width $d$ is the union of $2\ell - 1$ squares with centers at $p + (kd, -kd)$ and $p - (kd, -kd)$, where $k \in \{0, 1, \ldots, \ell - 1\}$, together with the intermediate staircases of step size equal to the grid resolution $\rho$ between consecutive squares. Figure 2 (middle) shows the discretization of a 2-worm. This construction is sensitive to the grid resolution.
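The preprocessing steps above (lower-star values, snapping to the grid, a uniform center subgrid, and discretized worms) can be sketched as follows. This is a minimal illustration with hypothetical function names, working in integer grid indices so that one index step equals $\rho$. Two choices are assumptions rather than details taken from the text: values are snapped to the nearest grid index (any monotone rounding preserves $\tilde{f}(\sigma) \le \tilde{f}(\tau)$), and the worm of width $d$ is read as the region $|u_1| \le \ell d$, $|u_2| \le \ell d$, $|u_1 + u_2| \le 2d$ with $u = q - p$, i.e. the $2\ell - 1$ squares of half-side $d$ centered at $p + k(d, -d)$ together with staircase points filling the gaps between consecutive squares.

```python
def lower_star_bifiltration(vertex_values, simplices):
    """Give each simplex the coordinate-wise maximum of its vertices' values."""
    return {s: (max(vertex_values[v][0] for v in s), max(vertex_values[v][1] for v in s))
            for s in simplices}


def snap_to_grid(filtration, M):
    """Min-max scale each coordinate to [0, 1] and round to the nearest index of {0, 1/M, ..., 1}.

    Rounding is monotone, so f(sigma) <= f(tau) for faces sigma of tau is preserved.
    """
    xs = [v[0] for v in filtration.values()]
    ys = [v[1] for v in filtration.values()]

    def to_index(t, lo, hi):
        return round(M * (t - lo) / (hi - lo)) if hi > lo else 0

    return {s: (to_index(x, min(xs), max(xs)), to_index(y, min(ys), max(ys)))
            for s, (x, y) in filtration.items()}


def center_subgrid(M, stride):
    """Uniform subgrid P of the (M + 1) x (M + 1) index grid, taking every `stride`-th index."""
    return [(i, j) for i in range(0, M + 1, stride) for j in range(0, M + 1, stride)]


def discretized_worm(p, d, ell, M):
    """Grid indices of an ell-worm of width d (in grid steps) centered at p, clipped to the grid.

    Assumed shape, with u = q - p: |u1| <= ell*d, |u2| <= ell*d and |u1 + u2| <= 2*d.
    It contains the 2*ell - 1 squares of half-side d centered at p + k(d, -d), |k| <= ell - 1,
    and fills the gaps between consecutive squares with staircase points.
    """
    px, py = p
    r = ell * d
    return {(px + a, py + b)
            for a in range(-r, r + 1) for b in range(-r, r + 1)
            if abs(a + b) <= 2 * d and 0 <= px + a <= M and 0 <= py + b <= M}


if __name__ == "__main__":
    values = {0: (0.0, 1.0), 1: (2.0, 0.0), 2: (1.0, 3.0)}
    simplices = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
    grid_values = snap_to_grid(lower_star_bifiltration(values, simplices), M=10)
    print(grid_values[(0, 1)])                               # (10, 3)
    worm = discretized_worm((5, 5), d=2, ell=2, M=10)
    expanded = discretized_worm((5, 5), d=3, ell=2, M=10)    # one grid step (rho) wider
    print(worm <= expanded)                                  # True
```

Widening $d$ by one grid step gives the expanded worm of Figure 2, which contains the original one; for $\ell = 1$ the shape reduces to a plain square.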
Computing generalized ranks. We need to compute the generalized rank rk M ( p 

D EXPERIMENTAL SETUP

D.1 HOURGLASS DATASET

The two traversals $T_1$ and $T_2$ are designed as follows: $T_1$ traverses $G_1$ and then $G_2$; $T_2$ first traverses the upper halves $G_1^{\top} \subseteq G_1$ and $G_2^{\top} \subseteq G_2$ sequentially, and then the other halves $G_1^{\bot} \subseteq G_1$ and $G_2^{\bot} \subseteq G_2$. For cross edges, we randomly pick $2|V|$ pairs of nodes (with replacement) in $G_1^{\top} \times G_2^{\bot}$ on which we place cross edges; we do not place multiple edges on the same pair of nodes. In a similar way, we place cross edges on $G_1^{\bot} \times G_2^{\top}$. Therefore, $G$ has roughly $6|V|$ cross edges between $G_1$ and $G_2$, and the total number of edges is roughly $|E| \approx 5|V|$.

For methods based on persistence modules, we take two filtration functions $f_1, f_2 : V \cup E \to \mathbb{R}$ on $G$ as follows. Let $x(v)$ be the node attribute on $v$ given by the order index of the trace. Then
• $f_1$ is given by $f_1(v) = x(v)$ for all $v \in V$ and $f_1(e) = \max(x(v), x(w))$ for all $e = (v, w) \in E$;
• $f_2$ is given by $f_2(v) = 0$ for all $v \in V$ and $f_2(e) = C(e)$ for all $e \in E$, where $C(e)$ is a curvature value of $e$. Here we use a version of discrete Ricci curvature called Forman-Ricci curvature (Forman, 2003), computed by the code provided in (Ni et al., 2019).

We compute the PERSGRIL values $\lambda(p, k, \ell)$ for all points $p$ in a uniform $4 \times 4$ grid, for generalized ranks $k = 1, 2$, worm size $\ell = 2$, and homology dimensions 0 and 1. Therefore, for each graph PERSGRIL generates a 64-dimensional vector representation. For the method based on 1-parameter persistence modules with persistence image vectorization, we compute 1-parameter persistence modules for homology dimensions 0 and 1 on $f_1$ and $f_2$ independently. Each persistence module is vectorized on a $4 \times 4$ grid, so this method also produces a 64-dimensional vector representation. For graph neural network models, we use a 3-layer GCN and a 3-layer GNN with a fixed hidden dimension of 16, followed by sum pooling and one fully-connected layer. We use a 3-layer multilayer perceptron (MLP) with a fixed hidden dimension of 16 as the classifier for all models.
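The two filtration functions can be written down in a few lines. The sketch below is a minimal illustration with a hypothetical function name: the graph is given as a traversal-index map $x(v)$ and an edge list, and for $f_2$ it uses, as an assumed stand-in, the basic combinatorial Forman curvature of an edge in an unweighted graph, $F(e) = 4 - \deg(u) - \deg(w)$. The code of (Ni et al., 2019) may compute a different variant (for example with edge weights or triangle contributions). For reference, the 64 dimensions stated above come from $16 \text{ centers} \times 2 \text{ ranks} \times 2 \text{ homology dimensions}$ for PERSGRIL, and from $2 \text{ functions} \times 2 \text{ homology dimensions} \times 16 \text{ pixels}$ for the persistence-image baseline.

```python
def hourglass_filtrations(order, edges):
    """Return (f1, f2) as dicts keyed by vertices and by (u, w) edge tuples.

    order: dict mapping each vertex v to its traversal index x(v).
    edges: list of (u, w) pairs of an unweighted, simple graph.
    """
    degree = {v: 0 for v in order}
    for u, w in edges:
        degree[u] += 1
        degree[w] += 1

    f1 = dict(order)                  # f1(v) = x(v)
    f2 = {v: 0 for v in order}        # f2(v) = 0
    for u, w in edges:
        f1[(u, w)] = max(order[u], order[w])      # f1(e) = max(x(u), x(w))
        # Assumed curvature variant: basic Forman curvature of an unweighted edge.
        f2[(u, w)] = 4 - degree[u] - degree[w]    # f2(e) = C(e)
    return f1, f2


if __name__ == "__main__":
    order = {0: 0, 1: 1, 2: 2, 3: 3}
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]      # a triangle with a pendant edge
    f1, f2 = hourglass_filtrations(order, edges)
    print(f1[(2, 3)], f2[(1, 2)])                 # 3 -1
```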
We train all models for 100 epochs with cross-entropy loss and the Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate of 0.001. We use 5-fold cross-validation and report the mean accuracy and standard deviation.

D.2 GRAPH EXPERIMENTS

We performed a series of graph classification experiments using GRIL on standard datasets with node features, namely PROTEINS, DHFR, COX2 and MUTAG (Morris et al., 2020). A description of the graph classification tasks is given in Table 3. The node features were treated as points in a high-dimensional space, and we computed the Density-Alpha bi-filtration on the nodes. We extended the filtration to the edges by taking the maximum of the values on the corresponding nodes. For both the Density-Alpha bi-filtration and the Heat Kernel Signature-Ricci Curvature bi-filtration, as done in (Carrière & Blumberg, 2020), the filtration values are normalized to lie between 0 and 1. For the experiments reported in Section 4, we fix the grid resolution $\rho = 0.01$; thus, the square $[0, 1] \times [0, 1]$ has $100 \times 100$ grid points. We sample 128 center points $p$ uniformly from these grid points, and we fix $\ell = 2$. We compute $\lambda(p, k, \ell)$, where $p$ varies over the 128 sampled center points and $k$ varies from 1 to 10. Each such computation is done for dimension-0 homology ($H_0$) and dimension-1 homology ($H_1$). We fix the learning rate to 0.001 for these experiments.

Table 4: 10-fold cross-validated test accuracy (%) for different grid resolutions.
Dataset | ρ = 0.02 | ρ = 0.01 | ρ = 0.005
MUTAG | 87.9 ± 8.1 | 87.8 ± 8.8 | 87.9 ± 8.1

In Table 5, we study the performance of GRIL for different values of $k$ on the MUTAG and COX2 datasets, comparing the 10-fold cross-validated test accuracy of GRIL on the Density-Alpha bi-filtration. For this study, we use a 1-layer MLP classifier and fix the learning rate to 0.001. The columns in Table 5 correspond to the chosen values of $k$. For instance, [1−2] means that we computed $\lambda(p, 1, \ell)$ and $\lambda(p, 2, \ell)$ and concatenated these vectors before passing them to the 1-layer MLP classifier. For datasets with smaller graphs such as MUTAG, ranks higher than 6 do not appear to be very useful.
However, for datasets with comparatively bigger graphs, using higher ranks seems to increase the performance of the model. In Table 6, we report the time for computing $\lambda(p, k, \ell)$ with $\ell = 2$, $k \in \{1, 2, \ldots, 10\}$ and $p \in P$, where $|P| = 128$. The GRIL features were calculated on the Density-Alpha bi-filtration. The computations were done on an Intel(R) Xeon(R) Gold 6248R CPU machine using 32 cores. We report the total computation time per dataset and the average computation time per graph (in seconds) for each dataset.
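The feature layout described in this subsection ($\rho = 0.01$, i.e. a $100 \times 100$ grid of candidate centers, 128 sampled centers, $\ell = 2$, $k = 1, \ldots, 10$, and both $H_0$ and $H_1$) fixes the length of the GRIL vector per graph at $128 \times 10 \times 2 = 2560$ values. A minimal sketch of this bookkeeping follows; the helper names and the `gril` callable are hypothetical stand-ins for an actual GRIL computation, and sampling centers without replacement is our assumption about what "uniformly" means here.

```python
import random


def minmax_normalize(values):
    """Rescale one filtration coordinate to [0, 1]; a constant coordinate maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def sample_centers(grid_side, n_centers, seed=0):
    """Draw n_centers distinct points uniformly from a grid_side x grid_side index grid."""
    rng = random.Random(seed)
    grid = [(i, j) for i in range(grid_side) for j in range(grid_side)]
    return rng.sample(grid, n_centers)


def assemble_features(gril, centers, ranks, hom_dims, ell=2):
    """Concatenate GRIL values in a fixed (homology dimension, k, center) order.

    `gril(dim, center, k, ell)` is a hypothetical callable returning lambda(p, k, ell).
    """
    return [gril(dim, c, k, ell) for dim in hom_dims for k in ranks for c in centers]


def placeholder_gril(dim, center, k, ell):
    """Stand-in for a real GRIL computation; always returns 0.0."""
    return 0.0


if __name__ == "__main__":
    centers = sample_centers(grid_side=100, n_centers=128)
    features = assemble_features(placeholder_gril, centers, ranks=range(1, 11), hom_dims=(0, 1))
    print(len(features))  # 2560
```

The "[1−2]" setting of Table 5 corresponds to passing `ranks=range(1, 3)`; keeping the concatenation order fixed matters because the downstream MLP sees a flat vector.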



The code for the full implementation will be available after the review process is completed.

The y-axis represents the Rips filtration and the x-axis represents density. The density value of a vertex $v$, denoted by $\gamma_v$, is defined as 1 − exp(Avg. nearest neighbor distance of $v$). For an edge $e := (u, v)$ and a triangle $t := (u, v, w)$, $\gamma$ is defined as $\max(\gamma_u, \gamma_v)$ and $\max(\gamma_u, \gamma_v, \gamma_w)$, respectively. The Rips filtration value $r_v$ of every vertex is 0. For an edge $e := (u, v)$, the Rips filtration value $r_{uv}$ is the Euclidean distance between $u$ and $v$. For a triangle $t := (u, v, w)$, the filtration value $r_t = \max(r_{uv}, r_{uw}, r_{vw})$ is the maximum over all its edges. The filtration values are rounded to the nearest hundredth for visualization purposes.

https://github.com/taohou01/fzz

This work presents GRIL, a 2-parameter persistence vectorization based on the generalized rank invariant, which we show is stable and differentiable with respect to the bi-filtration functions. Further, we present an algorithm for computing GRIL which is a synergistic confluence of the recent developments in computing the generalized rank invariant of a 2-parameter module and an efficient algorithm



Figure 1: (left) 1-parameter filtration and bars; (right) a 2-parameter filtration inducing a 2-parameter persistence module whose decomposition is not shown.

See Figure 2 (left) for an illustration of a 2-worm. We now define GRIL.

Figure 2: A 2-worm, discretized 2-worm and expanded discretized 2-worm. ρ denotes grid resolution. The blue dotted lines show the intermediate staircase with step-size ρ. The red dotted lines form parts of the squares with size d which are replaced by the blue dotted lines in the worm. The last figure shows the expanded 2-worm with red and blue dotted lines. The expanded 2-worm has width d + ρ which is the one step expansion of the worm with width d.

Figure 3: The construction starts from a simplicial complex with a bi-filtration function, shown on the top right. The simplicial complex consists of two vertices connected by one edge. Based on the bi-filtration, a simplicial bi-filtration can be defined, as shown on the top left. On the bottom left, a 2-parameter persistence module is induced from this simplicial bi-filtration. If we check the dimensions of the vector spaces at all points of the plane, there are 1-dimensional vector spaces on the red, blue and light purple regions. On the L-shaped dark purple region, the vector spaces have dimension 2. Finally, on this 2-parameter persistence module, we calculate $\lambda^{M_f}(p, k, \ell)$ for all tuples $(p, k, \ell) \in P \times K \times L$ to get our GRIL vector representation. By Definition 2.1, the value $\lambda^{M_f}(p, k, \ell)$ is the supremum of the widths of the $\ell$-worms centered at $p$ on which the generalized rank is at least $k$. On the bottom right, the interval in red is the maximal 2-worm for $\lambda^{M_f}(p, k = 1, \ell = 2)$. The green interval is the maximal 2-worm for $\lambda^{M_f}(q, k = 2, \ell = 2)$. The yellow square is the maximal 1-worm for $\lambda^{M_f}(r, k = 1, \ell = 1)$, and the blue interval is the maximal 3-worm for $\lambda^{M_f}(r, k = 1, \ell = 3)$.

Figure 5: (Left) An example of a graph consisting of two circulant subgraphs. The pair of indices on each node represents its order in the traversals $T_1$ and $T_2$, respectively. Both traversals start from the left node as the root node. (Right) Cross edges placed across the two subgraphs.
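The cross-edge placement described in Appendix D.1 can be sketched as follows. This is a minimal illustration: the halves $G_1^{\top}, G_1^{\bot}, G_2^{\top}, G_2^{\bot}$ are passed in as plain node lists (how the circulant subgraphs are split into halves is not reproduced here), `place_cross_edges` is a hypothetical helper, and `n_pairs` plays the role of the $2|V|$ draws per block mentioned in the text. Storing pairs in a set realizes "no multiple edges on the same pair of nodes": repeated draws simply collapse, which is why the final number of cross edges is only roughly the number of draws.

```python
import random


def place_cross_edges(top_1, bottom_1, top_2, bottom_2, n_pairs, seed=0):
    """Cross edges between G_1 and G_2 in the style of the HourGlass construction.

    For each block top_1 x bottom_2 and bottom_1 x top_2, draw n_pairs node pairs
    uniformly with replacement; duplicates collapse, so no pair receives two edges.
    """
    rng = random.Random(seed)
    cross = set()
    for side_1, side_2 in ((top_1, bottom_2), (bottom_1, top_2)):
        for _ in range(n_pairs):
            cross.add((rng.choice(side_1), rng.choice(side_2)))
    return cross


if __name__ == "__main__":
    top_1, bottom_1 = ["a0", "a1"], ["a2", "a3"]   # hypothetical halves of G_1
    top_2, bottom_2 = ["b0", "b1"], ["b2", "b3"]   # hypothetical halves of G_2
    edges = place_cross_edges(top_1, bottom_1, top_2, bottom_2, n_pairs=8)
    print(len(edges) <= 8)  # True: each block has only 2 * 2 distinct pairs
```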

Figure 6 shows the result after running PERSGRIL for 200 epochs. We optimize the loss function with Adam (Kingma & Ba, 2015) with a learning rate of 0.01.

Figure 6: The figures show the rearrangement of points driven by the loss function, which in our case rewards increasing the norms of the $\lambda_1$ and $\lambda_2$ vectors. We start with two circles containing some noisy points inside. We observe that the points rearrange to form two circles because that increases the norms of the $\lambda_1$ and $\lambda_2$ vectors.






Figure 7: Two examples of 2-worms $I, I'$. Blue and red lines are the boundaries of $I$ and $I'$, respectively, on which the zigzag persistence modules are constructed for computing ranks. $\sigma_i$, $i = 1, 2, 3, 4$, are four support simplices on $\partial(I)$, and $s(\sigma_i)$ is the value of the assignment function on $\sigma_i$.

Figure 8: (Left) The figure shows the 2-worm centered at p with width d. (Right) The highlighted part denotes the boundary cap of the worm. The arrows in the figure denote the direction of arrows in the zigzag filtration.

Table 5: 10-fold cross-validated test accuracy (%) of GRIL on the Density-Alpha bi-filtration for different values of k.
MUTAG | …1 ± 9.0 | 86.8 ± 7.5 | 88.4 ± 8.4 | 87.8 ± 8.8 | 87.8 ± 8.8
COX2 | 78.7 ± 0.0 | 78.3 ± 1.2 | 79.6 ± 3.0 | 80.4 ± 2.2 | 80.6 ± 2.5

Testing results of different models. The last two rows show the results on HourGlass[10,20] and HourGlass[31,40] of models trained on HourGlass[21,30].

The full details of the experiments are given in Appendix D.2.

10-fold cross-validation test accuracy of different models on graph datasets. The values of the MP-I, MP-K, MP-L columns are as reported in

Description of Graph Datasets

Computation times for GRIL features on the Density-Alpha bi-filtration for graph datasets.

In Table 4, we report the accuracy of GRIL on the Density-Alpha bi-filtration with different grid resolutions on the MUTAG dataset.

acknowledgement

Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete & Computational Geometry, 33(2):249-274, Feb 2005. ISSN 1432-0444. doi: 10.1007/s00454-004-1146-y. URL https://doi.org/10.1007/s00454-004-1146-y.

