Under review as a conference paper at ICLR 2021

DYHCN: DYNAMIC HYPERGRAPH CONVOLUTIONAL NETWORKS

Abstract

Hypergraph Convolutional Network (HCN) has become a default choice for capturing high-order relations among nodes, i.e., encoding the structure of a hypergraph. However, existing HCN models ignore the dynamic evolution of hypergraphs in real-world scenarios, i.e., the nodes and hyperedges of a hypergraph change over time. To capture the evolution of high-order relations and facilitate relevant analytic tasks, we formulate the dynamic hypergraph and devise Dynamic Hypergraph Convolutional Networks (DyHCN). In general, DyHCN consists of a Hypergraph Convolution (HC) module that encodes the hypergraph structure at a given time point and a Temporal Evolution (TE) module that captures the evolution of the relations. The HC module is carefully designed with inner attention and outer attention, which adaptively aggregate node features into hyperedge features and estimate the importance of each hyperedge connected to the centroid node, respectively. Extensive experiments on the Tiingo and Stocktwits datasets show that DyHCN achieves superior performance over existing methods, which implies the effectiveness of the HC and TE modules in capturing the properties of dynamic hypergraphs.

1. INTRODUCTION

Graph Convolutional Network (GCN) Scarselli et al. (2008) extends deep neural networks to graph data, encoding the relations between nodes by propagating node features over the graph structure. GCN has become a promising solution in a wide spectrum of graph analytic tasks, such as relation detection Schlichtkrull et al. (2018) and recommendation Ying et al. (2018). An emergent direction of GCN research extends the graph convolution operation to hypergraphs, i.e., hypergraph convolutional networks Zhu et al. (2017); Zhou et al. (2007); Zhang et al. (2017); Feng et al. (2019b); Yadati et al. (2019), where high-order node relations are represented as hyperedges (one hyperedge can connect multiple nodes). For instance, in a hypergraph of stocks, a financial event relevant to several stocks is represented as a hyperedge. Despite the surge of attention paid to hypergraph convolutional networks, most works discard the dynamic nature of hypergraphs in real-world applications, e.g., new hyperedges (i.e., events) emerge in the hypergraph of stocks (see Fig. 1), where the evolution of the hypergraph is crucial for analytic tasks such as stock price prediction. Aiming to bridge this gap, this work explores the central theme of dynamic hypergraphs and the corresponding GCN. Formally, a hypergraph with n nodes and m hyperedges is represented as G = (V, E, A, H, X), where V and E denote the sets of nodes and hyperedges respectively; A ∈ R^{n×m} is a binary incidence matrix indicating the connectedness of nodes; and H ∈ R^{m×c} and X ∈ R^{n×d} are feature matrices representing the hyperedges and nodes respectively. To account for the evolution, we first extend the concept of a static hypergraph to a dynamic hypergraph, which has two different formulations depending on whether time is treated as a discrete or continuous value. 1) Discrete-time formulation.
A straightforward solution is to treat a time window of length T (e.g., T days) as a sequence of time steps and take a snapshot at each step. In this way, a dynamic hypergraph is defined as G_D = [G_1, ..., G_t, ..., G_T], where G_t is the hypergraph dumped at time step t. 2) Continuous-time formulation. By treating time as a continuous variable, the dynamic hypergraph can be defined as G_C = (G_0, U), where G_0 is the initial status (a hypergraph) and U = {(p_t, v_t, a_t) | t ≤ T} is a stream of updates: p_t denotes the target variable (e.g., a row of X) changed at time t, v_t denotes the latest value of that variable, and a_t denotes the action of change (add, delete, or update). Both formulations have pros and cons: the discrete-time formulation is more amenable to existing analytic techniques on static hypergraphs such as HCN, while the continuous-time formulation records the exact time of each change. This work focuses on the discrete-time formulation and makes the first attempt to extend HCN to dynamic hypergraphs. A major challenge of capturing the spatial-temporal dependency in a dynamic hypergraph is extracting the features of changing nodes and hyperedges in a unified manner, owing to the varied scales of nodes and hyperedges; moreover, absorbing their dynamic properties is important for downstream application tasks. Toward this end, we need to design proper convolution operations on dynamic hypergraphs, which poses two challenges: 1) at each time step, since there are various relations between hyperedges and nodes, node features must be updated with these relations taken into account; 2) since node features change dynamically, modeling the temporal dependency requires extracting the corresponding temporal features.
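For concreteness, the two formulations can be sketched in Python. This is a toy illustration only: the class and function names, the random snapshot generator, and the update-tuple layout are our assumptions, not constructs from the paper.

```python
from dataclasses import dataclass
from typing import Tuple
import numpy as np

@dataclass
class HypergraphSnapshot:
    """One snapshot G_t = (V_t, E_t, A_t, H_t, X_t); V_t and E_t are implicit in the shapes."""
    A: np.ndarray  # (n, m) binary incidence matrix: A[i, j] = 1 iff node i is in hyperedge j
    X: np.ndarray  # (n, d) node feature matrix
    H: np.ndarray  # (m, c) hyperedge feature matrix

def random_discrete_dynamic_hypergraph(T, n, m, d, c, rng):
    """Discrete-time formulation: a list [G_1, ..., G_T] of per-time-step snapshots."""
    return [HypergraphSnapshot(A=(rng.random((n, m)) < 0.3).astype(float),
                               X=rng.standard_normal((n, d)),
                               H=rng.standard_normal((m, c)))
            for _ in range(T)]

# Continuous-time formulation: an initial snapshot G_0 plus a stream of updates
# (t, p_t, v_t, a_t): time, target variable, latest value, and action.
Update = Tuple[float, str, np.ndarray, str]
```

A discrete-time model then iterates over the snapshot list, while a continuous-time model would replay the update stream against G_0.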
In this work, we propose Dynamic Hypergraph Convolutional Networks (DyHCN) to tackle these challenges, with two modules: a Hypergraph Convolution (HC) module and a Temporal Evolution (TE) module. In a dynamic hypergraph, the set of hyperedges at each time step includes different hyperedge embeddings, and each hyperedge contains a different number of nodes. We exploit three submodules to update a node's embedding in HC: inner attention, outer attention, and embeddings update. First, inner attention transforms the node features within a hyperedge into a node-hyperedge feature; then outer attention uses an attention mechanism to estimate the importance of each hyperedge and outputs the importance weights; finally, we update the node's embedding by aggregating the node-hyperedge, hyperedge, and node features with the weight of each hyperedge. Given the node embeddings, the TE module extracts temporal features from them and makes a prediction. Extensive experimental results on two real-world datasets validate the superior performance of DyHCN over existing baselines, which demonstrates the effectiveness of DyHCN on dynamic hypergraphs. The rest of the paper is organized as follows. Section 2 introduces preliminary knowledge about GCN and the hypergraph convolutional network. Section 3 explains the proposed DyHCN method. Section 4 reviews related work on GCN for graphs and hypergraphs. Applications and experimental results are presented in Section 5. Finally, we conclude this work in Section 6.

2. PRELIMINARY

Graph Convolutional Network. Given a graph G = (V, E) with N nodes v_i ∈ V, edges (v_i, v_j) ∈ E, an adjacency matrix A ∈ R^{N×N}, and a degree matrix D_{ii} = Σ_j A_{ij}. With an input signal x, Kipf & Welling (2016) consider spectral convolutions on graphs with a filter g_θ = diag(θ) in the Fourier domain, g_θ ⋆ x = U g_θ U^T x, where U is the matrix of eigenvectors of the normalized graph Laplacian L = I_N − D^{−1/2} A D^{−1/2} = U Λ U^T, with the diagonal matrix of eigenvalues Λ and the graph Fourier transform U^T x. To reduce the computational complexity, g_θ is approximated with Chebyshev polynomials Defferrard et al. (2016):

g_θ ≈ Σ_{k=0}^{K} θ_k T_k(Λ̃), with T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x) and Λ̃ = (2/λ_max) Λ − I_N,

where λ_max denotes the largest eigenvalue of the Laplacian matrix and θ_k denotes the Chebyshev coefficients. Kipf & Welling (2016) showed that the GCN can be simplified with K = 1 and λ_max ≈ 2, which is the state of the art of GCN.

Hypergraph Convolutional Network. A hypergraph can be formulated as G = (V, E, W), where V is a set of vertices, E is a set of hyperedges, and W is a diagonal matrix denoting the weight of each hyperedge. The incidence matrix of hypergraph G is denoted by H ∈ R^{|V|×|E|}. The degree of a node is d(v) = Σ_{e∈E} w(e) h(v, e) and the degree of an edge is δ(e) = Σ_{v∈V} h(v, e); D_e and D_v denote the diagonal matrices of edge degrees and node degrees. The spectral convolution of a signal x and a filter g can be formulated as g ⋆ x = Φ((Φ^T g) ⊙ (Φ^T x)) = Φ g(Λ) Φ^T x, where ⊙ denotes element-wise multiplication and g(Λ) is a function of the Fourier coefficients Feng et al. (2019b). As in GCN, the convolution operation can be simplified to g ⋆ x ≈ θ D_v^{−1/2} H W D_e^{−1} H^T D_v^{−1/2} x.

3. DYNAMIC HYPERGRAPH CONVOLUTIONAL NETWORKS

3.1. FORMULATION OF DYNAMIC HYPERGRAPH

Dynamic hypergraphs fall into two categories: discrete-time and continuous-time. The discrete-time approach views a dynamic hypergraph as a collection of static hypergraph snapshots over time, while the continuous counterpart extracts fine-grained temporal information on nodes and hyperedges that characterizes the dynamic evolution of the hypergraph.

Discrete-time Dynamic Hypergraph. A discrete-time dynamic hypergraph can be formulated as G_D = (V_t, E_t, A_t, H_t, X_t), where X_t = [x_1^t, x_2^t, ..., x_n^t]^T ∈ R^{n×d} and H_t = [h_1^t, h_2^t, ..., h_m^t]^T ∈ R^{m×c}; x_i^t (i = 1, 2, ..., n) denotes the feature of the i-th node, h_j^t (j = 1, 2, ..., m) denotes the feature of the j-th hyperedge, and m and n are the numbers of hyperedges and nodes in hypergraph G_t (the hypergraph at time step t). A_t ∈ R^{n×m} is a binary incidence matrix indicating the connectedness of nodes in G_t; V_t is the set of nodes and E_t is the set of hyperedges. C_e^t = [u_1^t, u_2^t, ..., u_{k_e^t}^t]^T ∈ R^{k_e^t×d} and D_u^t = [e_1^t, e_2^t, ..., e_{k_u^t}^t]^T ∈ R^{k_u^t×c} denote the node set contained in hyperedge e and the hyperedge set containing node u at time step t, respectively, where k_e^t and k_u^t are the number of nodes in hyperedge e and the number of hyperedges containing node u at time t. As the representations evolve over time, we capture the spatial dependency with hypergraph convolutional networks and use CNNs to model the temporal dependency.
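Per snapshot, the neighborhood structures C_e^t and D_u^t follow directly from the incidence matrix, and the simplified hypergraph convolution from Section 2 can be applied snapshot-wise. A minimal numpy sketch (function names are ours, and we assume every node belongs to at least one hyperedge so no degree is zero):

```python
import numpy as np

def nodes_in_hyperedge(A, X, e):
    """C_e^t: the (k_e^t, d) features of the nodes contained in hyperedge e."""
    return X[A[:, e] > 0]

def hyperedges_of_node(A, H, u):
    """D_u^t: the (k_u^t, c) features of the hyperedges containing node u."""
    return H[A[u, :] > 0]

def hypergraph_conv(X, A, w, Theta):
    """One simplified hypergraph convolution on a snapshot:
    X' = D_v^{-1/2} A W D_e^{-1} A^T D_v^{-1/2} X Theta."""
    dv = A @ w                 # node degrees d(v) = sum_e w(e) h(v, e)
    de = A.sum(axis=0)         # edge degrees delta(e) = sum_v h(v, e)
    Dv_is = np.diag(1.0 / np.sqrt(dv))
    De_i = np.diag(1.0 / de)
    return Dv_is @ A @ np.diag(w) @ De_i @ A.T @ Dv_is @ X @ Theta
```

Running `hypergraph_conv` on each snapshot in turn gives the per-step node representations that the temporal module later consumes.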

Continuous-time Dynamic Hypergraph

A continuous-time dynamic hypergraph can be defined as G_C = (G_0, U), where G_0 is the initial status (a hypergraph) and U = {(p_t, v_t, a_t) | t ≤ T} is a stream of updates: p_t denotes the target variable (e.g., a row of X) changed at time t, v_t denotes the latest value of that variable, and a_t denotes the action of change (add, delete, or update). Since a static hypergraph model can be extended to dynamic hypergraphs by applying it to each snapshot and then aggregating the results, and since the distinction between an evolving and a temporal network is less important Skarding et al. (2020), we adopt the discrete-time formulation to build the DyHCN model in this work.

DyHCN. DyHCN is composed of two modules: hypergraph convolution (HC) and temporal evolution (TE). The HC module aggregates features among nodes and hyperedges with attention mechanisms and updates the embeddings of centroid nodes; the TE module captures dynamic changes in the temporal features. The framework of DyHCN is illustrated in Fig. 2.

3.2. HYPERGRAPH CONVOLUTION

Hypergraph convolution consists of three submodules: inner attention, outer attention, and embeddings update. In particular, inner attention aggregates node features into hyperedge features, outer attention uses an attention mechanism to determine the importance of each hyperedge, and the embeddings update submodule aggregates the node-hyperedge features, hyperedge features, and node features to update the centroid node embeddings with the weight of each hyperedge.

Inner attention

The inner attention, shown on the left of Fig. 3, aggregates node embeddings into node-hyperedge features using a self-attention mechanism; with a multi-layer perceptron (MLP) we obtain the weight score of each node. For a specific node x_i^t at time step t, the input of inner attention is C_e^t = [u_1^t, u_2^t, ..., u_{k_e^t}^t]^T ∈ R^{k_e^t×d}, and the output node-hyperedge embedding d^t is the weighted sum of the node features:

ω^t = softmax(C_e^t w_e + b_e),    d^t = Σ_{j=1}^{k_e^t} ω_j^t u_j^t,

where w_e ∈ R^{d×1} and b_e ∈ R^{k_e^t×1} are trainable parameters, ω^t ∈ R^{k_e^t×1} contains the weights of the nodes in the hyperedge, d^t ∈ R^{1×d} denotes the node-hyperedge feature, k_e^t denotes the number of nodes in the hyperedge, and d is the node feature dimension.

Outer attention. Since multiple hyperedges relate to a centroid node and the importance of each hyperedge differs, we propose an outer attention submodule to determine the weight of each hyperedge. The right of Fig. 3 shows the outer attention submodule, which calculates the weight of each hyperedge from the hyperedge features. For a specific node x_i^t, the input of outer attention is D_u^t = [e_1^t, e_2^t, ..., e_{k_u^t}^t]^T ∈ R^{k_u^t×c}, the set of hyperedges containing vertex x_i^t, and the output ω_h^t is the weight of each hyperedge at time step t:

r_u^t = sigmoid(D_u^t w_u + b_u),    ω_h^t = softmax(r_u^t),

where w_u ∈ R^{c×1} and b_u ∈ R^{k_u^t×1} are trainable parameters, ω_h^t ∈ R^{k_u^t×1} is the weight vector of the hyperedges, k_u^t is the number of hyperedges containing vertex u at time step t, and c is the hyperedge feature dimension.

Embeddings Update. With the outputs of inner attention and outer attention, we update the centroid node embedding s_i^t by aggregating the node's input features x_i^t, the node-hyperedge features d^t, and the hyperedge features h_i^t with the hyperedge weights ω_h^t.
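The two attention submodules above can be sketched in numpy as follows. This is a minimal sketch under our assumptions (single attention head, no masking or dropout); the function names are ours.

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def inner_attention(C_e, w_e, b_e):
    """C_e: (k_e, d) features of the nodes in one hyperedge.
    Returns the (1, d) node-hyperedge feature d^t = sum_j omega_j u_j."""
    omega = softmax(C_e @ w_e + b_e)               # (k_e, 1) node weights
    return (omega * C_e).sum(axis=0, keepdims=True)

def outer_attention(D_u, w_u, b_u):
    """D_u: (k_u, c) features of the hyperedges containing the centroid node.
    Returns the (k_u, 1) hyperedge weights omega_h^t = softmax(sigmoid(D_u w_u + b_u))."""
    r = 1.0 / (1.0 + np.exp(-(D_u @ w_u + b_u)))   # sigmoid scores r_u^t
    return softmax(r)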
We explore three aggregation methods. 1) Concatenated features: we concatenate the node-hyperedge features and hyperedge features directly with a tanh activation, q^t = tanh([d^t : h_i^t]) ∈ R^{1×(d+c)}. 2) Dot-product features: we multiply the node-hyperedge features with the hyperedge features element-wise to model the interaction of the two kinds of features, q^t = tanh(d^t ⊙ h_i^t) ∈ R^{1×d} (by setting d = c), where ⊙ denotes the element-wise product. 3) MLP features: we concatenate the node-hyperedge features with the hyperedge features and apply an MLP, q^t = tanh([d^t : h_i^t] W_c + b_c) ∈ R^{1×d}, where W_c ∈ R^{(d+c)×d} and b_c ∈ R^{1×d} are trainable parameters. Note that q^t only stands for the aggregated features of a single hyperedge, so for k_u^t hyperedges we obtain a feature matrix Q_i^t = [q_1^t, q_2^t, ..., q_{k_u^t}^t]^T, which captures the influence from the nodes and each hyperedge. Considering the weight ω_h^t of each hyperedge, we first compute the weighted sum of Q_i^t to measure the influence from all hyperedges and related nodes, and then update the node embedding s_i^t from the input feature x_i^t and the influence embedding:

z_i^t = sum(ω_h^t ⊙ Q_i^t),    s_i^t = tanh([x_i^t : z_i^t] W_h + b_h),

where z_i^t ∈ R^{1×d} is the weighted aggregated feature, and W_h ∈ R^{2d×d} and b_h ∈ R^{1×d} are trainable parameters.
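The three aggregation variants and the final update step can be sketched as below; names and argument conventions are ours, and biases are passed explicitly for clarity.

```python
import numpy as np

def aggregate(d_t, h_t, method, W_c=None, b_c=None):
    """Combine the node-hyperedge feature d_t (1, d) with the hyperedge feature h_t (1, c)."""
    if method == "concat":   # q = tanh([d : h])           in R^{1 x (d+c)}
        return np.tanh(np.concatenate([d_t, h_t], axis=1))
    if method == "dot":      # q = tanh(d * h), needs d == c
        return np.tanh(d_t * h_t)
    if method == "mlp":      # q = tanh([d : h] W_c + b_c)  in R^{1 x d}
        return np.tanh(np.concatenate([d_t, h_t], axis=1) @ W_c + b_c)
    raise ValueError(method)

def update_node(x_t, Q_t, omega_h, W_h, b_h):
    """x_t: (1, d) input feature, Q_t: (k_u, d) per-hyperedge aggregates q^t,
    omega_h: (k_u, 1) hyperedge weights. Returns the updated embedding s_t (1, d)."""
    z_t = (omega_h * Q_t).sum(axis=0, keepdims=True)  # weighted influence z_i^t
    return np.tanh(np.concatenate([x_t, z_t], axis=1) @ W_h + b_h)
```

The "dot" variant only applies when d = c, matching the constraint stated above; the "mlp" variant projects the concatenation back to dimension d so that Q_i^t has a uniform width.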

3.3. TEMPORAL EVOLUTION

The centroid node embeddings extracted by HC are independent across time steps; for each centroid node i we obtain a sequence of embeddings over time, i.e., S_i = [s_i^0, s_i^1, ..., s_i^t]^T. We adopt a temporal evolution module to extract this temporal information. The TE module utilizes an LSTM to extract temporal features, which can then be used for classification or regression tasks:

O_i = LSTM(S_i),    (7)
ŷ_i = (tanh(O_i W_o + b_o)) W_y + b_y,

where O_i ∈ R^{1×dim} is the temporal feature extracted by the LSTM and dim is the hidden dimension of the LSTM; W_o ∈ R^{dim×k}, b_o ∈ R^{1×k}, W_y ∈ R^{k×l}, and b_y ∈ R^{1×l} are trainable parameters, k is the hidden dimension of the MLP, and l is the final output size, determined by the task.

4. RELATED WORK

GCN on hypergraphs. Graph-based learning limits relationships to pairwise ones; however, in many applications the relations between objects are of higher order and cannot be formulated by a graph structure. To model such higher-order relations between nodes, Zhou et al. (2007) introduced the first hypergraph learning method, where multiple nodes share the same hyperedge, with hypergraph Laplacians that constitute the basis of new spectral hypergraph clustering methods. Chan et al. (2018) considered a stochastic diffusion process and introduced a new hypergraph Laplacian operator generalizing the Laplacian matrix of graphs. Yadati et al. (2019) simplified hyperedges into simple edges with mediators and demonstrated the effectiveness through detailed experiments. Among spatial methods, Gilmer et al. (2017) proposed the Message Passing Neural Networks (MPNN) framework, which learns a message-passing algorithm and aggregates features for node representations. Feng et al. (2019b) introduced the first hypergraph deep learning method, the hypergraph neural network (HGNN). However, most existing works focus on static hypergraph structures, with little effort on optimizing the hypergraph structure during the learning process. DHSL Zhang et al. (2018) is the first dynamic hypergraph structure learning method that optimizes the label projection matrix and the hypergraph structure itself simultaneously, but it fails to exploit high-order relations among features Jiang et al. (2019). DHGCN Jiang et al. (2019) proposed a stacked-layer framework that evaluates the dynamic hypergraph with KNN to build dynamic hyperedges. However, the input of DHGCN is fixed, which means the relations among nodes are fixed and the hypergraph structure is only updated at each layer. In the real world, however, nodes may be connected only temporarily, and existing models cannot handle temporary or changing connections among nodes.
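The TE prediction head of Section 3.3 can be sketched with a hand-rolled single-layer LSTM in numpy. This is a toy sketch under our assumptions: the paper does not specify gate ordering, initialization, or layer count, so the [i, f, g, o] packing and the single layer here are ours.

```python
import numpy as np

def lstm_last_hidden(S, Wx, Wh, b):
    """Run a single-layer LSTM over one node's embedding sequence S (T, d) and
    return the last hidden state O (1, dim). Gates are packed as [i, f, g, o]."""
    dim = Wh.shape[0]
    h = np.zeros((1, dim)); c = np.zeros((1, dim))
    for t in range(S.shape[0]):
        z = S[t:t + 1] @ Wx + h @ Wh + b           # (1, 4*dim) pre-activations
        i, f, g, o = np.split(z, 4, axis=1)
        i = 1.0 / (1.0 + np.exp(-i))               # input gate
        f = 1.0 / (1.0 + np.exp(-f))               # forget gate
        o = 1.0 / (1.0 + np.exp(-o))               # output gate
        c = f * c + i * np.tanh(g)                 # cell update
        h = o * np.tanh(c)
    return h

def te_head(S, Wx, Wh, b, W_o, b_o, W_y, b_y):
    """Eq. (7)-style prediction: y_hat = (tanh(O W_o + b_o)) W_y + b_y."""
    O = lstm_last_hidden(S, Wx, Wh, b)
    return np.tanh(O @ W_o + b_o) @ W_y + b_y
```

In practice a framework LSTM (e.g., a deep-learning library's recurrent layer) would replace `lstm_last_hidden`; the sketch only makes the shape bookkeeping of the head explicit.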

5. EXPERIMENTS

The DyHCN model can be applied to various tasks that can be formulated as dynamic hypergraphs. In this work, we apply DyHCN to news and social comment datasets for stock price prediction.

5.1. EXPERIMENTAL SETTING

Tiingo dataset (foot_0). The dataset covers the news content, the stocks mentioned in each news item, and the release time of the news. On a specific trading day there are various news items, and each item may mention different numbers of stocks, so we construct a hypergraph with news items as hyperedges and stocks as nodes. We construct the dynamic hypergraph from news crawled from June 22, 2016 to June 23, 2019, a total of 756 trading days with one hypergraph per trading day. Inspired by Chen et al. (2019), we adapt the Finance Event Dictionary (TFED) to extract fine-grained events, and pick the 91 most active stocks in the market for price prediction.

Stocktwits dataset

The Stocktwits dataset is a stock comment dataset crawled from the Stocktwits website (foot_1). The dataset covers the comment content, the stocks mentioned in each comment, and the comment time. On a specific trading day, we construct a hypergraph with comments as hyperedges and stocks as nodes. We pick 91 stocks with the highest market value in different industries on the S&P 500 for price prediction, and collect data from Aug. 7, 2014 to Aug. 20, 2018, a total of 1015 trading days with one hypergraph per trading day. The details of the datasets are shown in Table 1. Having constructed the dynamic hypergraph, we assign the node features with the hidden embeddings of price and volume extracted by an LSTM, and the hyperedge features with embeddings produced by GloVe Pennington et al. (2014). The feature dimension of hyperedges and nodes is set to 50. The training, validation, and testing sets are separated as in Table 1. We measure the results with MAE, MAPE, and MSE.

Baselines. To evaluate our proposed DyHCN model, we compare its results against traditional time-series, NLP-based, graph-based, and hypergraph-based models: 1) DA-RNN Hsu et al. (2009), one of the state-of-the-art models for time series prediction; 2) HAN Hu et al. (2018), a representative NLP model for stock price prediction; 3) RSR Feng et al. (2019a), the state-of-the-art graph-based model for price prediction; 4) DHGCN Jiang et al. (2019), a hypergraph-based model for prediction. Because RSR and DHGCN are designed for static graphs/hypergraphs, we apply them for daily price prediction. For comparison with the baselines, we use DyHCN with stacked HC layers and the TE module for price prediction, add a dropout layer with a dropout rate of 0.5 before the TE module, and set the learning rate to 0.005 and the number of training epochs to 1000.
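The three reported metrics can be computed as follows. This is a standard sketch: the paper does not give its exact metric formulas, so per-element averaging (and MAPE as a fraction of the true value) are our assumptions.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error, expressed as a fraction of the true value."""
    return float(np.mean(np.abs((y - y_hat) / y)))

def mse(y, y_hat):
    """Mean squared error."""
    return float(np.mean((y - y_hat) ** 2))
```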

5.2. RESULTS AND ANALYSIS

We report the performance of all methods in Table 2, Table 3, and Fig. 4. From Table 2, we have the following observations. 1) Compared with DA-RNN, the MAE and MAPE scores of HAN decrease by 21.56% and 36.96% (from 0.1869 and 0.4786 to 0.1466 and 0.3017), while the MSE increases by 17.34% (from 0.0467 to 0.0548), indicating that extra features such as Tiingo news or Stocktwits comment data are useful for stock price prediction. 2) The MAE, MAPE, and MSE results of RSR outperform HAN, decreasing by 35.47%, 13.95%, and 66.61% respectively, indicating that considering the relations between different stocks improves prediction performance. 3) Comparing the graph-based model RSR with the hypergraph-based model DHGCN, there is no significant difference in performance. However, on the Tiingo dataset the results of RSR decrease by 3.86%, 5.74%, and 15.28% compared with DHGCN, while on the Stocktwits comment dataset the result is the opposite: the three metrics of RSR increase by 5.68%, 4.94%, and 1.28% compared with DHGCN. This shows that the performance of RSR and DHGCN is not stable across datasets.

4) The results of DyHCN outperform RSR and DHGCN. On the Tiingo dataset, the MAE decreases by 7.72% and 11.28%, the MAPE by 2.43% and 8.02%, and the MSE by 12.57% and 25.93% compared with RSR and DHGCN respectively. On the Stocktwits comment dataset, the MAE decreases by 18.03% and 13.37%, the MAPE by 11.23% and 6.85%, and the MSE by 25.32% and 24.36% compared with RSR and DHGCN respectively. This shows that on both datasets the performance of DyHCN remains stable, and that with dynamic information taken into account, it outperforms static graph/hypergraph based models.

5) Comparing DyHCN with DA-RNN, HAN, RSR, and DHGCN, the average losses in MAE, MAPE, and MSE decrease by 55.37%, 42.43%, 7.57%, and 15.08% on the Tiingo dataset respectively, and by 71.55%, 61.29%, 18.19%, and 14.86% on the Stocktwits comment dataset respectively. To test the stability and scalability of the model, we evaluate different feature aggregation methods. Fig. 4 shows the performance of the different feature aggregation methods with TE hidden sizes of 16, 32, 64, and 128. Comparing the results, the MLP aggregation method remains stable and outperforms the concatenation and dot-product aggregation methods. Besides, comparing results across different hidden sizes shows no significant difference in performance, indicating that the hidden size of the LSTM is not a major factor in model prediction. In addition to the comparison with the baselines above, we also evaluate the effectiveness of the submodules, including inner attention, outer attention, HC, and TE. As shown in Table 3, we use DyHCN without inner attention, DyHCN without outer attention, and DyHCN without TE to evaluate the effectiveness of the inner attention, outer attention, and TE modules. We also replace the HC module with the HGCN model Feng et al. (2019b), which aggregates node features on a static hypergraph, to evaluate the effectiveness of the HC module. The results show that on both datasets, DyHCN performs better than DyHCN without the inner attention, outer attention, or TE module. The inner and outer attention determine the importance of each message and pass the corresponding information to the centroid, which makes the prediction more accurate. The TE module considers the impact of previous information and extracts temporal features from the embedding sequence, which works better than using individual time steps. In addition, the performance of HGCN Feng et al. (2019b) with the TE module is also worse than DyHCN, although HGCN works well on static hypergraphs, indicating that DyHCN is more suitable for dynamic hypergraph tasks.

6. CONCLUSION

In this paper, we proposed a framework of dynamic hypergraph convolutional networks (DyHCN), which consists of a hypergraph convolution (HC) module and a temporal evolution (TE) module. The HC module is carefully designed with inner attention and outer attention, which adaptively aggregate node features into hyperedge features and estimate the importance of each hyperedge connected to the centroid node, respectively, and then update the centroid node embeddings by aggregating the various related features. The TE module captures the long-range temporal information of the dynamic hypergraph features. Based on the two modules, DyHCN captures the dynamic relations between different nodes with dynamic hyperedge weights across time steps. DyHCN can be used for various tasks that can be formulated as dynamic hypergraphs, and extensive experiments on the newly constructed Tiingo and Stocktwits comment datasets show that our proposed DyHCN outperforms state-of-the-art models at modeling dynamic high-order relations.



foot_0: https://www.tiingo.com
foot_1: https://stocktwits.com



Figure 1: The evolution of dynamic hypergraph.

Figure 2: The framework of DyHCN: HC module consists of inner attention, outer attention and embeddings update submodules, which aggregates various features to centroid vertex, and TE module extracts temporal features for prediction.

Figure 3: Inner attention on the left and outer attention on the right.

Figure 4: Performance of different feature aggregation methods on the Tiingo and Stocktwits social comment datasets with TE hidden sizes of 16, 32, 64, and 128.

Table 3: Performance on the Tiingo and Stocktwits datasets with TE hidden size 128

Models                             | Tiingo                  | Stocktwits
                                   | MAE    MAPE   MSE       | MAE    MAPE   MSE
DyHCN (no inner)                   | 0.1303 0.3988 0.0347    | 0.0745 0.1667 0.0122
DyHCN (no outer)                   | 0.0887 0.2613 0.0165    | 0.0732 0.1695 0.0120
DyHCN (no TE)                      | 0.0982 0.2754 0.0218    | 0.0842 0.1784 0.0160
HGCN (with TE) Feng et al. (2019b) | 0.2943 0.8025 0.1771    | 0.2943 0.8025 0.1771
DyHCN                              | 0.0873 0.2533 0.0160    | 0.0732 0.1660 0.0118


Table 1: Details of the Tiingo and Stocktwits datasets.


