LIGHTGCL: SIMPLE YET EFFECTIVE GRAPH CONTRASTIVE LEARNING FOR RECOMMENDATION

Abstract

Graph neural network (GNN) is a powerful learning approach for graph-based recommender systems. Recently, GNNs integrated with contrastive learning have shown superior performance in recommendation with their data augmentation schemes, aiming at dealing with highly sparse data. Despite their success, most existing graph contrastive learning methods either perform stochastic augmentation (e.g., node/edge perturbation) on the user-item interaction graph, or rely on heuristic-based augmentation techniques (e.g., user clustering) for generating contrastive views. We argue that these methods cannot well preserve the intrinsic semantic structures and are easily biased by noise perturbation. In this paper, we propose a simple yet effective graph contrastive learning paradigm, LightGCL, that mitigates these issues impairing the generality and robustness of CL-based recommenders. Our model exclusively utilizes singular value decomposition for contrastive augmentation, which enables unconstrained structural refinement with global collaborative relation modeling. Experiments conducted on several benchmark datasets demonstrate significant performance improvements of our model over state-of-the-art methods. Further analyses demonstrate LightGCL's robustness against data sparsity and popularity bias. The source code of our model is available at https://github.com/HKUDS/LightGCL.

1. INTRODUCTION

Graph neural networks (GNNs) have shown effectiveness in graph-based recommender systems by extracting local collaborative signals via neighborhood representation aggregation (Wang et al., 2019; Chen et al., 2020b). In general, to learn user and item representations, GNN-based recommenders perform embedding propagation on the user-item interaction graph by stacking multiple message passing layers to explore high-order connectivity (He et al., 2020; Zhang et al., 2019; Liu et al., 2021a). Most GNN-based collaborative filtering models adhere to the supervised learning paradigm, requiring sufficient high-quality labelled data for model training. However, many practical recommendation scenarios struggle with the data sparsity issue when learning high-quality user and item representations from limited interaction data (Liu et al., 2021b; Lin et al., 2021).

To address the label scarcity issue, the benefits of contrastive learning have been brought into recommendation for data augmentation (Wu et al., 2021). The main idea of contrastive learning for enhancing user and item representations is to reach agreement between the generated embedding views by contrasting the defined positive pairs with negative instance counterparts (Xie et al., 2022). While contrastive learning has been shown to be effective in improving the performance of graph-based recommendation methods, the view generators serve as the core of data augmentation by identifying accurate contrastive samples. Most current graph contrastive learning (GCL) approaches employ heuristic-based contrastive view generators to maximize the mutual information between the input positive pairs and push apart negative instances (Wu et al., 2021; Yu et al., 2022a; Xia et al., 2022b). To construct perturbed views, SGL (Wu et al., 2021) generates node pairs of positive views by corrupting the structural information of the user-item interaction graph with stochastic augmentation strategies, e.g., node dropping and edge perturbation. To improve graph contrastive learning in recommendation, SimGCL (Yu et al., 2022a) offers embedding augmentation with random noise perturbation. To identify semantic neighbors of nodes (users and items), HCCF (Xia et al., 2022b) and NCL (Lin et al., 2022) pursue consistent representations between structurally adjacent nodes and semantic neighbors.

Despite their effectiveness, state-of-the-art contrastive recommender systems suffer from several inherent limitations: i) graph augmentation with random perturbation may lose useful structural information, which misleads the representation learning; ii) the success of heuristic-guided representation contrasting schemes is largely built upon the view generator, which limits the model generality and is vulnerable to noisy user behaviors; iii) most current GNN-based contrastive recommenders are limited by the over-smoothing issue, which leads to indistinguishable representations.

In light of the above limitations and challenges, we revisit the graph contrastive learning paradigm for recommendation with a proposed simple yet effective augmentation method, LightGCL. In our model, the graph augmentation is guided by singular value decomposition (SVD) to not only distill the useful information of user-item interactions but also inject the global collaborative context into the representation alignment of contrastive learning.
Instead of generating two handcrafted augmented views, the important semantics of user-item interactions are well preserved by our robust graph contrastive learning paradigm. This enables our self-augmented representations to reflect both user-specific preferences and cross-user global dependencies. Our contributions are highlighted as follows:
• In this paper, we enhance recommender systems by designing a lightweight and robust graph contrastive learning framework to address the identified key challenges of this task.
• We propose an effective and efficient contrastive learning paradigm, LightGCL, for graph augmentation. With the injection of global collaborative relations, our model can mitigate the issues brought by inaccurate contrastive signals.
• Our method exhibits improved training efficiency compared to existing GCL-based approaches.
• Extensive experiments on several real-world datasets justify the performance superiority of LightGCL. In-depth analyses demonstrate the rationality and robustness of LightGCL.

2. RELATED WORK

Graph Contrastive Learning for Recommendation. A promising line of recent studies has incorporated contrastive learning (CL) into graph-based recommenders to address the label sparsity issue with self-supervision signals. In particular, SGL (Wu et al., 2021) and SimGCL (Yu et al., 2022a) perform data augmentation over the graph structure and embeddings with random dropout operations. However, such stochastic augmentation may drop important information, which can make the sparsity issue of inactive users even worse. Furthermore, some recent CL-based recommenders, such as HCCF (Xia et al., 2022b) and NCL (Lin et al., 2022), design heuristic-based strategies to construct views for embedding contrast. Despite their effectiveness, their success heavily relies on the incorporated heuristics (e.g., the number of hyperedges or user clusters) for contrastive view generation, which can hardly adapt to different recommendation tasks.

Self-Supervised Learning on Graphs. Recently, self-supervised learning (SSL) has advanced the graph learning paradigm by enhancing node representations with unlabeled graph data (Zhu et al., 2021a;b; Velickovic et al., 2019; Hassani & Khasahmadi, 2020; Peng et al., 2020; Zhu et al., 2020; Wu et al., 2022). For example, to improve the predictive SSL paradigm, AutoSSL (Jin et al., 2022) automatically combines multiple pretext tasks for augmentation. Along the line of contrastive SSL over graph structures, recent efforts focus on designing various graph contrastive learning methods (Yu et al., 2022b; Yin et al., 2022; Zhang et al., 2022; Xia et al., 2022a; Suresh et al., 2021). For instance, SimGRACE (Xia et al., 2022a) proposes to generate contrastive views with GNN encoder perturbations. In AutoGCL (Yin et al., 2022), graph view generators are jointly trained with the graph encoder in an end-to-end way. Additionally, GCA (Zhu et al., 2021b) performs both topology-level and attribute-level data augmentation for contrastive view generation, in which important edges and features are identified for adaptive augmentation. GraphCL (You et al., 2020) generates correlated graph representation views using various augmentation strategies, such as node/edge perturbation and attribute masking.

3. METHODOLOGY

The overall structure of LightGCL is illustrated in Figure 1.

3.1. LOCAL GRAPH DEPENDENCY MODELING

As a common practice of collaborative filtering, we assign each user $u_i$ and item $v_j$ an embedding vector $e_i^{(u)}, e_j^{(v)} \in \mathbb{R}^d$, where $d$ is the embedding size. The collections of all user and item embeddings are denoted as $E^{(u)} \in \mathbb{R}^{I \times d}$ and $E^{(v)} \in \mathbb{R}^{J \times d}$, where $I$ and $J$ are the numbers of users and items, respectively. Following Xia et al. (2022b), we adopt a two-layer GCN to aggregate the neighboring information for each node. In layer $l$, the aggregation process is expressed as follows:

$$z_{i,l}^{(u)} = \sigma\big(p(\tilde{\mathcal{A}})_{i,:} \cdot E_{l-1}^{(v)}\big), \quad z_{j,l}^{(v)} = \sigma\big(p(\tilde{\mathcal{A}})_{:,j} \cdot E_{l-1}^{(u)}\big) \tag{1}$$

where $z_{i,l}^{(u)}$ and $z_{j,l}^{(v)}$ denote the $l$-th layer aggregated embeddings for user $u_i$ and item $v_j$. $\sigma(\cdot)$ represents the LeakyReLU with a negative slope of 0.5. $\tilde{\mathcal{A}}$ is the normalized adjacency matrix, on which we perform edge dropout, denoted as $p(\cdot)$, to mitigate the overfitting issue. We implement residual connections in each layer to retain the original information of the nodes as follows:

$$e_{i,l}^{(u)} = z_{i,l}^{(u)} + e_{i,l-1}^{(u)}, \quad e_{j,l}^{(v)} = z_{j,l}^{(v)} + e_{j,l-1}^{(v)} \tag{2}$$

The final embedding of a node is the sum of its embeddings across all layers, and the inner product between the final embeddings of user $u_i$ and item $v_j$ predicts $u_i$'s preference towards $v_j$:

$$e_i^{(u)} = \sum_{l=0}^{L} e_{i,l}^{(u)}, \quad e_j^{(v)} = \sum_{l=0}^{L} e_{j,l}^{(v)}, \quad \hat{y}_{i,j} = e_i^{(u)\top} e_j^{(v)} \tag{3}$$
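To make the propagation above concrete, below is a minimal PyTorch sketch of the local view (Eqs. 1-3). It is an illustrative sketch rather than the official implementation; the names `edge_dropout`, `local_propagate` and `adj_norm` are our own.

```python
import torch
import torch.nn.functional as F

def edge_dropout(adj, keep_prob):
    """p(.) in Eq. 1: randomly drop edges of a sparse COO adjacency,
    rescaling kept values to stay unbiased in expectation."""
    adj = adj.coalesce()
    vals = adj.values()
    mask = torch.rand(vals.numel(), device=vals.device) < keep_prob
    return torch.sparse_coo_tensor(adj.indices()[:, mask],
                                   vals[mask] / keep_prob, adj.shape)

def local_propagate(adj_norm, E_u, E_v, n_layers=2, keep_prob=0.75, slope=0.5):
    """Local view: layer-wise aggregation (Eq. 1), residual connections (Eq. 2),
    and summation over all layers including layer 0 (Eq. 3)."""
    e_u, e_v = E_u, E_v
    final_u, final_v = E_u.clone(), E_v.clone()
    for _ in range(n_layers):
        A = edge_dropout(adj_norm, keep_prob)
        z_u = F.leaky_relu(torch.sparse.mm(A, e_v), negative_slope=slope)
        z_v = F.leaky_relu(torch.sparse.mm(A.t(), e_u), negative_slope=slope)
        e_u, e_v = z_u + e_u, z_v + e_v          # residual (Eq. 2)
        final_u, final_v = final_u + e_u, final_v + e_v
    # preference score (Eq. 3): (final_u[i] * final_v[j]).sum()
    return final_u, final_v
```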

3.2. EFFICIENT GLOBAL COLLABORATIVE RELATION LEARNING

To empower graph contrastive learning for recommendation with global structure learning, we equip our LightGCL with an SVD scheme (Rajwade et al., 2012; Rangarajan, 2001) to efficiently distill important collaborative signals from a global perspective. Specifically, we first perform SVD on the adjacency matrix $A$ as $A = USV^\top$. Here, $U$ / $V$ is an $I \times I$ / $J \times J$ orthonormal matrix whose columns are the eigenvectors of $A$'s row-row / column-column correlation matrix, and $S$ is an $I \times J$ diagonal matrix storing the singular values of $A$. The largest singular values are usually associated with the principal components of the matrix. Thus, we truncate the list of singular values to keep the largest $q$ values, and reconstruct the adjacency matrix from the truncated matrices as $\hat{A} = U_q S_q V_q^\top$, where $U_q \in \mathbb{R}^{I \times q}$ and $V_q \in \mathbb{R}^{J \times q}$ contain the first $q$ columns of $U$ and $V$, respectively, and $S_q \in \mathbb{R}^{q \times q}$ is the diagonal matrix of the $q$ largest singular values. The reconstructed matrix $\hat{A}$ is a low-rank approximation of the adjacency matrix $A$, since $\mathrm{rank}(\hat{A}) = q$.

The advantages of SVD-based graph structure learning are twofold. Firstly, it emphasizes the principal components of the graph by identifying the user-item interactions that are important and reliable for user preference representations. Secondly, the generated new graph structure preserves the global collaborative signals by considering each user-item pair. Given $\hat{A}$, we perform message propagation on the reconstructed user-item relation graph in each layer:

$$g_{i,l}^{(u)} = \sigma(\hat{A}_{i,:} \cdot E_{l-1}^{(v)}), \quad g_{j,l}^{(v)} = \sigma(\hat{A}_{:,j} \cdot E_{l-1}^{(u)}) \tag{4}$$

However, performing the exact SVD on large matrices is highly expensive, making it impractical for large-scale user-item matrices. Therefore, we adopt the randomized SVD algorithm proposed by Halko et al. (2011), whose key idea is to first approximate the range of the input matrix with a low-rank orthonormal matrix, and then perform SVD on this smaller matrix:

$$\hat{U}_q, \hat{S}_q, \hat{V}_q^\top = \mathrm{ApproxSVD}(A, q), \quad \hat{A}_{SVD} = \hat{U}_q \hat{S}_q \hat{V}_q^\top \tag{5}$$

where $q$ is the required rank of the decomposed matrices, and $\hat{U}_q \in \mathbb{R}^{I \times q}$, $\hat{S}_q \in \mathbb{R}^{q \times q}$, $\hat{V}_q \in \mathbb{R}^{J \times q}$ are the approximated versions of $U_q$, $S_q$, $V_q$. Thus, we rewrite the message propagation rules in Eq. 4 with the approximated matrices and the collective representations of the embeddings:

$$G_l^{(u)} = \sigma(\hat{A}_{SVD} E_{l-1}^{(v)}) = \sigma(\hat{U}_q \hat{S}_q \hat{V}_q^\top E_{l-1}^{(v)}), \quad G_l^{(v)} = \sigma(\hat{A}_{SVD}^\top E_{l-1}^{(u)}) = \sigma(\hat{V}_q \hat{S}_q \hat{U}_q^\top E_{l-1}^{(u)}) \tag{6}$$

where $G_l^{(u)}$ and $G_l^{(v)}$ are the collections of user and item embeddings encoded from the newly generated graph structure view. Note that we do not need to compute and store the large dense matrix $\hat{A}_{SVD}$. Instead, we store the low-dimensional $\hat{U}_q$, $\hat{S}_q$ and $\hat{V}_q$. By pre-calculating $(\hat{U}_q \hat{S}_q)$ and $(\hat{V}_q \hat{S}_q)$ during the SVD pre-processing stage, the model efficiency is further improved.
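The following sketch shows how the SVD view can be precomputed and propagated without ever materializing $\hat{A}_{SVD}$. We use `torch.svd_lowrank`, which implements the randomized algorithm of Halko et al. (2011); the function and variable names are illustrative assumptions, not from the released code.

```python
import torch
import torch.nn.functional as F

def build_svd_view(adj, q=5):
    """Eq. 5: rank-q randomized SVD of the (sparse) adjacency.  Returns the
    pre-calculated (U_q S_q), (V_q S_q) and the raw factors used in Eq. 6."""
    U, S, V = torch.svd_lowrank(adj, q=q)   # U: I x q, S: q, V: J x q
    return U * S, V * S, U, V               # A_SVD itself is never stored

def svd_propagate(US, VS, U, V, E_u, E_v, slope=0.5):
    """Eq. 6: message passing on the low-rank view.  Evaluating right-to-left
    costs O(q(I+J)d) per layer instead of the O(IJd) of a dense A_SVD."""
    G_u = F.leaky_relu(US @ (V.t() @ E_v), negative_slope=slope)
    G_v = F.leaky_relu(VS @ (U.t() @ E_u), negative_slope=slope)
    return G_u, G_v
```

Storing only the three factors keeps the memory footprint of the global view at O(q(I+J)) rather than the O(IJ) of the dense reconstruction.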

3.3. SIMPLIFIED LOCAL-GLOBAL CONTRASTIVE LEARNING

Conventional GCL methods such as SGL and SimGCL contrast node embeddings by constructing two extra views, while the embeddings generated from the original graph (the main view) are not directly involved in the InfoNCE loss. The reason for adopting such a cumbersome three-view paradigm may be that the random perturbation used to augment the graph could provide misleading signals to the main-view embeddings. In our proposed method, however, the augmented graph view is created with global collaborative relations, which can enhance the main-view representations. Therefore, we simplify the CL framework by directly contrasting the SVD-augmented view embeddings $g_{i,l}^{(u)}$ with the main-view embeddings $z_{i,l}^{(u)}$ in the InfoNCE loss (Oord et al., 2018):

$$\mathcal{L}_s^{(u)} = \sum_{i=0}^{I} \sum_{l=0}^{L} -\log \frac{\exp\big(s(z_{i,l}^{(u)}, g_{i,l}^{(u)})/\tau\big)}{\sum_{i'=0}^{I} \exp\big(s(z_{i,l}^{(u)}, g_{i',l}^{(u)})/\tau\big)} \tag{7}$$

where $s(\cdot)$ and $\tau$ stand for the cosine similarity and the temperature, respectively. The InfoNCE loss $\mathcal{L}_s^{(v)}$ for items is defined in the same way. To prevent overfitting, we implement random node dropout in each batch to exclude some nodes from participating in the contrastive learning. As shown in Eq. 8, the contrastive loss is jointly optimized with our main objective function for the recommendation task, where $\hat{y}_{i,p_s}$ and $\hat{y}_{i,n_s}$ denote the predicted scores for a pair of positive and negative items of user $i$:

$$\mathcal{L} = \mathcal{L}_r + \lambda_1 \cdot (\mathcal{L}_s^{(u)} + \mathcal{L}_s^{(v)}) + \lambda_2 \cdot \|\Theta\|_2^2, \quad \mathcal{L}_r = \sum_{i=0}^{I} \sum_{s=1}^{S} \max(0, 1 - \hat{y}_{i,p_s} + \hat{y}_{i,n_s}) \tag{8}$$
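A compact PyTorch sketch of the two loss terms is given below. It treats all other in-batch nodes as negatives, as in Eq. 7; the per-layer summation and the random node dropout described above are omitted for brevity, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def infonce(z, g, temp=0.5):
    """Eq. 7 for one layer: each node's local embedding z_i is contrasted
    against its SVD-view counterpart g_i; all other nodes act as negatives."""
    z, g = F.normalize(z, dim=1), F.normalize(g, dim=1)
    logits = z @ g.t() / temp                     # s(z_i, g_i') / tau
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, labels)        # -log softmax of positives

def joint_loss(y_pos, y_neg, cl_user, cl_item, params, lam1=1e-7, lam2=1e-5):
    """Eq. 8: margin-based ranking loss + weighted CL terms + L2 regularization."""
    l_rec = torch.clamp(1.0 - y_pos + y_neg, min=0).sum()
    l_reg = sum((p ** 2).sum() for p in params)
    return l_rec + lam1 * (cl_user + cl_item) + lam2 * l_reg
```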

4. EVALUATION

To verify the superiority and effectiveness of the proposed LightGCL method, we perform extensive experiments to answer the following research questions:
• RQ1: How does LightGCL perform on different datasets compared to various SOTA baselines?
• RQ2: How does the lightweight graph contrastive learning improve the model efficiency?
• RQ3: How does our model perform against data sparsity, popularity bias and over-smoothing?
• RQ4: How do the key components of LightGCL contribute to its effectiveness?
• RQ5: How do different hyperparameter settings affect the performance of LightGCL?

4.1. EXPERIMENTAL SETTINGS

4.1.1. DATASETS AND EVALUATION PROTOCOLS

We evaluate our model and the baselines on five real-world datasets. Yelp (29,601 users, 24,734 items, 1,517,326 interactions): a dataset collected from the rating interactions on the Yelp platform. Gowalla (50,821 users, 57,440 items, 1,172,425 interactions): users' check-in records collected from the Gowalla platform. ML-10M (69,878 users, 10,195 items, 9,988,816 interactions): a well-known movie-rating dataset for collaborative filtering. Amazon-book (78,578 users, 77,801 items, 2,240,156 interactions): users' ratings on books collected from Amazon. Tmall (47,939 users, 41,390 items, 2,357,450 interactions): an e-commerce dataset containing users' purchase records on different products on the Tmall platform.

In accordance with He et al. (2020) and Wu et al. (2021), we split the datasets into training, validation and testing sets with a ratio of 7:2:1. We adopt Recall@N and Normalized Discounted Cumulative Gain (NDCG)@N, with N = {20, 40}, as the evaluation metrics.

4.1.2. BASELINE METHODS

We compare our model against 16 state-of-the-art baselines with different learning paradigms:
• MLP-enhanced Collaborative Filtering: NCF (He et al., 2017).
• GNN-based Collaborative Filtering: GCCF (Chen et al., 2020c), LightGCN (He et al., 2020).
• Disentangled Graph Collaborative Filtering: DGCF (Wang et al., 2020b).
• Hypergraph-based Collaborative Filtering: HyRec (Wang et al., 2020a).
• Self-Supervised Learning Recommender Systems: GraphCL (You et al., 2020), GRACE (Zhu et al., 2020), GCA (Zhu et al., 2021b), MHCN (Yu et al., 2021), SAIL (Yu et al., 2022b), AutoGCL (Yin et al., 2022), SimGRACE (Xia et al., 2022a), SGL (Wu et al., 2021), HCCF (Xia et al., 2022b), SHT (Xia et al., 2022c), SimGCL (Yu et al., 2022a).
Due to the space limit, detailed descriptions of the baselines are presented in Appendix A.

4.1.3. HYPERPARAMETER SETTINGS

To ensure a fair comparison, we tune the hyperparameters of all baselines within the ranges suggested in the original papers, except for the following settings fixed for all models: the embedding size is set as 32; the batch size is 256; and two convolutional layers are used for the GCN models. For our LightGCL, the regularization weights λ1 and λ2 are tuned from {1e-5, 1e-6, 1e-7} and {1e-4, 1e-5}, respectively. The temperature τ is searched in {0.3, 0.5, 1, 3, 10}. The dropout rate is chosen from {0, 0.25}. The rank q for the SVD is set as 5. We use the Adam optimizer with a learning rate of 0.001, decayed at a rate of 0.98 until it reaches 0.0005.
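For reference, a minimal sketch of the training schedule described above, assuming the decay is applied per epoch (the paper does not state the decay interval); `model`, `train_one_epoch` and `n_epochs` are hypothetical placeholders.

```python
import torch

# Adam at lr 1e-3, multiplied by 0.98 (assumed per epoch) down to the 5e-4 floor
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

for epoch in range(n_epochs):
    train_one_epoch(model, optimizer)        # hypothetical training step
    if scheduler.get_last_lr()[0] > 5e-4:    # stop decaying at 0.0005
        scheduler.step()
```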

4.2. PERFORMANCE VALIDATION (RQ1)

We summarize the experimental results in Table 1, with the following observations and conclusions:
• Contrastive Learning Dominates. As can be seen from the table, recent methods implementing contrastive learning (SGL, HCCF, SimGCL) exhibit consistent superiority compared to traditional graph-based (GCCF, LightGCN) or hypergraph-based (HyRec) models. They also perform better than some other self-supervised learning approaches (e.g., MHCN). This could be attributed to the effectiveness of CL in learning evenly distributed embeddings (Yu et al., 2022a).
• Contrastive Learning Enhancement. Our method consistently outperforms all the contrastive learning baselines. We attribute this performance improvement to the effective augmentation of graph contrastive learning via injecting global collaborative contextual signals. In contrast, other contrastive learning-based recommenders (e.g., SGL, SimGCL, and HCCF) are easily biased by noisy interaction information and generate misleading self-supervised signals.

4.3. EFFICIENCY STUDY (RQ2)

GCL models often suffer from high computational cost due to the construction of extra views and the convolution operations performed on them during training. However, the low-rank nature of the SVD-reconstructed graph and the simplified CL structure make the training of our LightGCL highly efficient. We analyze the pre-processing and per-batch training complexity of our model in comparison to three competitive baselines, as summarized in Table 2. ‡

Table 2: Comparisons of computational complexity against baselines.

Stage            Computation         LightGCN    SGL                  SimGCL         LightGCL
Pre-processing   Normalization       O(E)        O(E)                 O(E)           O(E)
                 SVD                 -           -                    -              O(qE)
Training         Augmentation        -           O(2ρE)               -              -
(per batch)      Graph Convolution   O(2ELd)     O(2ELd + 4ρELd)      O(6ELd)        O(2ELd + 2q(I+J)Ld)
                 BPR Loss            O(2Bd)      O(2Bd)               O(2Bd)         O(2Bd)
                 InfoNCE Loss        -           O(Bd + BMd)          O(Bd + BMd)    O((Bd + BMd)L)

• Traditional GCN methods (e.g., LightGCN) only perform convolution on one graph, inducing a complexity of O(2ELd) per batch. For most GCL-based methods, three contrastive views are computed per batch, leading to a complexity of roughly three times that of LightGCN. In our model, instead, only two contrastive views are involved. Additionally, due to the low-rank property of SVD-based graph structure learning, our graph encoder takes only O(2q(I + J)Ld) time. For most datasets, including the five we use, 2q(I + J) < E; on Yelp, for example, 2q(I + J) = 2 × 5 × (29,601 + 24,734) ≈ 5.4 × 10^5, while E ≈ 1.5 × 10^6. Therefore, the training complexity of our model is less than half of that of the SOTA efficient model SimGCL.
• Although our model requires performing the SVD in the pre-processing stage, which takes O(qE), this computational cost is negligible compared to the training stage since it is incurred only once. In fact, by moving the construction of the contrastive view to the pre-processing stage, we avoid repetitive graph augmentation during training, which improves model efficiency.

4.4. RESISTANCE AGAINST DATA SPARSITY AND POPULARITY BIAS (RQ3)

To evaluate the robustness of our model in alleviating data sparsity, we group the sparse users by their interaction degrees and calculate the Recall@20 of each group on the Yelp and Gowalla datasets, as shown in Fig. 2. As can be seen from the figures, the performance of HCCF and SimGCL varies across datasets, but our LightGCL consistently outperforms them in all cases. In particular, our model performs notably well on the extremely sparse user group (< 15 interactions), as the Recall@20 of these users is not much lower (and on Gowalla is even higher) than that of the whole dataset.

Additionally, we illustrate our model's ability to mitigate popularity bias compared to HCCF and SimGCL. Similar to the user-group analysis above, we group the long-tail items by their degree of interactions. Following Wu et al. (2021), we adopt the decomposed Recall@20, defined as:

$$\text{Recall}^{(g)} = \frac{|(V_{rec}^u)^{(g)} \cap V_{test}^u|}{|V_{test}^u|}$$

where $V_{test}^u$ refers to the set of test items for user $u$, and $(V_{rec}^u)^{(g)}$ is the set of top-K recommended items for $u$ that belong to group $g$. The results are shown in Fig. 3. Similar to the results on sparse users, the performance of HCCF and SimGCL fluctuates considerably under the influence of popularity bias. Our model performs better in most cases, which shows its resistance against popularity bias. Note that since the extremely sparse group (< 15 interactions) is significantly larger than the other groups in Gowalla, it contributes a large fraction of the Recall@20, resulting in a different trend from that of Yelp in the figure.

4.5. BALANCING BETWEEN OVER-SMOOTHING AND OVER-UNIFORMITY (RQ3)

In this section, we illustrate the effectiveness of our model in learning a moderately dispersed embedding distribution that preserves both users' unique preference patterns and inter-user collaborative dependencies. We randomly sample 2,000 nodes from Yelp and Gowalla and map their embeddings to 2-D space with t-SNE (Van der Maaten & Hinton, 2008). The visualizations of these embeddings are presented in Fig. 4. We also calculate the Mean Average Distance (MAD) (Chen et al., 2020a) of the embeddings, summarized in Table 3.

As can be seen from Fig. 4, the embedding distributions of non-CL methods (i.e., LightGCN, MHCN) exhibit indistinguishable clusters in the embedding space, which reflects their limitation in addressing the over-smoothing issue. On the contrary, existing CL-based methods tend to learn either i) over-uniform distributions, e.g., SGL on Yelp learns a huge cloud of evenly-distanced embeddings with no clear community structure to capture the collaborative relations between users; or ii) highly dispersed small clusters with a severe over-smoothing issue inside each cluster, e.g., the embeddings of SimGCL on Gowalla appear as scattered, grained clusters within which embeddings are highly similar. In contrast, our method identifies clear community structures that capture collaborative effects, while the embeddings inside each community are reasonably dispersed to reflect user-specific preferences. As shown in Table 3, the MAD of our model's learned features also lies in between those of the two types of baselines.
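For reference, below is a simplified sketch of the MAD computation: it averages pairwise cosine distances over all node pairs, whereas Chen et al. (2020a) also describe masked variants restricted to particular node pairs.

```python
import torch
import torch.nn.functional as F

def mean_average_distance(E):
    """Simplified MAD: average cosine distance between all pairs of node
    embeddings; higher values indicate more dispersed representations."""
    E = F.normalize(E, dim=1)
    dist = 1.0 - E @ E.t()               # pairwise cosine distances
    n = E.size(0)
    return dist.sum() / (n * (n - 1))    # diagonal is zero, so it drops out
```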

4.6. ABLATION STUDY (RQ4)

To investigate the effectiveness of our SVD-based graph augmentation scheme, we perform an ablation study to answer the question of whether a different matrix decomposition approach could also provide guidance to the contrastive learning. To this end, we implement two variants of our model, replacing the approximated SVD algorithm with other matrix decomposition methods: CL-MF adopts the view generated by a pre-trained MF (Koren et al., 2009); CL-SVD++ utilizes SVD++ (Koren, 2008), which takes implicit user feedback into consideration. As shown in Table 4, with the information distilled from MF or SVD++, the model is able to achieve satisfactory results, indicating the effectiveness of using matrix decomposition to empower CL and the flexibility of our proposed framework. However, adopting a separately pre-trained component for CL is not only tedious and time-consuming but also inferior to utilizing the approximate SVD algorithm in terms of performance.

4.7. HYPERPARAMETER ANALYSIS (RQ5)

In this section, we investigate our model's sensitivity to several key hyperparameters: the regularization weight λ1 of the InfoNCE loss, the temperature τ, and the required rank q of the SVD.
• The impact of λ1. As illustrated in Fig. 6, on the three datasets Yelp, Gowalla and ML-10M, the model's performance peaks at λ1 = 1e-7. It can be noticed that λ1 values in the range [1e-8, 1e-6] often lead to performance improvements.
• The impact of τ. Fig. 7 indicates that the model's performance is relatively stable across different selections of τ from 0.1 to 10, while the best value of τ varies by dataset.
• The selection of q. q determines the rank of the SVD in our model. Our experiments show that satisfactory results can be achieved with a small q. Specifically, as shown in Fig. 5, we observe that q = 5 is sufficient to preserve the important structures of the user-item interaction graph.

4.8. CASE STUDY (RQ4)

In this section, we present a case study to intuitively show our model's ability to identify useful knowledge from noisy user-item interactions and make accurate recommendations accordingly. As shown in Fig. 8, the venues visited by user #26 in Yelp mainly fall into two communities: Cleveland (where the user probably lives) and Arizona (where the user may have travelled to). In the reconstructed graph, these venues are assigned new weights according to their potential importance. Note that item #2583, a car rental agency in Arizona, has been assigned a negative weight, which conforms to the common sense that people generally would not visit multiple car rental agencies in one trip. The SVD-augmented view also provides predictions on invisible links by assigning large weights§ to potential venues of interest, such as #2647 and #658. Note that when exploiting the graph, the augmented view does not overlook the smaller Arizona community, which enables the model to predict items of minor interest that are usually overshadowed by the majority.

5. CONCLUSION

In this paper, we propose a simple and effective augmentation method for the graph contrastive learning framework in recommendation. Specifically, we explore the key idea of making singular value decomposition powerful enough to augment the user-item interaction graph structures. Our key findings indicate that our graph augmentation scheme exhibits a strong ability to resist data sparsity and popularity bias. Extensive experiments show that our model achieves new state-of-the-art results on several public evaluation datasets. In future work, we plan to explore the potential of incorporating causal analysis into our lightweight graph contrastive learning model, so as to enhance the recommender system by mitigating confounding effects in data augmentation.

B PERFORMANCE COMPARISON WITH BASELINES (CONTINUED)

In this appendix, we report the performance of NCF, GCCF, GraphCL, SAIL, GRACE, and AutoGCL, which are not shown in Table 1 due to the space limit. The results are summarized in Table 5. As can be seen from the table, our model consistently outperforms these baselines.

C THEORETICAL ANALYSIS

We conduct theoretical analyses to show that our local-global CL (Eq. 7) is augmented to maximize the similarity between the embeddings of potentially related nodes, based on the SVD-based global relation learning. Specifically, consider a node $v_j \in \mathcal{U}$, where $\mathcal{U} = \{v_j \mid A_{i,j} = 0, \hat{A}_{i,j} \neq 0\}$ is the set of items not adjacent to $u_i$ in the original graph but connected to it in the SVD-reconstructed graph. In the vanilla InfoNCE loss, the embedding of $v_j$ is not updated by $s(z_{i,l}, g_{i,l})$, as $v_j$ is not adjacent to $u_i$. Instead, our local-global contrastive learning assigns the following gradients to the embedding of $v_j$:

$$\frac{\partial s(z_{i,l}, g_{i,l})}{\partial g_{j,l-1}} = \frac{\partial}{\partial g_{j,l-1}} s\Big(z_{i,l},\ \sigma\big(\textstyle\sum_{j \in \mathcal{U}} \alpha_{i,j}\, g_{j,l-1} + \sum_{j': A_{i,j'} \neq 0} \alpha_{i,j'}\, g_{j',l-1}\big)\Big) = \frac{z_{i,l}}{\|z_{i,l}\| \|g_{i,l}\|} \cdot \sigma'(\cdot) \cdot \alpha_{i,j}$$

where $\alpha_{i,j}$ denotes the normalization weight between nodes $u_i$ and $v_j$. In this way, the embeddings of nodes in $\mathcal{U}$ are also pulled close to $z_{i,l}$, which injects the relatedness information learned by the SVD into the local-global CL optimization.
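This gradient flow can also be verified numerically. The toy script below is an illustrative check of our own (not from the paper): it confirms that items with no observed edge to user $u_i$ still receive non-zero gradients through the SVD view.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
I, J, d, q = 8, 10, 4, 2
A = (torch.rand(I, J) < 0.3).float()           # observed interactions
A[0, 0] = 1.0                                  # ensure user 0 has an edge
U, S, V = torch.svd_lowrank(A, q=q)
A_svd = U @ torch.diag(S) @ V.t()              # dense only because it's tiny

E_v = torch.randn(J, d, requires_grad=True)
z = F.leaky_relu(A @ E_v, negative_slope=0.5)      # local view for users
g = F.leaky_relu(A_svd @ E_v, negative_slope=0.5)  # SVD view for users

i = 0
s = F.cosine_similarity(z[i], g[i], dim=0)     # s(z_i, g_i)
s.backward()

# items with no observed edge to user i receive gradient only via the SVD view
non_adj = (A[i] == 0).nonzero().squeeze(1)
print(E_v.grad[non_adj].abs().sum())           # > 0: relatedness is injected
```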

D CALCULATION OF COMPLEXITY

D.1 ADJACENCY MATRIX NORMALIZATION

For a sparse user-item matrix stored in the Coordinate Format (COO), normalization requires visiting every nonzero element in the matrix. Thus, the computational complexity is on the order of the number of edges, O(E). Note that the baseline SGL has to normalize the two augmented graph structures during the training phase, each of which contains ρE edges, inducing a complexity of O(2ρE) per batch.
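A sketch of this O(E) normalization with SciPy is shown below, assuming the common symmetric degree normalization (the paper does not spell out the exact normalizer):

```python
import numpy as np
import scipy.sparse as sp

def normalize_adj(A):
    """Symmetric normalization D_u^{-1/2} A D_v^{-1/2} of a COO user-item
    matrix; each of the E nonzeros is touched a constant number of times."""
    A = A.tocoo()
    d_u = np.asarray(A.sum(axis=1)).flatten()   # user degrees
    d_v = np.asarray(A.sum(axis=0)).flatten()   # item degrees
    vals = A.data / (np.sqrt(d_u[A.row]) * np.sqrt(d_v[A.col]) + 1e-12)
    return sp.coo_matrix((vals, (A.row, A.col)), shape=A.shape)
```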

D.2 APPROXIMATE SVD ALGORITHM

We refer readers to Halko et al. (2011), where the complexity of the approximate SVD algorithm is analyzed in detail.

D.3 GRAPH CONVOLUTION

Given a sparse COO matrix $A$ with $E$ edges and a dense embedding matrix of size $I \times d$ (or $J \times d$), it takes O(Ed) time to compute their product. To perform graph convolution, we multiply the sparse adjacency matrix with $E^{(v)}_{l-1} \in \mathbb{R}^{J \times d}$ and its transpose with $E^{(u)}_{l-1} \in \mathbb{R}^{I \times d}$, which takes O(Ed) each, and O(2Ed) in total. For $L$ layers, O(2ELd) is required. Traditional CL-based methods such as SGL and SimGCL adopt a three-view structure, resulting in a complexity of O(6ELd) (for SGL it again varies a bit depending on ρ). For the SVD view of our model, $\hat{V}_q^\top E^{(v)}_{l-1}$ takes O(qJd), and multiplying the result with the pre-calculated $(\hat{U}_q \hat{S}_q)$ takes O(qId); likewise, $\hat{U}_q^\top E^{(u)}_{l-1}$ takes O(qId), and multiplying the result with the pre-calculated $(\hat{V}_q \hat{S}_q)$ takes O(qJd). In total, the SVD view takes O(2q(I + J)d) per layer.
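The saving comes purely from the order of evaluation, as the illustrative snippet below shows; the dimensions roughly match our datasets and all names are our own.

```python
import torch

I, J, d, q = 50_000, 40_000, 32, 5
Uq, Sq, Vq = torch.randn(I, q), torch.rand(q), torch.randn(J, q)
E_v = torch.randn(J, d)
US = Uq * Sq                      # precomputed (U_q S_q), an I x q matrix

# right-to-left: (V_q^T E_v) is q x d -> O(qJd), then US @ (...) -> O(qId)
G_u = US @ (Vq.t() @ E_v)         # total O(q(I+J)d); A_SVD is never formed

# left-to-right would first materialize the I x J dense matrix: O(IJ) memory
# A_svd = US @ Vq.t(); G_u_bad = A_svd @ E_v   # infeasible at scale
```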

D.4 BPR LOSS

In each batch with B users, calculating the scores for positive and negative items both take O(Bd), so in total it takes O(2Bd).

D.5 CL LOSS

In each batch with B users, calculating the numerator of the InfoNCE loss takes O(Bd), and calculating the denominator takes O(BMd), where M denotes the total number of nodes in the batch. Since our model adopts a per-layer InfoNCE loss, an additional factor of L applies.



† Due to the space limit, results of NCF, GCCF, GraphCL, SAIL, GRACE, and AutoGCL are in Appendix B.
‡ In the table, E, L and d denote the number of edges, the number of layers and the embedding size; ρ ∈ (0, 1] is the edge keep rate; q is the required rank; I and J represent the numbers of users and items; B and M are the batch size and the number of nodes in a batch. Detailed calculations are shown in Appendix D.
§ Due to the fully connected nature of the SVD-reconstructed graph, the weights of unobserved interactions in the graph are of smaller magnitude; a weight of 0.01 is already a large weight in this graph.



Figure 1: Overall structure of LightGCL.


Figure 2: Performance on users of different sparsity degrees, in terms of Recall (histograms) and relative Recall w.r.t. overall performance (charts).

Figure 3: LightGCL's ability to alleviate popularity bias in comparison to SOTA CL-based methods HCCF and SimGCL.

Figure 4: Embedding distributions on Yelp and Gowalla visualized with t-SNE.

Figure 5: Recall change w.r.t. q.

Figure 6: Impact of λ1.

Figure 7: Impact of τ.

Figure 8: Case study on user #26 in Yelp dataset.



Table 1: Performance comparison with baselines on five datasets (columns: Data, Metric, DGCF, HyRec, LightGCN, MHCN, SGL, SimGRACE, GCA, HCCF, SHT, SimGCL, LightGCL, p-val., impr.).†



Table 3: Mean Average Distance (MAD) of the embeddings learned by different methods.

Table 4: Ablation study on LightGCL.

Table 5: Performance comparison with baselines on five datasets (continued).


A DETAILS OF THE BASELINES

MLP-enhanced Collaborative Filtering:
• NCF (He et al., 2017) is a collaborative filtering model that leverages neural networks to exploit non-linearity. Two hidden layers are used in our evaluation.

GNN-based Collaborative Filtering:
• GCCF (Chen et al., 2020c) strengthens GNN-based collaborative filtering by implementing a residual network and reducing the non-linear transformation.
• LightGCN (He et al., 2020) adopts a simplified GCN structure without embedding weight matrices and non-linear projection.

Disentangled Graph Collaborative Filtering:
• DGCF (Wang et al., 2020b) learns a more sophisticated representation by segmenting the embedding vectors to represent multiple latent intentions.

Hypergraph-based Collaborative Filtering:
• HyRec (Wang et al., 2020a) makes use of hypergraphs to encode multi-order information between users and items.

Self-Supervised Learning Recommender Systems:
• GraphCL (You et al., 2020) utilizes random node dropping and edge masking to generate two contrastive views, which are aligned by optimizing the SSL loss function.
• GRACE (Zhu et al., 2020) proposes to corrupt the graph structure by both random edge dropout and random node feature dropping, and uses the corrupted graphs as the contrastive views.
• GCA (Zhu et al., 2021b) adaptively drops out nodes and edges according to their importance, calculated with node centrality.
• MHCN (Yu et al., 2021) creates self-supervised signals for graph representation learning with a graph infomax network.
• SAIL (Yu et al., 2022b) maximizes the neighborhood predicting probability between GNN-generated high-level features and input node features.
• AutoGCL (Yin et al., 2022) uses a GNN to learn to mask nodes and edges in the augmented graph. It minimizes the similarity between the augmented and the original graph, while maximizing the similarity of the embeddings generated through them, so as to uncover the most important information in the graph.
• SimGRACE (Xia et al., 2022a) creates the augmented view by randomly perturbing the parameters of the GNN network.
• SGL (Wu et al., 2021) adopts random walk sampling and probabilistic edge/node dropout to create augmented views for contrastive learning. In our experiments, we adopt the SGL-ED variant, which implements random edge dropout and exhibits the strongest performance according to the original paper.
• HCCF (Xia et al., 2022b) encodes global graph information with a hypergraph and contrasts it against the local information encoded with a GCN. In our experiments, the number of hyperedges is set as 128, following the original paper.
• SHT (Xia et al., 2022c) adopts a hypergraph transformer framework to exploit global collaborative relationships and distills the global information to generate cross-view self-supervised signals. In our experiments, the number of hyperedges is set as 128, following the original paper.
• SimGCL (Yu et al., 2022a) proposes to simplify the graph augmentation process of contrastive learning by directly injecting random noise into the feature representations.

