GRAPH SIGNAL SAMPLING FOR INDUCTIVE ONE-BIT MATRIX COMPLETION: A CLOSED-FORM SOLUTION

Abstract

Inductive one-bit matrix completion is motivated by modern applications such as recommender systems, where new users appear at test stage with ratings consisting of only ones and no zeros. We propose a unified graph signal sampling framework which enjoys the benefits of graph signal analysis and processing. The key idea is to transform each user's ratings on the items into a function (graph signal) on the vertices of an item-item graph, then learn structural graph properties to recover the function from its values on certain vertices, i.e., the problem of graph signal sampling. We propose a class of regularization functionals that takes into account discrete random label noise in the graph vertex domain, then develop the GS-IMC approach, which biases the reconstruction towards functions that vary little between adjacent vertices for noise reduction. Theoretical results show that accurate reconstructions can be achieved under mild conditions. For the online setting, we develop a Bayesian extension, i.e., BGS-IMC, which considers continuous random Gaussian noise in the graph Fourier domain and builds upon a prediction-correction update algorithm to obtain the unbiased and minimum-variance reconstruction. Both GS-IMC and BGS-IMC have closed-form solutions and are thus highly scalable on large data, as verified on public benchmarks.

1. INTRODUCTION

In domains such as recommender systems and social networks, only "likes" (i.e., ones) are observed in the system, and service providers (e.g., Netflix) are interested in discovering potential "likes" for existing users to stimulate demand. This motivates the problem of one-bit matrix completion (OBMC), the goal of which is to recover missing values in an n-by-m item-user matrix R ∈ {0, 1}^{n×m}. We note that R_{i,j} = 1 means that item i is rated by user j, whereas R_{i,j} = 0 is essentially unlabeled or unknown: a mixture of unobserved positive examples and true negative examples.

However, in the real world, new users who are not exposed to the model during training may appear at testing stage. This fact stimulates the development of inductive one-bit matrix completion, which aims to recover an unseen vector y ∈ {0, 1}^n from its partial positive entries Ω⁺ ⊆ {j | y_j = 1} at test time. Fig. 1(a) emphasizes the difference between conventional and inductive approaches. More formally, let M ∈ {0, 1}^{n×(m+1)} denote the underlying matrix, where only a subset of positive examples Ψ is randomly sampled from {(i, j) | M_{i,j} = 1, i ≤ n, j ≤ m}, such that R_{i,j} = 1 for (i, j) ∈ Ψ and R_{i,j} = 0 otherwise. Considering the (m+1)-th column y outside matrix R, we likewise denote its observations by s_i = 1 for i ∈ Ω⁺ and s_i = 0 otherwise. We note that the sampling process here assumes a random label noise ξ which flips a 1 to 0 with probability ρ, or equivalently

s = y + ξ, where ξ_i = −1 for i ∈ {j | y_j = 1} \ Ω⁺, and ξ_i = 0 otherwise.    (1)

Fig. 1(a) presents an example of s, y, and ξ to better understand their relationships. Fundamentally, the reconstruction of the true y from the corrupted s bears a resemblance to graph signal sampling. Fig. 1(b) shows that the item-user rating matrix R can be used to define a homogeneous item-item graph.

Another emerging line of research has focused on learning the mapping from side information (or content features) to latent factors (Jain & Dhillon, 2013; Xu et al., 2013; Ying et al., 2018; Zhong et al., 2019). However, it has been recently shown (Zhang & Chen, 2020; Ledent et al., 2021; Wu et al., 2021) that in general this family of algorithms may suffer from inferior expressiveness when high-quality content is not available. Further, collecting personal data is likely to be unlawful as well as a breach of the data minimization principle in GDPR (Voigt & Von dem Bussche, 2017). Much effort has also been made to leverage advanced graph neural networks (GNNs) for improvements. van den Berg et al. (2017) represent the data matrix R by a bipartite graph, then generalize the representations to unseen nodes by summing the embeddings over the neighbors. Zhang & Chen (2020) develop graph neural networks which encode the subgraphs around an edge into latent factors, then decode the factors back to the value on the edge. Besides, Wu et al. (2021) consider the problem in a downsampled homogeneous graph (i.e., a user-user graph in recommender systems), then exploit attention networks to yield inductive representations. The key advantage of our approach is not only the closed-form solution, which takes a small fraction of the training time required by GNNs, but also theoretical results that guarantee accurate reconstruction and provide guidance for practical applications. We emphasize the challenges when connecting ideas and methods of graph signal sampling with inductive one-bit matrix completion: one-bit quantization and online learning.
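As a concrete illustration, the sampling process of Eq. (1) can be simulated in a few lines of NumPy; the signal length n and flip probability ρ below are arbitrary illustrative values, not settings from our experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

n, rho = 8, 0.5               # n items; flip probability rho (illustrative values)
y = rng.integers(0, 2, n)     # true binary signal for a new user
# Discrete label noise xi flips each 1 to 0 with probability rho, as in Eq. (1)
xi = np.where((y == 1) & (rng.random(n) < rho), -1, 0)
s = y + xi                    # observed signal: s_i = 1 only for i in Omega+
```

Note that ξ only removes positives (s_i ≤ y_i for every i) and never introduces spurious ones, which is exactly why fitting the corrupted s directly incurs a systematic error.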
Specifically, one-bit quantization raises challenges for formulating the underlying optimization problems: minimizing squared loss on the observed positive examples Ω⁺ yields a degenerate solution (the vector with all entries equal to one achieves zero loss), while minimizing squared loss on the corrupted data s introduces a systematic error due to the random label noise ξ in Eq. (1). To address the issue, we represent the observed data R as a homogeneous graph, then devise a broader class of regularization functionals on graphs to mitigate the impact of the discrete random noise ξ. Existing theory for total variation denoising (Sadhanala et al., 2016; 2017) and graph regularization (Belkin et al., 2004; Huang et al., 2011), which takes into account continuous Gaussian noise, does not sufficiently address recoverability in inductive one-bit matrix completion (see Sec. 3.4). We finally manage to derive a closed-form solution, entitled Graph Sampling for Inductive (1-bit) Matrix Completion (GS-IMC), which biases the reconstruction towards functions that vary little between adjacent vertices for noise reduction.
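To make the smoothness bias concrete, the following is a minimal sketch of a graph-regularized closed-form reconstruction. It uses plain Tikhonov regularization with the combinatorial Laplacian; GS-IMC employs a broader class of regularization functionals R(L), so the operator and the weight `lam` here are illustrative assumptions rather than our exact formulation:

```python
import numpy as np

def reconstruct(s, A, lam=1.0):
    """Smoothness-biased reconstruction of a graph signal from corrupted
    observations s on an item-item graph with adjacency matrix A."""
    L = np.diag(A.sum(1)) - A          # combinatorial graph Laplacian
    # argmin_y ||y - s||^2 + lam * y^T L y  admits the closed form below
    return np.linalg.solve(np.eye(len(s)) + lam * L, s)

# Toy 4-node path graph 0-1-2-3; suppose node 2's positive rating was
# flipped to 0 by the label noise xi.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
s = np.array([1.0, 1.0, 0.0, 0.0])
y_hat = reconstruct(s, A)              # → approx [0.857, 0.714, 0.286, 0.143]
```

Because node 2 is adjacent to an observed positive, its recovered score (2/7) stays above node 3's (1/7): averaging over adjacent vertices lets a flipped positive be re-ranked above true negatives, which is the effect the regularization is designed to produce.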



Figure 1: (a) Conventional 1-bit matrix completion focuses on recovering missing values in matrix R, while inductive approaches aim to recover a new column y from the observations s available at testing stage. ξ denotes discrete noise that randomly flips ones to zeros. (b) Our GS-IMC approach, which regards y as a signal residing on the nodes of a homogeneous item-item graph, aims to reconstruct the true signal y from its observed values (orange colored) on a subset of nodes (gray shadowed).

Despite its popularity in areas such as image processing (Shuman et al., 2013; Pang & Cheung, 2017; Cheung et al., 2018) and matrix completion (Romero et al., 2016; Mao et al., 2018; McNeil et al., 2021), graph signal sampling appears less studied in the specific inductive one-bit matrix completion problem considered in this paper (see Appendix A for detailed related works). Probably most closely related to our approach are MRFCF (Steck, 2019) and SGMC (Chen et al.

Funding

* Junchi Yan is the corresponding author, who is also with Shanghai AI Laboratory. The work was in part supported by NSFC (62222607) and STCSM (22511105100).

