SCALABLE LEARNING AND MAP INFERENCE FOR NONSYMMETRIC DETERMINANTAL POINT PROCESSES

Abstract

Determinantal point processes (DPPs) have attracted significant attention in machine learning for their ability to model subsets drawn from a large item collection. Recent work shows that nonsymmetric DPP (NDPP) kernels have significant advantages over symmetric kernels in terms of modeling power and predictive performance. However, for an item collection of size M, existing NDPP learning and inference algorithms require memory quadratic in M and runtime cubic (for learning) or quadratic (for inference) in M, making them impractical for many typical subset selection tasks. In this work, we develop a learning algorithm with space and time requirements linear in M by introducing a new NDPP kernel decomposition. We also derive a linear-complexity NDPP maximum a posteriori (MAP) inference algorithm that applies not only to our new kernel but also to that of prior work. Through evaluation on real-world datasets, we show that our algorithms scale significantly better, and can match the predictive performance of prior work.

1. INTRODUCTION

Determinantal point processes (DPPs) have proven useful for numerous machine learning tasks. For example, recent uses include summarization (Sharghi et al., 2018), recommender systems (Wilhelm et al., 2018), neural network compression (Mariet & Sra, 2016), kernel approximation (Li et al., 2016), multi-modal output generation (Elfeki et al., 2019), and batch selection, both for stochastic optimization (Zhang et al., 2017) and for active learning (Bıyık et al., 2019). For subset selection problems where the ground set of items to select from has cardinality M, the typical DPP is parameterized by an M × M kernel matrix. Most prior work has been concerned with symmetric DPPs, where the kernel must equal its transpose. However, recent work has considered the more general class of nonsymmetric DPPs (NDPPs) and shown that these have additional useful modeling power (Brunel, 2018; Gartrell et al., 2019). In particular, unlike symmetric DPPs, which can only model negative correlations between items, NDPPs allow modeling of positive correlations, where the presence of item i in the selected set increases the probability that some other item j will also be selected. There are many intuitive examples of how positive correlations can be of practical importance. For example, consider a product recommendation task for a retail website, where a camera is found in a user's shopping cart, and the goal is to display several other items that might be purchased. Relative to an empty cart, the presence of the camera probably increases the probability of buying an accessory like a tripod. Although NDPPs can theoretically model such behavior, the existing approach for NDPP learning and inference (Gartrell et al., 2019) is often impractical in terms of both storage and runtime requirements.
These algorithms require memory quadratic in M and time quadratic (for inference) or cubic (for learning) in M; for the not-unusual M of 1 million, this means storing an 8 TB kernel matrix in memory, with runtime millions or billions of times slower than that of a linear-complexity method. In this work, we make the following contributions:

