LEARNING SPARSE GROUP MODELS THROUGH BOOLEAN RELAXATION

Abstract

We introduce an efficient algorithmic framework for learning sparse group models, formulated as the natural convex relaxation of a cardinality-constrained program with Boolean variables. We provide theoretical techniques to characterize the conditions under which the relaxation is exact, i.e., achieves the integral optimal solution, as well as a rounding algorithm that produces a feasible integral solution whenever the optimal relaxation solution is fractional. We demonstrate the power of our exactness condition by applying it to two ensembles of random problem instances that are challenging and widely used in the literature, and prove that our method achieves exactness with overwhelming probability and nearly optimal sample complexity. Empirically, we use synthetic datasets to demonstrate that our proposed method significantly outperforms state-of-the-art group sparse learning models in terms of individual and group support recovery when the number of samples is small. Furthermore, we show that our method outperforms existing approaches in cancer drug response prediction.

1. INTRODUCTION

Sparsity is one of the most important concepts in statistical machine learning, as it is strongly connected to the data and computational efficiency, generalizability, and interpretability of a model. Traditional sparse estimation tasks aim at selecting sparse features at the individual level Tibshirani (1996); Negahban et al. (2012). However, in many real-world scenarios, structural properties among the individual features can be assumed thanks to prior knowledge, and leveraging these structures may improve both model accuracy and learning efficiency Gramfort & Kowalski (2009); Kim & Xing (2012). In this paper, we focus on learning sparse group models for intersection-closed group sparsity, where groups of variables are either selected or discarded together. The general task of learning sparse group models has been extensively investigated in the literature, where most prior studies are based on structured sparsity-inducing norm regularization Friedman et al. (2010); Huang et al. (2011); Zhao et al. (2009); Simon et al. (2013), which stems from Lasso Tibshirani (1996), the traditional and popular technique for sparse estimation at the individual feature level. As reviewed in Bach et al. (2012); Jenatton et al. (2011), structured sparsity-inducing norms are quite general and can encode structural assumptions such as trees Kim & Xing (2012); Liu & Ye (2010), contiguous groups Rapaport et al. (2008), directed acyclic graphs Zheng et al. (2018), and general overlapping groups Yuan et al. (2011).

Another type of approach for learning sparse group models is to view the task as a cardinality-constrained program, where the constraint set encodes the group structures and restricts the number of groups of variables being selected. Baldassarre et al. (2013) investigate the projection onto such a cardinality-constrained set. However, due to the combinatorial nature of the projection, directly applying projected gradient descent with this projection Baldassarre et al. (2013) to solve general learning problems with typical loss functions might not yield good results Kyrillidis et al. (2015). Recent work Pilanci et al. (2015) studies the Boolean relaxation of the learning problem with cardinality constraints on the individual variables; it can be viewed as a special case of sparse group models in which each group contains a single variable. Both the original work of Pilanci et al. (2015) and several follow-up papers Bertsimas & Parys (2020); Bertsimas et al. (2020) show that the Boolean relaxation empirically outperforms sparse estimation methods based on sparsity-inducing norms (Lasso Tibshirani (1996) and the elastic net Zou & Hastie (2005)), especially when the sample size is small and the feature dimension is large.
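For intuition on the projection mentioned above: when the groups are non-overlapping, projecting a vector onto the set of vectors supported on at most k groups reduces to keeping the k groups with the largest Euclidean norm. The following is a minimal illustrative sketch of this idea (the function name and example data are our own, not from the cited works; for overlapping groups the projection is combinatorially hard in general):

```python
import numpy as np

def project_group_sparse(w, groups, k):
    """Project w onto vectors supported on at most k disjoint groups:
    keep the k groups of coordinates with largest l2 norm, zero the rest.
    `groups` is a list of index arrays partitioning the coordinates of w.
    """
    norms = np.array([np.linalg.norm(w[g]) for g in groups])
    keep = np.argsort(norms)[-k:]   # indices of the k largest-norm groups
    out = np.zeros_like(w)
    for i in keep:
        out[groups[i]] = w[groups[i]]
    return out

# Example: 6 features in 3 groups of 2; keep the single strongest group.
w = np.array([0.1, 0.2, 3.0, 2.5, 0.0, 0.3])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
print(project_group_sparse(w, groups, 1))  # -> [0. 0. 3. 2.5 0. 0.]
```

Plugging this projection into projected gradient descent gives the kind of iterative hard-thresholding scheme whose limitations on general losses motivate the Boolean relaxation studied in this paper.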

