LEARNING SPARSE GROUP MODELS THROUGH BOOLEAN RELAXATION

Abstract

We introduce an efficient algorithmic framework for learning sparse group models, formulated as the natural convex relaxation of a cardinality-constrained program with Boolean variables. We provide theoretical techniques to characterize when the relaxation is exact, i.e., attains the integral optimal solution, as well as a rounding algorithm that produces a feasible integral solution when the optimal relaxation solution is fractional. We demonstrate the power of our equivalence condition by applying it to two ensembles of random problem instances that are challenging and widely used in the literature, and prove that our method achieves exactness with overwhelming probability and nearly optimal sample complexity. Empirically, we use synthetic datasets to demonstrate that our proposed method significantly outperforms state-of-the-art group sparse learning models in terms of individual and group support recovery when the number of samples is small. Furthermore, we show that our method also outperforms existing approaches in cancer drug response prediction.

1. INTRODUCTION

Sparsity is one of the most important concepts in statistical machine learning, as it strongly connects to the data and computational efficiency, generalizability, and interpretability of a model. Traditional sparse estimation tasks aim at selecting sparse features at the individual level Tibshirani (1996); Negahban et al. (2012). However, in many real-world scenarios, structural properties among the individual features are assumed thanks to prior knowledge, and leveraging these structures may improve both model accuracy and learning efficiency Gramfort & Kowalski (2009); Kim & Xing (2012). In this paper, we focus on learning sparse group models for intersection-closed group sparsity, where groups of variables are either selected or discarded together. The general task of learning sparse group models has been investigated extensively in the literature, and most prior studies are based on structured sparsity-inducing norm regularization Friedman et al. (2010); Huang et al. (2011); Zhao et al. (2009); Simon et al. (2013), which stems from Lasso Tibshirani (1996), the traditional and popular technique for sparse estimation at the individual feature level. As reviewed in Bach et al. (2012); Jenatton et al. (2011), structured sparsity-inducing norms are quite general and can encode structural assumptions such as trees Kim & Xing (2012); Liu & Ye (2010), contiguous groups Rapaport et al. (2008), directed acyclic graphs Zheng et al. (2018), and general overlapping groups Yuan et al. (2011). Another type of approach for learning sparse group models is to view the task as a cardinality-constrained program, where the constraint set encodes the group structure and restricts the number of groups of variables being selected. Baldassarre et al. (2013) investigate the projection onto such a cardinality-constrained set. However, due to the combinatorial nature of the projection, directly applying projected gradient descent with this projection Baldassarre et al. (2013) to solve general learning problems with typical loss functions might not yield good results Kyrillidis et al. (2015). Recent work Pilanci et al. (2015) studies the Boolean relaxation of the learning problem with cardinality constraints on the individual variables; this setting can be viewed as a special case of sparse group models in which each group contains only one variable. Both the original work of Pilanci et al. (2015) and several follow-up papers Bertsimas & Parys (2020); Bertsimas et al. (2020) show that the Boolean relaxation empirically outperforms sparse estimation methods using sparsity-inducing norms (Lasso Tibshirani (1996) and elastic net Zou & Hastie (2005)), especially when the sample size is small and the feature dimension is large. However, the results in Pilanci et al. (2015) cannot be applied to sparse group models with arbitrary group structures. To fill this gap, in this paper we study sparse group models through a cardinality-constrained program. We first propose a Boolean relaxation for sparse group models. We then establish an analytical and algorithmic framework for our Boolean relaxation, which includes a theorem stating the equivalent condition for the relaxation to achieve exactness (i.e., the optimal integral solution) and a rounding scheme that produces an integral solution when the optimal relaxation solution is fractional.
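To make the setup concrete, a schematic version of such a cardinality-constrained program can be written as follows. This is an illustrative formulation, not the paper's exact one: it assumes a least-squares loss and non-overlapping groups $G_1, \dots, G_M$ partitioning the $d$ coordinates, with at most $k$ groups selected.

```latex
% Illustrative formulation (notation ours): Boolean indicators u_g mark
% which groups are active; at most k groups may be selected.
\min_{u \in \{0,1\}^M,\ \mathbf{1}^\top u \le k}\;
\min_{w \in \mathbb{R}^d}\;
\frac{1}{2}\,\lVert y - X w \rVert_2^2
\quad \text{s.t.} \quad
w_j = 0 \ \text{whenever } j \in G_g \text{ and } u_g = 0.
```

Relaxing the Boolean constraint $u \in \{0,1\}^M$ to the box $u \in [0,1]^M$ yields a convex program; the relaxation is called exact when its optimum is attained at an integral $u$.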
We demonstrate the power of our equivalent-condition theorem by applying it to two ensembles of random problem instances that are challenging and widely used in the literature, proving that our Boolean relaxation achieves exactness with high probability and nearly optimal sample complexity. Our contributions are threefold: 1) We propose a novel framework that uses constraints to induce intersection-closed group sparsity. Baldassarre et al. (2013) investigate the projection onto group sparsity constraints, but our framework extends to any convex loss function with group sparsity constraints. 2) We prove our framework is tight and achieves exactness with high probability and nearly optimal sample complexity for two ensembles of random problem instances. This result is inspired by Pilanci et al. (2015), but our derivations and proofs are not straightforward extensions (e.g., due to the group structure, we need to analyze more complex feature-group matrices, prove new matrix concentration properties, and carefully choose different regularization parameters). 3) Empirically, we perform extensive experiments demonstrating that our framework significantly outperforms state-of-the-art methods on simulated datasets when the sample size is small. Furthermore, we show that our framework also outperforms existing approaches in cancer drug response prediction.
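The rounding scheme mentioned above can be illustrated with a minimal sketch. This is not the paper's algorithm but a simple top-$k$ heuristic under the same setting: given a fractional solution of the relaxation, keep the $k$ groups with the largest relaxed indicators.

```python
import numpy as np

def round_group_indicators(u, k):
    """Round fractional group indicators u in [0,1]^M to a feasible
    Boolean vector selecting at most k groups.

    A minimal top-k heuristic (illustrative, not the paper's rounding
    scheme): keep the k groups whose relaxed indicators are largest,
    and never select a group whose indicator is exactly zero.
    """
    u = np.asarray(u, dtype=float)
    rounded = np.zeros_like(u)
    top = np.argsort(-u)[:k]            # indices of the k largest indicators
    rounded[top[u[top] > 0]] = 1.0      # skip groups with zero weight
    return rounded

# Example: a fractional relaxation solution over M = 5 groups, k = 2
u_frac = [0.9, 0.1, 0.6, 0.0, 0.3]
print(round_group_indicators(u_frac, 2))  # -> [1. 0. 1. 0. 0.]
```

Once the group indicators are rounded, the model weights can be re-fit restricted to the selected groups, which keeps the final solution feasible for the original cardinality constraint.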

1.1. RELATED WORKS

Convex programming relaxations and their rounding techniques have been widely used for approximating many computationally intractable combinatorial optimization problems (see, e.g., Williamson & Shmoys (2011)). The specific algorithmic technique in this work is inspired by the Boolean relaxation method introduced in Pilanci et al. (2015) for learning sparsity at the individual feature level. However, the additional group structure in our problem raises new algorithmic challenges, and both our Boolean relaxation formulation and its theoretical analysis (e.g., the equivalent condition for exactness) differ from their counterparts in Pilanci et al. (2015). As mentioned before, sparse estimation using structured sparsity-inducing norms has been thoroughly studied for learning structured sparsity under different structural assumptions motivated by various practical scenarios Friedman et al. (2010); Huang et al. (2011); Zhao et al. (2009); Simon et al. (2013); Tibshirani (1996); Bach et al. (2012); Kim & Xing (2012); Liu & Ye (2010); Rapaport et al. (2008); Zheng et al. (2018); Yuan et al. (2011); Jenatton et al. (2011). However, none of these algorithms provides rigorous theoretical techniques, as in this work, to verify whether the algorithm has produced the exact optimal solution. Also, as we will show in the experiments section, our proposed method outperforms these algorithms on both synthetic and real-world datasets. El Halabi & Cevher (2015); Halabi et al. (2018) study general equivalent conditions to characterize the tightness of their relaxations, while our theoretical results work for specific distributions where their general conditions cannot be easily verified. We use different analytical frameworks, and thus the theoretical results cannot be directly compared.





In our experiments, we also compare with the elastic net method Zou & Hastie (2005), which can only control sparsity at the individual feature level. There exists another family of structured sparsity-inducing norms Jacob et al. (2009) that aims to model union-closed families of supports, where the support of the solution is a union of groups. This differs from our proposed models, in which the support of the solution is the intersection of the complements of some of the groups considered (intersection-closed group sparsity) Jenatton et al. (2009). Another approach learns sparse group models by introducing penalty functions for the constraints and applying convex relaxation to them. Bach (2010) investigates how to design norms from submodular set-functions. El Halabi & Cevher (2015); Halabi et al. (2018) study how to induce group sparsity using tight convex relaxations of linear matrix inequalities and combinatorial penalties. Note that these works use convex regularizers to induce group sparsity, while we use constraints.

