PRIVATE AND EFFICIENT META-LEARNING WITH LOW RANK AND SPARSE DECOMPOSITION

Abstract

Meta-learning is critical for a variety of practical ML systems, such as personalized recommendation systems, that must generalize to new tasks despite a small number of task-specific training points. Existing meta-learning techniques follow two complementary approaches: either learning a low-dimensional representation of points across all tasks, or task-specific fine-tuning of a global model trained on all the tasks. In this work, we propose a novel meta-learning framework that combines both techniques to enable handling of a large number of data-starved tasks. Our framework models network weights as a sum of low-rank and sparse matrices. This allows us to capture information from multiple domains together in the low-rank part while still allowing task-specific personalization via the sparse part. We instantiate and study the framework in the linear setting, where the problem reduces to estimating the sum of a rank-r and a k-column-sparse matrix from a small number of linear measurements. We propose an alternating minimization method with hard thresholding (AMHT-LRS) to learn the low-rank and sparse parts effectively and efficiently. For the realizable, Gaussian data setting, we show that AMHT-LRS indeed solves the problem efficiently with a nearly optimal number of samples. We extend AMHT-LRS to preserve the privacy of each individual user in the dataset, while still ensuring strong generalization with a nearly optimal number of samples. Finally, on multiple datasets, we demonstrate that the framework allows personalized models to obtain superior performance in the data-scarce regime.

1. INTRODUCTION

Typical real-world settings, such as multi-user/enterprise personalization, have a long tail of tasks with a small amount of training data. Meta-learning addresses this problem by learning a "learner" that extracts key information/representations from a large number of training tasks and can be applied to new tasks despite a limited number of task-specific training points. Most existing meta-learning approaches can be categorized as: 1) Neighborhood Models: these methods learn a global model, which is then "fine-tuned" to specific tasks (Guo et al., 2020; Howard & Ruder, 2018; Zaken et al., 2021); 2) Representation Learning: these methods learn a low-dimensional representation of points, which can be used to train task-specific linear learners (Javed & White, 2019; Raghu et al., 2019; Lee et al., 2019; Bertinetto et al., 2018; Hu et al., 2021). In particular, task-specific fine-tuning has demonstrated exceptional results across many natural language tasks (Devlin et al., 2018; Liu et al., 2019; Yang et al., 2019; Lan et al., 2019). However, such fine-tuned models update all parameters, so each fine-tuned model's parameter footprint is as large as the original model's. This implies that fine-tuning large models, such as a standard BERT model (Devlin et al., 2018) with about 110M parameters, for thousands or millions of tasks would be quite challenging even from a storage standpoint. One potential approach to handle the large number of parameters is to fine-tune only the last layer, but empirical findings suggest that such solutions can be significantly less accurate than fine-tuning the entire model (Chen et al., 2020a; Salman et al., 2020). Moreover, representation-learning-based approaches impose strict restrictions, e.g., that each task's parameters lie in a low-dimensional subspace, which tends to hurt performance in general (Sec. 3). In this work, we propose and study the LRS framework that combines both of the above complementary approaches.
That is, LRS restricts the model parameters Θ^(i) for the i-th task to the form Θ^(i) := U · W^(i) + B^(i), where the first term applies a low-dimensional linear operator U, shared across all tasks, to the task-specific weights W^(i), and the sparse part B^(i) allows additional task-specific personalization.
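To make the linear instantiation concrete, the following is a minimal numerical sketch of the alternating-minimization-with-hard-thresholding idea. It is a hypothetical illustration, not the paper's exact AMHT-LRS algorithm: for each task i we observe y_i = X_i (U w_i + b_i), and we alternate between (a) per-task least squares for w_i plus a hard-thresholded gradient step for the k-sparse b_i, and (b) a global least-squares update of the shared low-rank factor U. The function and variable names are our own.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep only the k largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def amht_lrs_sketch(Xs, ys, r, k, iters=20, seed=0):
    """Illustrative alternating minimization for theta_i = U @ w_i + b_i,
    with U a shared d x r factor and each b_i k-sparse (a sketch, not the
    paper's exact method)."""
    t, d = len(Xs), Xs[0].shape[1]
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))  # random orthonormal init
    ws = [np.zeros(r) for _ in range(t)]
    bs = [np.zeros(d) for _ in range(t)]
    for _ in range(iters):
        # (a) Per-task step: fix U; least squares for w_i, then a
        #     hard-thresholded gradient step for the sparse part b_i.
        for i in range(t):
            A = Xs[i] @ U
            ws[i] = np.linalg.lstsq(A, ys[i] - Xs[i] @ bs[i], rcond=None)[0]
            grad = Xs[i].T @ (ys[i] - Xs[i] @ (U @ ws[i] + bs[i]))
            bs[i] = hard_threshold(bs[i] + grad / len(ys[i]), k)
        # (b) Global step: fix {w_i, b_i}; solve for vec(U) in row-major
        #     order, using X_i @ U @ w_i = kron(X_i, w_i^T) @ vec(U).
        M = np.vstack([np.kron(Xs[i], ws[i][None, :]) for i in range(t)])
        rhs = np.concatenate([ys[i] - Xs[i] @ bs[i] for i in range(t)])
        U = np.linalg.lstsq(M, rhs, rcond=None)[0].reshape(d, r)
    return U, ws, bs
```

The per-task step is cheap and embarrassingly parallel across tasks, while the global step pools all tasks' residuals to refine the shared subspace; this mirrors the division of labor between the low-rank and sparse parts described above.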

