DECIPHERING AND OPTIMIZING MULTI-TASK LEARNING: A RANDOM MATRIX APPROACH

Abstract

This article provides theoretical insights into the inner workings of multi-task and transfer learning methods, by studying the tractable least-square support vector machine multi-task learning (LS-SVM MTL) method, in the limit of large (p) and numerous (n) data. By a random matrix analysis applied to a Gaussian mixture data model, the performance of MTL LS-SVM is shown to converge, as n, p → ∞, to a deterministic limit involving simple (small-dimensional) statistics of the data. We prove (i) that the standard MTL LS-SVM algorithm is in general strongly biased and may dramatically fail, to the point that individual single-task LS-SVMs may outperform the MTL approach, even for quite similar tasks; our analysis provides a simple method to correct these biases. We further reveal (ii) the sufficient statistics at play in the method, which can be efficiently estimated even for quite small datasets. The latter result is exploited to automatically optimize the hyperparameters without resorting to any cross-validation procedure. Experiments on popular datasets demonstrate that our improved MTL LS-SVM method is computationally efficient and outperforms state-of-the-art multi-task and transfer learning techniques that are sometimes far more elaborate.

1. INTRODUCTION

The advent of elaborate learning machines capable of surpassing human performance on dedicated tasks has reopened past challenges in machine learning. Transfer learning, and multi-task learning (MTL) in general, by which known tasks are used to help a machine learn other related tasks, is one of them. The particularly interesting aspects of multi-task learning lie in the possibility (i) to exploit the resemblance between the datasets associated with each task so the tasks "help each other" and (ii) to train a machine on a specific target dataset comprising few labelled samples by exploiting much larger labelled datasets, albeit composed of different data. Practical applications are numerous, ranging from the prediction of student test results for a collection of schools (Aitkin & Longford, 1986), to the survival of patients in different clinics, to the value of many possibly related financial indicators (Allenby & Rossi, 1998), to the preference modelling of individuals in a marketing context, etc. Since MTL seeks to improve the performance of a task with the help of related tasks, a central issue to (i) understand the functioning of MTL, (ii) adequately adapt its hyperparameters and ultimately (iii) improve its performance consists in characterizing how MTL relates tasks to one another and in identifying which features are "transferred". This article aims to decipher these fundamental aspects for sufficiently general data models.
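To ground the discussion, the single-task LS-SVM classifier underlying the MTL method admits a closed-form solution: training reduces to solving one linear system in the dual variables. The sketch below is a minimal illustration of that standard LS-SVM dual system (linear kernel), applied to a two-class Gaussian mixture of the kind assumed in the analysis; the function names and the choice of γ = 1 are illustrative, not taken from the article.

```python
import numpy as np

def lssvm_train(X, y, gamma=1.0):
    """Standard LS-SVM dual: solve
        [ 0    1^T      ] [b]     [0]
        [ 1    K + I/g  ] [alpha] = [y]
    X: (n, p) data matrix, y: (n,) labels in {-1, +1}, gamma: regularization."""
    n = X.shape[0]
    K = X @ X.T  # linear kernel; any positive semi-definite kernel works here
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]  # (alpha, bias b)

def lssvm_predict(X_train, alpha, b, X_test):
    """Decision rule: sign of the kernel expansion plus bias."""
    return np.sign(X_test @ X_train.T @ alpha + b)

# Two-class Gaussian mixture with opposite means, as in the data model studied.
rng = np.random.default_rng(0)
n, p = 100, 5
mu = np.ones(p)
X = np.vstack([rng.normal(mu, 1.0, (n // 2, p)),
               rng.normal(-mu, 1.0, (n // 2, p))])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
alpha, b = lssvm_train(X, y)
accuracy = np.mean(lssvm_predict(X, alpha, b, X) == y)
```

Because training is a single linear solve, the method is fully tractable, which is precisely what makes its large-(n, p) behaviour amenable to random matrix analysis.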

