REAL-TIME AUTOML

Abstract

We present a new zero-shot approach to automated machine learning (AutoML) that predicts a high-quality model for a supervised learning task and dataset in real-time without fitting a single model. In contrast, most AutoML systems require tens or hundreds of model evaluations. Hence our approach accelerates AutoML by orders of magnitude. Our method uses a transformer-based language embedding to represent datasets and algorithms using their free-text descriptions and a meta-feature extractor to represent the data. We train a graph neural network in which each node represents a dataset to predict the best machine learning pipeline for a new test dataset. The graph neural network generalizes to new datasets and new sets of datasets. Our approach leverages the progress of unsupervised representation learning in natural language processing to provide a significant boost to AutoML. Performance is competitive with state-of-the-art AutoML systems while reducing running time from minutes to seconds and prediction time from minutes to milliseconds, providing AutoML in real-time.

1. INTRODUCTION

A data scientist facing a challenging new supervised learning task does not generally invent a new algorithm. Instead, they consider what they know about the dataset and which algorithms have worked well for similar datasets in past experience. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. A major challenge is to develop fast, efficient algorithms to accelerate applications of machine learning (Kokiopoulou et al., 2019). This work develops automated solutions that exploit human expertise to learn which datasets are similar and which algorithms perform best. We use a transformer-based language model (Devlin et al., 2018), allowing our AutoML system to process text descriptions of datasets and algorithms, and a feature extractor (BYU-DML, 2019) to represent the data itself. Using such pretrained models for our representations brings in knowledge learned from large-scale corpora. We train our model on the solutions produced by other existing AutoML systems, specifically AutoSklearn (Feurer et al., 2015), AlphaD3M (Drori et al., 2018), OBOE (Yang et al., 2019), and TPOT (Olson & Moore, 2019), tapping into their diverse set of solutions. Our approach fuses these representations (dataset description, data, AutoML pipeline descriptions) and represents datasets as nodes in a graph of datasets. Generally, graph neural networks are used for three main tasks: (i) node prediction, (ii) link prediction, and (iii) sub-graph or entire-graph classification. In this work we use a GNN for node prediction, which predicts the machine learning pipeline for an unseen dataset. Specifically, we use a graph attention network (GAT) (Veličković et al., 2018) with neighborhood aggregation, in which an attention function adaptively controls the contribution of neighbors.
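To make the neighborhood aggregation concrete, the following is a minimal numpy sketch of single-head GAT-style attention as described by Veličković et al. (2018), where each node is a dataset and attention logits are computed over concatenated transformed features of node pairs. The function name and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_aggregate(h, adj, W, a):
    """Single-head GAT-style neighborhood aggregation (sketch).

    h:   (N, F)  node features, one node per dataset
    adj: (N, N)  binary adjacency (1 = neighbor; include self-loops)
    W:   (F, F') shared linear transform
    a:   (2*F',) attention vector
    """
    z = h @ W                                   # transform: (N, F')
    out = np.zeros_like(z)
    for i in range(h.shape[0]):
        nbrs = np.nonzero(adj[i])[0]
        # attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
        e = np.array([np.concatenate([z[i], z[j]]) @ a for j in nbrs])
        e = np.where(e > 0, e, 0.2 * e)         # LeakyReLU
        alpha = softmax(e)                      # normalized attention
        out[i] = alpha @ z[nbrs]                # weighted neighbor sum
    return out
```

In practice a library implementation (e.g. multi-head attention with learned parameters) would be used; the sketch only shows how the attention weights adaptively control each neighbor's contribution.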
An advantage of using a GNN for AutoML is that it boosts performance by sharing information, including dataset descriptions and algorithm performance, between datasets (graph nodes) via message passing. In addition, GNNs generalize well to a new, unseen dataset using the aggregated weights learnt over the training datasets: the trained GNN weights are applied directly to the test dataset for prediction, and the model generalizes even to entirely new sets of datasets. Finally, prediction runs in real-time, within milliseconds. A simple idea is to use machine learning pipelines that performed well (for the same task) on similar datasets. What constitutes a similar dataset? The success of an AutoML system often hinges on this question, and different frameworks have different answers: for example, AutoSklearn (Feurer et al., 2015) computes a set of meta-features, which are features describing the data features, for each dataset, while OBOE (Yang et al., 2019) uses the performance of a few fast, informative models to compute latent features. More generally, for any supervised learning task, one can view the list of
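As a concrete illustration of the meta-feature idea, the sketch below computes a few simple dataset statistics of the kind AutoSklearn-style systems use; the particular selection and function name here are illustrative only (real extractors compute dozens of such statistics).

```python
import numpy as np

def meta_features(X, y):
    """Compute a small, illustrative set of dataset meta-features.

    X: (n, d) feature matrix; y: (n,) class labels.
    """
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    class_entropy = float(-(p * np.log2(p)).sum())  # label uncertainty
    return {
        "log_n_instances": float(np.log(n)),
        "log_n_features": float(np.log(d)),
        "n_classes": int(len(counts)),
        "class_entropy": class_entropy,
    }
```

Such a vector, concatenated with a language-model embedding of the dataset's free-text description, yields a fixed-size node representation that can be fed to the GNN.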

