REAL-TIME AUTOML

Abstract

We present a new zero-shot approach to automated machine learning (AutoML) that predicts a high-quality model for a supervised learning task and dataset in real-time, without fitting a single model. In contrast, most AutoML systems require tens or hundreds of model evaluations; hence our approach accelerates AutoML by orders of magnitude. Our method uses a transformer-based language model to represent datasets and algorithms using their free-text descriptions, and a meta-feature extractor to represent the data. We train a graph neural network, in which each node represents a dataset, to predict the best machine learning pipeline for a new test dataset. The graph neural network generalizes to new datasets and to new sets of datasets. Our approach leverages the progress of unsupervised representation learning in natural language processing to provide a significant boost to AutoML. Performance is competitive with state-of-the-art AutoML systems while reducing running time from minutes to seconds and prediction time from minutes to milliseconds, providing AutoML in real-time.

1. INTRODUCTION

A data scientist facing a challenging new supervised learning task does not generally invent a new algorithm. Instead, they consider what they know about the dataset and which algorithms have worked well for similar datasets in past experience. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. A major challenge is to develop fast, efficient algorithms to accelerate applications of machine learning (Kokiopoulou et al., 2019). This work develops automated solutions that exploit human expertise to learn which datasets are similar and which algorithms perform best. We use a transformer-based language model (Devlin et al., 2018), which allows our AutoML system to process text descriptions of datasets and algorithms, and a feature extractor (BYU-DML, 2019) to represent the data itself. Using such pre-trained models for our representation brings in knowledge learned from large-scale data. We train our model on the solutions of other existing AutoML systems, specifically AutoSklearn (Feurer et al., 2015), AlphaD3M (Drori et al., 2018), OBOE (Yang et al., 2019), and TPOT (Olson & Moore, 2019), tapping into their diverse sets of solutions. Our approach fuses these representations (dataset description, data, and AutoML pipeline descriptions) and represents datasets as nodes in a graph of datasets. Generally, graph neural networks are used for three main tasks: (i) node prediction, (ii) link prediction, and (iii) sub-graph or entire-graph classification. In this work we use a GNN for node prediction, predicting the machine learning pipeline for an unseen dataset. Specifically, we use a graph attention network (GAT) (Veličković et al., 2018) with neighborhood aggregation, in which an attention function adaptively controls the contribution of neighbors.
An advantage of using a GNN for AutoML is that sharing information between datasets (graph nodes), including their descriptions and algorithms, by message passing between the nodes in the graph boosts AutoML performance. In addition, GNNs generalize well to a new, unknown dataset using the aggregated weights learned over the training datasets: the GNN weights are shared with the test dataset for prediction. GNNs even generalize to entirely new sets of datasets. Finally, prediction is in real-time, within milliseconds. A simple idea is to use machine learning pipelines that performed well (for the same task) on similar datasets. What constitutes a similar dataset? The success of an AutoML system often hinges on this question, and different frameworks have different answers: for example, AutoSklearn (Feurer et al., 2015) computes a set of meta-features, which are features describing the data features, for each dataset, while OBOE (Yang et al., 2019) uses the performance of a few fast, informative models to compute latent features. More generally, for any supervised learning task, one can view the list of recommended algorithms generated by any AutoML system as a vector describing that task. This work is the first to use the information that a human would check first: a summary description of the dataset and algorithms, written in free text. These dataset features induce a metric structure on the space of datasets. Under an ideal metric, a model that performs well on one dataset would also perform well on nearby datasets. The methods we develop in this work show how to learn such a metric using the recommendations of an AutoML framework together with the dataset description. We provide a new zero-shot AutoML method that predicts accurate machine learning pipelines for an unseen dataset and classification task in real-time and runs the pipeline in a few seconds.
We use a transformer-based language model to embed the descriptions of the dataset and pipelines and a feature extractor to compute meta-features from the data. Based on the description embedding and meta-features, we build a graph as the input to a graph neural network (GNN). Each dataset is represented as a node in the graph, together with its corresponding feature vector. The GNN is trained to predict a machine learning pipeline for a new node (dataset). Therefore, given a new dataset, our real-time AutoML method predicts a pipeline with good performance within milliseconds. The running time of our predicted pipeline is a few seconds, and the accuracy of the predicted pipeline is competitive with state-of-the-art AutoML methods that are given one minute. This work makes several contributions by using language embeddings and GNNs for AutoML for the first time, and by leveraging existing AutoML systems. The result is a real-time, high-quality AutoML system.

Real-time. Our system predicts a machine learning pipeline for a new dataset in milliseconds and then runs the pipeline and tunes its hyper-parameters within three seconds. This reduces computation time by orders of magnitude compared with state-of-the-art AutoML systems, while improving performance.

GNN architecture. Our work achieves real-time AutoML by introducing several architectural components that are new to AutoML. These include embeddings for dataset descriptions and algorithm descriptions using a state-of-the-art transformer-based language model, in addition to (standard) embeddings for data; a non-Euclidean embedding of datasets as a graph; and a predictive model employing a GNN on the graph of datasets. Importantly, the GNN recommends a pipeline for a new dataset by adding a node to the graph of datasets and sharing the GNN weights with the new node. Using the information and relationships between all datasets boosts AutoML performance.

Embeddings. Bringing techniques from NLP to AutoML, specifically using a large-scale transformer-based language model to embed the descriptions of the dataset and algorithms, brings in information from a large corpus of text. This allows our zero-shot AutoML to train on a small set of datasets with state-of-the-art test set performance.

Leveraging existing AutoML systems. Our flexible architecture can use pipeline recommendations from any number of other AutoML systems to improve performance.

2. RELATED WORK

AutoML is an emerging field of machine learning with the potential to transform the practice of data science by automatically choosing a model to best fit the data. Several comprehensive surveys of the field are available (He et al., 2019; Zöller & Huber, 2019). Processing each dataset in isolation. The most straightforward approach to AutoML considers each dataset in isolation and asks how to choose the best hyper-parameter settings for a given algorithm. While the most popular method is still grid search, other more efficient approaches include Bayesian optimization (Snoek et al., 2012) and random search (Solis & Wets, 1981). Recommender systems. These methods learn (often, exhaustively) which algorithms and hyper-parameter settings performed best for a training set of datasets and use this information to select better algorithms on a test set without exhaustive search. This approach reduces the time required to find a good model. An example is OBOE (Yang et al., 2019; 2020), which fits a low-rank model to learn the low-dimensional representations for the models (or pipelines) and datasets that best predict the cross-validated errors, among all bilinear models. To find promising models for a new dataset, OBOE runs a set of fast but informative algorithms on the new dataset and uses their cross-validated errors to infer the feature vector for the new dataset. A related approach (Fusi et al., 2018) using probabilistic matrix factorization powers Microsoft Azure's AutoML service (Mukunthu, 2019). Search trees. Auto-Tuned Models (Swearingen et al., 2017) represent the search space as a tree with nodes being algorithms or hyper-parameters and search for the best branch using a multi-armed bandit. Model-based reinforcement learning. AlphaD3M (Drori et al., 2018; 2019a) formulated AutoML as a single-player game. The system uses reinforcement learning with self-play and a pre-trained model which generalizes from many different datasets and similar tasks.
Genetic programming. TPOT (Olson & Moore, 2019) and Autostacker (Chen et al., 2018) use genetic programming to choose both hyper-parameter settings and a topology of a machine learning pipeline. TPOT represents pipelines as trees, whereas Autostacker represents them as layers. Bayesian optimization. AutoSklearn (Feurer et al., 2015) chooses a model for a new dataset by first computing (ad hoc) data meta-features to find nearest-neighbor datasets. The best-performing methods on the neighbors are refined via Bayesian optimization and used to form an ensemble. Differentiable programming. End-to-end learning of machine learning pipelines is performed using differentiable primitives (Milutinovic et al., 2017) forming a directed acyclic graph. Algorithmic primitives. One major factor in the performance of an AutoML system is the base set of algorithms it can use to compose more complex pipelines. For a fair comparison, in our numerical experiments we compare our proposed methods only to other AutoML systems that use Scikit-learn (Pedregosa et al., 2011) primitives. Embeddings. Language has a common unstructured representation as a sequence of words, sentences, or paragraphs. The most significant recent progress in NLP is large-scale transformer-based models and embeddings (Devlin et al., 2018; Shoeybi et al., 2019; Raffel et al., 2019) based on attention mechanisms (Vaswani et al., 2017). An unsupervised corpus of text is transformed into a supervised dataset by defining content-target pairs along the entire text: for example, target words that appear in each sentence, or target sentences that appear in each paragraph. A language model is first trained to learn a low-dimensional embedding of words or sentences, followed by a map from the low-dimensional content to the target (Mikolov et al., 2013). This embedding is then used on a new, unseen, and small dataset in the same low-dimensional space. Our work uses such embeddings for automated machine learning.
In a similar fashion to recent work (Drori et al., 2019b), we use an embedding for the dataset and algorithm descriptions. In this work we also model the non-linear interactions between these embeddings using a neural network.

3. METHODS

Our zero-shot AutoML predicts a machine learning pipeline for a classification task on a dataset based on the dataset description and data, and based on other datasets, their relationships, and the pipelines recommended for them by AutoML systems. We embed the dataset description and extract data meta-features to construct a graph of datasets where each node represents a dataset. The graph is processed using a graph neural network (GNN). Each node of the graph contains a feature vector which is the fusion of the description embedding and data meta-features, and the GNN node representations include other AutoML solutions. The machine learning pipeline for a new dataset is predicted by the GNN. A detailed architecture is illustrated in Figure 1 and described by Algorithms 2 and 3. The notation used in this work is given in Table 1.

3.1. PRE-PROCESSING

Our pre-processing consists of (i) dataset description embedding; (ii) dataset meta-feature extraction; and (iii) pipeline computation and description embedding, as described next and summarized in Algorithm 1.

Table 1: Zero-shot AutoML notation and description.

  P̂(D)                           Predicted pipeline on dataset D
  R(P, D)                        Performance of running pipeline P on dataset D
  F_D                            Data meta-features
  F_M = E(M(D))                  Embedding of dataset description
  F_{D,M} = [F_D, F_M]           Concatenation
  F_P = E(M(P))                  Embedding of pipeline description
  G                              Datasets graph
  i ∈ V                          Node in G
  j ∈ N(i)                       Neighbors j of node i
  F_i = f_φ(F_{D_i}, F_{M_i})    Fusion network output on graph node
  v_i = [F_i, F_{P(D_i)}]        Features of node in G
  u_i = g_θ(v_i)                 Fusion network output, features of node in GNN
  {u_j}_{j∈N(i)}                 Features of node neighbors in GNN
  h_{W,z}(u_i, {u_j}_{j∈N(i)})   GNN with parameters W, z

Dataset description embedding. We create a feature vector by embedding the description M(D) of each dataset as a 1024-dimensional vector F_M = E(M(D)) ∈ R^1024 using BERT (Devlin et al., 2018). The supplementary material shows examples of dataset descriptions embedded using our approach.

Data meta-features. We compute meta-features F_D ∈ R^148 for the dataset D using a feature extractor (BYU-DML, 2019), restricting to meta-features that can be computed in one second on any of the datasets used in our experiments. Meta-features include statistics of the datasets and results of simple algorithms.

Pipelines and pipeline embedding. For each dataset, we compute the recommended pipeline returned by the AutoML systems OBOE (O), AutoSklearn (S), AlphaD3M (A), and TPOT (T). We create feature vectors for recommended pipelines by embedding the Scikit-learn documentation for the pre-processor or feature selector and the estimator (which is unique within each pipeline). Again, we use the BERT embedding to form a 1024-dimensional embedding E(M(P_C(D))) ∈ R^1024 for each pipeline, where C ranges over the AutoML methods O, S, A, and T.
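To make the meta-feature step concrete, the sketch below computes a handful of simple dataset meta-features with NumPy. It is a toy stand-in for the 148-dimensional extractor used in the paper (BYU-DML, 2019); the function name and the particular statistics are our own illustrative choices, not the extractor's actual feature set.

```python
import numpy as np

def simple_meta_features(X, y):
    """Compute a few illustrative dataset meta-features.

    A toy stand-in for the 148-dimensional extractor: basic shape
    statistics plus the class-label entropy of the target.
    """
    n_rows, n_cols = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    class_entropy = -np.sum(p * np.log2(p))
    return np.array([
        n_rows,                   # number of examples
        n_cols,                   # number of features
        n_cols / n_rows,          # dimensionality ratio
        len(counts),              # number of classes
        class_entropy,            # label entropy in bits
        X.mean(),                 # global feature mean
        X.std(),                  # global feature standard deviation
    ])
```

Each such vector becomes part of a node's features in the dataset graph, alongside the description embedding.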
The best-performing pipeline P(D) returned by any AutoML system serves as our training label: we train our system to recommend this pipeline.

Fused dataset representations. The combined representation of dataset D_i with description M(D_i) fuses together the dataset description embedding and data meta-features using a neural network (whose weights are learned):

  F_i = f_φ([F_{D_i}, F_{M_i}]) ∈ R^512.   (1)

We also represent the dataset and its best pipeline by fusing this representation with the pipeline embedding using a second neural network:

  u_i = g_θ([F_i, F_{P(D_i)}]) ∈ R^512.   (2)

Experimentally, these fused representations improve performance compared to concatenation.

Algorithm 1 Zero-shot AutoML pre-processing
Input: training datasets {(D_i, M_i)}_{i∈V}.
Output: features {F_{M_i}, F_{D_i}, F_{P(D_i)}}_{i∈V}.
for i = 1 to n do
    compute embedding of description F_{M_i} = E(M_i)
    compute data meta-features F_{D_i}
    for all C ∈ {O, S, A, T} do
        compute recommended pipeline P_C(D_i)
        compute performance on dataset R(P_C, D_i)
    end for
    select best-performing pipeline P(D_i)
    embed pipeline F_{P(D_i)} = E(M(P(D_i)))
end for
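The fusion networks f_φ and g_θ can be pictured as small MLPs over concatenated feature vectors. The NumPy sketch below is a minimal illustration with an assumed architecture (one hidden layer of width 256); the paper specifies only the 512-dimensional output and that the weights are learned, so everything else here is our own choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class FusionMLP:
    """Minimal stand-in for the fusion networks f_phi and g_theta:
    a one-hidden-layer MLP mapping a concatenated feature vector to a
    fixed-size fused representation (512-d in the paper)."""

    def __init__(self, in_dim, out_dim=512, hidden=256):
        self.W1 = rng.normal(0.0, 0.02, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.02, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, *features):
        x = np.concatenate(features)          # e.g. [F_D, F_M] as in Eq. (1)
        return relu(x @ self.W1 + self.b1) @ self.W2 + self.b2

# f_phi fuses the 148-d meta-features with the 1024-d description embedding
f_phi = FusionMLP(in_dim=148 + 1024)
F_i = f_phi(np.zeros(148), np.zeros(1024))    # fused dataset representation
```

A second instance with in_dim = 512 + 1024 would play the role of g_θ in Eq. (2).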

3.2. GRAPH REPRESENTATION

We build a graph G = (V, E) where each node i ∈ V represents the dataset D_i and has feature vector v_i. Nodes. The feature vector v_i = [F_i, F_{P(D_i)}] ∈ R^1536 for node i representing dataset D_i with description M(D_i) concatenates the fused dataset representation (described above) F_i ∈ R^512 and the pipeline embedding F_P = E(P) ∈ R^1024 for the pipeline P that performed best on the dataset. During training, we mask the pipeline embedding from the feature vector and learn to predict a node using the GNN. Edges. To compute the edges of the graph G, we compute the distance d between each pair of datasets i, j as d = ‖F_i − F_j‖_2, where F_i and F_j are the fused dataset representations (described above) for the datasets. Two datasets are connected by an edge if dataset j is one of the k nearest neighbors of dataset i or vice versa. In our experiments we chose k = 20: we found that our method is reasonably robust to the choice of k, that Euclidean distance outperforms cosine similarity, and that a k-NN graph outperforms a threshold-based graph. At training time, we build this graph on the training datasets. At test time, given a new test dataset, we dynamically connect the new node to the graph using its fused feature representation f_φ([F_{D_test}, F_{M_test}]) to choose edges. Notice that the edges for the new dataset are chosen quickly, without fitting a single machine learning model.
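The edge construction above can be sketched in a few lines, assuming the fused representations are already available as rows of a matrix (the function name and brute-force distance computation are our own illustration; the paper does not specify the implementation):

```python
import numpy as np

def knn_graph_edges(F, k=20):
    """Build the symmetric k-NN edge set for the dataset graph.

    F : (n, d) array of fused dataset representations F_i.
    Nodes i, j are connected if j is among the k nearest neighbors of i
    under Euclidean distance, or vice versa (Section 3.2).
    """
    n = len(F)
    # pairwise Euclidean distances d = ||F_i - F_j||_2
    d = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude self-edges
    edges = set()
    for i in range(n):
        for j in np.argsort(d[i])[:k]:
            # undirected edge, stored with smaller index first
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges
```

Connecting a test dataset reuses the same rule: compute its fused representation, find its k nearest training nodes, and add the corresponding edges, with no model fitting involved.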

3.3. NEURAL NETWORK ARCHITECTURE

The neural networks we train for zero-shot AutoML consist of two fusion networks and a graph attention network (a type of GNN). The fusion networks capture the non-linear interactions between the features corresponding to the dataset description, the data meta-features, and the pipeline embedding. The GNN predicts the best pipeline for a new dataset based on the weights optimized during training, as described next. Graph attention network. A graph attention network (GAT) (Veličković et al., 2018) is used to predict the best pipeline for a new dataset. Each layer l = 1, ..., L of the GNN updates the feature vector at the i-th node as:

  u_i^l = α_ii W u_i^{l−1} + Σ_{j∈N(i)} α_ij W u_j^{l−1},

where W is a learnable weight matrix, N(i) are the neighbors of the i-th node, and α_ij are the attention coefficients, defined as:

  α_ij = exp(σ(z^T [W u_i, W u_j])) / Σ_{k∈N(i)} exp(σ(z^T [W u_i, W u_k])),   (3)

where z is a learnable vector and σ(·) is the leaky ReLU activation function. Our GNN consists of 3 GAT layers. The last layer of our GNN is a softmax which computes a vector of probabilities over pipelines. Hence the output of the GAT is a probability distribution over pipelines for each node. The network recommends the pipeline that maximizes this probability. Alternatively, we may sample from this probability distribution to obtain several pipelines that can be combined into an ensemble.
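The GAT update above can be sketched in NumPy as follows. This single-head toy version folds the α_ii self-term into the neighbor softmax by adding a self-loop before normalizing; it illustrates the update rule only, not the trained 3-layer network with its softmax output head.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(U, neighbors, W, z):
    """One single-head graph-attention layer, following Section 3.3.

    U         : (n, d) node features u_i
    neighbors : dict i -> list of neighbor indices N(i)
    W         : (d, d_out) learnable weight matrix
    z         : (2 * d_out,) learnable attention vector
    """
    H = U @ W
    out = np.zeros_like(H)
    for i, nbrs in neighbors.items():
        idx = [i] + list(nbrs)                       # self-loop for the alpha_ii term
        # unnormalized attention scores sigma(z^T [W u_i, W u_j])
        scores = np.array([leaky_relu(z @ np.concatenate([H[i], H[j]]))
                           for j in idx])
        alpha = np.exp(scores - scores.max())        # numerically stable softmax
        alpha /= alpha.sum()                         # attention coefficients
        out[i] = alpha @ H[idx]                      # weighted neighborhood aggregation
    return out
```

With zero attention vector z the coefficients are uniform, so the layer reduces to averaging each node with its neighbors, which makes the aggregation easy to check.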

3.4. TRAINING AND TESTING

Training. Our training process is illustrated in Figures 1 and 2 and described in Algorithm 2. At each training iteration, we randomly select a node i. We mask the pipeline embedding of the i-th node as u_i = g_θ([F_i, 0]). The true label is defined as the pipeline with the best performance among the four AutoML systems, P(D_i), on the i-th dataset. The resulting problem is a multi-class classification problem with as many classes as there are distinct algorithms. The loss function is the cross-entropy between the probability vector p of the predicted algorithm P̂(D_i) and the one-hot encoding y of the best algorithm P(D_i):

  L(P̂(D_i), P(D_i)) = − Σ_{l=1}^m y_l log(p_l).

Figure 2: Illustration of zero-shot AutoML dataset graph construction and prediction.

Algorithm 2 (body):
for i = 1 to n do
    compute fused representation F_i = f_φ(F_{D_i}, F_{M_i})
end for
compute pairwise distances d(F_i, F_j), i, j ∈ V
for i = 1 to n do
    connect node i to k-NN nodes N(i)
end for
for each training iteration do
    select random node i in G
    compute u_i = g_θ(F_i, 0)
    for all j ≠ i do
        compute u_j = g_θ(F_j, F_{P(D_j)})
    end for
    predict best pipeline P̂(D_i) = h_{W,z}(u_i, {u_j}_{j∈N(i)})
    compute loss L(P̂(D_i), P(D_i))
    update weights
end for

Testing. Our testing process is illustrated on the right path of Figure 1 and in Algorithm 3. Given a new dataset D and description M, we compute the description embedding F_M, the data meta-features F_D, and the fused dataset representation F. We use this representation to compute the edges of this new node in the graph of all datasets. Next, we add the new node, with features u = g_θ([F, 0]), to the current graph, replacing the embedding of the pipeline with the zero vector. Finally, we use the graph neural network to recommend a pipeline for the test dataset.
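The masking and loss computation for one training node can be sketched as follows, with toy dimensions matching Section 3 (512-d fused representation, 1024-d pipeline embedding). The GNN output p here is a dummy probability vector, not the output of a trained model.

```python
import numpy as np

def masked_node_loss(p, best_idx):
    """Cross-entropy training loss for one masked node (Section 3.4).

    p        : (m,) softmax output of the GNN over the m candidate pipelines
    best_idx : index of the best pipeline P(D_i) among the AutoML systems,
               i.e. the position of the 1 in the one-hot target y
    """
    # -sum_l y_l log(p_l) with one-hot y reduces to -log(p[best_idx])
    return -np.log(p[best_idx])

# Masking: the pipeline slot of the selected node's feature vector is zeroed
# before fusion, so the network must predict the pipeline from the dataset alone.
F_i = np.ones(512)                            # fused dataset representation (dummy)
v_i = np.concatenate([F_i, np.zeros(1024)])   # pipeline embedding masked to zero

p = np.array([0.1, 0.7, 0.2])                 # toy GNN output over 3 pipelines
loss = masked_node_loss(p, best_idx=1)
```

At test time the same zero-masking is applied to the new node's pipeline slot before the GNN recommends a pipeline.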

Algorithm 3 Zero-shot AutoML testing

Input: dataset D_i, description M(D_i), datasets graph G, GNN, s.t. i ∉ V (disjoint train and test sets).
Output: predicted best pipeline P̂(D_i) for the task on the dataset.
generate new node i in G:
    compute embedding of description F_M = E(M(D_i))
    compute data meta-features F_D
    compute fused representation F = f_φ(F_D, F_M)
    connect node i to k-NN nodes j ∈ N(i), V = V ∪ {i}
compute u_i = g_θ(F, 0)
predict best pipeline P̂(D_i) = h_{W,z}(u_i, {u_j}_{j∈N(i)})

Notice that our method does not need to complete even a single model fit to recommend a model with hyper-parameters. On the other hand, we must still fit the recommended model (to learn its parameters) on the dataset in order to predict output values for new input data. Our method can always recommend a model in 3 seconds, but training is still needed for prediction.

4. RESULTS

Table 2 shows our results for a representative set of test datasets, comparing our approach with state-of-the-art AutoML systems and baselines. For each dataset (row), Table 2 reports the mean evaluation accuracy of the different AutoML methods. Figure 3 compares the accuracy on the test set between our zero-shot approach, given 3 seconds of computation, and other AutoML systems and a random forest baseline, given 1 minute of computation. Our new zero-shot AutoML approach is the only AutoML system that provides predictions within 3 seconds. OBOE requires at least 20 seconds to perform predictions, and then only on a few of the datasets. AlphaD3M reaches performance slightly better than our approach, however it is given a minute of computation. See the supplementary material for additional results which validate the performance of our method. First, our zero-shot method generally outperforms other simple baselines given the same amount of computation time. Second, our zero-shot method, in 3 seconds, gives results comparable to state-of-the-art AutoML systems given 1 minute.

Under review as a conference paper at ICLR 2021

Table 2: Comparison of testing performance and time between AutoML systems and baselines: our zero-shot approach given 3 seconds and AutoSklearn, OBOE, TPOT, and AlphaD3M given 1 minute. Testing time for predicting the machine learning algorithm is milliseconds. Testing time for running the predicted machine learning algorithm and computing performance is 3 seconds. Our new zero-shot AutoML approach is the only AutoML system that provides predictions within 3 seconds.

5. CONCLUSIONS AND FUTURE WORK

We introduce a new zero-shot approach to AutoML that recommends a good pipeline for a given dataset in real-time. Our system builds a graph from NLP text embeddings of the dataset and pipeline descriptions as well as data meta-features, and uses a graph neural network to predict the best pipeline for a given dataset. Our approach matches the performance of other state-of-the-art AutoML systems and is significantly faster, reducing running time from minutes to seconds and prediction time from minutes to milliseconds. Future work will extend our approach to handle different types of data, including audio and images. In addition, we envision an extension to semi-supervised AutoML by using a GNN to embed a large unsupervised set of datasets without pipelines, such as the 25 million datasets available on Google Dataset Search (Brickley et al., 2019), together with a small supervised set of datasets with AutoML pipelines. Finally, we will make our data, models, and code public upon publication.



Figure 1: Zero-shot AutoML architecture: Dataset descriptions are embedded using a language model. The data itself is passed through a feature extractor. Other AutoML system algorithms are embedded using a language model. Fully connected neural networks fuse together the encoded feature vectors. A graph captures the relationships between the embedded representations. At training time a GNN learns the aggregation of each node in the graph and its neighbors. The GNN predicts a pipeline for a new node (dataset). At test time a dataset is added as a new node in the graph and the GNN predicts the best machine learning pipeline without running any AutoML system or evaluating any pipeline. Inputs are colored green, neural networks in blue, intermediate outputs in red, and predicted output in yellow.

Figure 3: Comparison of accuracy on the test set between our zero-shot approach given 3 seconds of computation and other AutoML systems and a random forest baseline given 1 minute of computation. Our zero-shot approach matches the performance of baselines while running 20 times faster.

Algorithm 2 Zero-shot AutoML training
Input: training datasets, descriptions {D_i, M(D_i)}_{i∈V}.
Output: datasets graph G, GNN h_{W,z}, fusion networks f_φ and g_θ.
pre-process: compute {F_{M_i}, F_{D_i}, F_{P(D_i)}}_{i∈V}
initialize fusion network weights φ, θ.

AVAILABILITY

https://colab.research.google.com/drive/1t0Gt8c_Tp3gYFnLhcQ4SBnkO2vED3qbA#scrollTo=m7vlAj6LRZEU

[Notebook output residue removed: matplotlib cells from figure_plot.ipynb producing the per-dataset scatter plots of Zero-Shot AutoML (3 seconds) accuracy against AutoSklearn, OBOE, TPOT, and AlphaD3M (1 minute each).]

