DOES PROGRESS ON IMAGENET TRANSFER TO REAL-WORLD DATASETS?

Abstract

Does progress on ImageNet transfer to real-world datasets? We investigate this question by evaluating ImageNet pre-trained models with varying accuracy (57% -83%) on six practical image classification datasets. In particular, we study datasets collected with the goal of solving real-world tasks (e.g., classifying images from camera traps or satellites), as opposed to web-scraped benchmarks collected for comparing models. On multiple datasets, models with higher ImageNet accuracy do not consistently yield performance improvements. For certain tasks, interventions such as data augmentation improve performance even when architectures do not. We hope that future benchmarks will include more diverse datasets to encourage a more comprehensive approach to improving learning algorithms.

1. INTRODUCTION

ImageNet is one of the most widely used datasets in machine learning. Initially, the ImageNet competition played a key role in re-popularizing neural networks with the success of AlexNet in 2012. Ten years later, the ImageNet dataset is still one of the main benchmarks for state-of-the-art computer vision models (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; He et al., 2016; Liu et al., 2018; Howard et al., 2019; Touvron et al., 2021; Radford et al., 2021) . As a result of Ima-geNet's prominence, the machine learning community has invested tremendous effort into developing model architectures, training algorithms, and other methodological innovations with the goal of increasing performance on ImageNet. Comparing methods on a common task has important benefits because it ensures controlled experimental conditions and results in rigorous evaluations. But the singular focus on ImageNet also raises the question whether the community is over-optimizing for this specific dataset. As a first approximation, ImageNet has clearly encouraged effective methodological innovation beyond ImageNet itself. For instance, the key finding from the early years of ImageNet was that large convolution neural networks (CNNs) can succeed on contemporary computer vision datasets by leveraging GPUs for training. This paradigm has led to large improvements in other computer vision tasks, and CNNs are now omnipresent in the field. Nevertheless, this clear example of transfer to other tasks early in the ImageNet evolution does not necessarily justify the continued focus ImageNet still receives. For instance, it is possible that early methodological innovations transferred more broadly to other tasks, but later innovations have become less generalizable. The goal of our paper is to investigate this possibility specifically for neural network architecture and their transfer to real-world data not commonly found on the Internet. When discussing the transfer of techniques developed for ImageNet to other datasets, a key question is what other datasets to consider. Currently there is no comprehensive characterization of the many machine learning datasets and transfer between them. Hence we restrict our attention to a limited but well-motivated family of datasets. In particular, we consider classification tasks derived from image data that were specifically collected with the goal of classification in mind. This is in contrast to many standard computer vision datasets -including ImageNet -where the constituent images were originally collected for a different purpose, posted to the web, and later re-purposed for benchmarking computer vision methods. Concretely, we study six datasets ranging from leaf disease classification over melanoma detection to categorizing animals in camera trap images. Since these datasets represent real-world applications, transfer of methods from ImageNet is particularly relevant.

