SELF-PRETRAINING FOR SMALL DATASETS BY EXPLOITING PATCH INFORMATION

Abstract

Deep learning tasks with small datasets are often tackled by pretraining models on large datasets for relevant tasks. Although pretraining mitigates the problem of overfitting, it can sometimes be difficult to find an appropriate pretrained model. In this paper, we propose a self-pretraining method that exploits patch information in the dataset itself, without pretraining on any other dataset. Our experiments show that the self-pretraining method leads to better performance than training from scratch, with neither setting using external data.

1. INTRODUCTION

Transfer learning has become the de facto approach for deep learning tasks on small datasets. Because of the data-hungry nature of deep learning methods, training from scratch on a small dataset usually leads to overfitting. Although transfer learning with models pretrained on additional large datasets mitigates overfitting, it is hard to find an appropriate pretrained model, such as the ImageNet-classification models used in detection and segmentation tasks, when the appearance of the input data or the goal of the task in the target domain is unusual. Research on training with small datasets without external information has emerged in recent years. Barz et al. (Barz & Denzler, 2020) proposed training from scratch on small datasets with the cosine loss, which achieved substantially better performance than the cross-entropy loss on fine-grained classification tasks. Zhang et al. (Zhang et al., 2019) introduced a generative adversarial network into the process of training with limited data, without using external data or prior knowledge. In contrast to applying data augmentation or special loss functions to small-dataset tasks, we propose a self-pretraining method that transfers patch information in the dataset itself to the model in a weakly supervised manner. Patches can represent image information to some extent. Kang et al. (Kang et al., 2014) predicted an image's quality score as the average quality score of its patches, trained with image-level quality labels. Similarly, BagNet (Brendel & Bethge, 2019) showed that small image patches containing class evidence can perform well on the ImageNet classification challenge by aggregating their scores across the image without considering spatial order. In the case of fine-grained classification, (Wang et al., 2017) extracted features of image patches by training on external large datasets in a weakly supervised way.
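To make the patch-score aggregation idea concrete, the following is a minimal sketch of the BagNet-style pipeline described above: patches are cut from an image, each patch receives per-class scores (here from a placeholder scorer, since the actual model is not specified in this section), and the scores are averaged into one image-level prediction. The function names and the toy sizes are illustrative, not taken from the paper.

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Slide a square window over the image and stack the resulting patches."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

def aggregate_patch_scores(patch_logits):
    """Average per-patch class scores into a single image-level score vector,
    ignoring the spatial order of the patches."""
    return patch_logits.mean(axis=0)

# Toy example: a 32x32 single-channel image, 8x8 patches, stride 8 -> 16 patches.
image = np.arange(32 * 32, dtype=float).reshape(32, 32)
patches = extract_patches(image, patch_size=8, stride=8)
print(patches.shape)  # (16, 8, 8)

# Placeholder per-patch logits for 3 classes; a real model would produce these.
patch_logits = np.random.default_rng(0).normal(size=(len(patches), 3))
image_score = aggregate_patch_scores(patch_logits)
print(image_score.shape)  # (3,)
```

Under this scheme every patch inherits the image-level label during training, which is what makes the supervision "weak": no patch-level annotation is required.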
Inspired by (Gatys et al., 2016), which pointed out that convolutional neural networks capture local features in lower layers and global structural features in higher layers, our self-pretraining method pretrains the model from lower layers to higher layers, step by step, using image patches of incrementally increasing size.
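One plausible reading of this staged schedule can be sketched as follows: each stage pairs a patch size with the number of lower layers being trained, so small patches shape the early layers first and larger patches progressively reach deeper layers. The layer count, patch sizes, and stage boundaries below are assumptions for illustration; the paper's exact schedule is not given in this section.

```python
NUM_LAYERS = 4  # illustrative network depth, not from the paper

def trainable_mask(num_layers, train_up_to):
    """True for layers updated in the current stage, False for frozen ones.
    Lower layers (small indices) are trained first."""
    return [i < train_up_to for i in range(num_layers)]

# Each stage: (patch side in pixels, number of lower layers unfrozen).
# Patch size and trainable depth grow together, stage by stage.
stages = [(8, 1), (16, 2), (32, 4)]

for patch_size, depth in stages:
    mask = trainable_mask(NUM_LAYERS, depth)
    # Here one would sample patches of side `patch_size`, attach the
    # image-level labels, and run a training pass that updates only the
    # layers marked True in `mask`.
    print(patch_size, mask)
```

The design intuition follows (Gatys et al., 2016): since small patches carry mostly local texture, they are a natural fit for pretraining the local-feature layers, while larger patches carrying more global structure are reserved for the higher layers.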

2. PROPOSED METHOD

In this section, we present our self-pretraining method, which uses patch information from the dataset itself. Despite the small number of training images, the number of patches sampled from those images can be large enough to meet the data-hungry demand. Our self-pretraining method draws on two insights. First, a large number of patches, each containing part of the information in an image, can train the network with image-level labels in a weakly supervised manner. Although each small patch does

