SEMI-SUPERVISED LEARNING VIA CLUSTERING REPRESENTATION SPACE

Abstract

We propose a novel loss function that combines supervised learning with clustering in deep neural networks. By exploiting the data distribution together with the small amount of available labeled data, we construct a meaningful latent space. Our loss function consists of three parts: the quality of the clustering result, the margin between clusters, and the classification error on the labeled instances. Our model is trained to minimize this loss function directly, avoiding the need for pre-training or additional networks. This guides the network to classify labeled samples correctly while finding good clusters at the same time. We applied the proposed method to MNIST, USPS, ETH-80, and COIL-100; the comparison results confirm our model's strong performance relative to other semi-supervised learning methods.

1. INTRODUCTION

Labeling data is expensive, so it is often hard to obtain enough labeled samples. Semi-supervised learning (Chapelle et al., 2009) has therefore become an important topic: the goal is to achieve good performance with a limited amount of labeled data and a large amount of unlabeled data. When only a few labeled samples are available, extracting information from the unlabeled data plays an important role in semi-supervised learning. Unlabeled data are usually exploited as an auxiliary tool, for example through pre-training (Hinton and Salakhutdinov, 2006) or by recursively picking high-confidence predictions on unlabeled samples and feeding them back into supervised learning (Zhu, 2005). However, on problems such as the two half-moons, the double circles, or other, more complex distributions, these methods fail to exploit the spatial distribution information carried by the unlabeled samples. In this paper, we aim to guide our model to extract this spatial distribution information from unlabeled data. We propose a new approach to semi-supervised learning that adds a loss term defined on the target embedding latent space. With this term, the neural network learns correctness from labeled samples and spatial distribution information from unlabeled samples simultaneously. This encourages the decision boundary of the feed-forward neural network to pass through the sparse margins between clusters, which improves the performance of the classifier; see Sec. 3 for details. Moreover, our model does not rely on any additional neural networks, so it is suitable for a wide range of tasks and highly compatible with other semi-supervised learning algorithms. In short, the characteristics of our proposed model are as follows:
Intuitive The idea of combining correctness with spatial distribution follows directly from the characteristics of supervised and unsupervised learning.
Compatibility Our method does not rely on any additional neural networks; it only adds a new loss term, so it is easy to integrate into any existing feed-forward neural network.
Extensible Our approach is built around the notion of an evaluation of the spatial distribution, which can be replaced by other evaluation methods in future research.
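The paper's exact loss terms are defined later (Sec. 3). As a rough illustration only, the sketch below shows one way such a three-part objective could be combined: a cross-entropy term on the labeled subset, a within-cluster compactness term, and a penalty on cluster centers that sit closer than a minimum margin. All function names and the weights `alpha`, `beta`, `gamma` are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def classification_loss(logits, labels):
    """Cross-entropy on the labeled subset (numerically stable log-softmax)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def clustering_quality(embeddings, centers, assignments):
    """Mean squared distance of each embedding to its assigned cluster center."""
    return np.mean(np.sum((embeddings - centers[assignments]) ** 2, axis=1))

def margin_penalty(centers, min_margin=1.0):
    """Penalize pairs of cluster centers closer together than min_margin."""
    penalty, k = 0.0, len(centers)
    for i in range(k):
        for j in range(i + 1, k):
            d = np.linalg.norm(centers[i] - centers[j])
            penalty += max(0.0, min_margin - d) ** 2
    return penalty / (k * (k - 1) / 2)

def total_loss(logits, labels, embeddings, centers, assignments,
               alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three terms; alpha/beta/gamma are hypothetical weights."""
    return (alpha * classification_loss(logits, labels)
            + beta * clustering_quality(embeddings, centers, assignments)
            + gamma * margin_penalty(centers))
```

In a real training loop the three terms would be computed on mini-batches (the classification term only on the labeled portion) and minimized jointly by backpropagation, which is what lets the network satisfy label correctness and cluster structure at the same time.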

2. RELATED WORK

In recent years, neural networks have played an essential role in a variety of tasks; in particular, they are widely applied to image classification. Consequently, semi-supervised learning (Weston et al., 2012; Lee, 2013) for image classification has become a vital issue. First of all, some works

