CHANGE DETECTION FOR BI-TEMPORAL IMAGES CLASSIFICATION BASED ON SIAMESE VARIATIONAL AUTOENCODER AND TRANSFER LEARNING

Abstract

Siamese structures enable Deep Learning (DL) models to improve their efficiency by learning to extract the relevant temporal features from the input data. In this paper, a Siamese Variational Auto-Encoder (VAE) model based on transfer learning (TL) is applied for change detection (CD) using bi-temporal images. The proposed method is trained with a supervised strategy for classification tasks. First, the proposed generative method uses two VAEs to extract features from the bi-temporal images and then concatenates them into a single feature vector. To obtain a classification map of the source scene, this vector, together with the ground truth data, is fed to the classifier. Using a TL strategy, the source model is then fine-tuned so that it can be applied to the target scene with less ground truth data. Experiments were carried out in two study areas in the arid regions of southern Tunisia. The results show that the proposed method outperformed the Siamese Convolutional Neural Network (SCNN), achieving an accuracy of more than 98% in the source scene, and that applying the TL strategy increased the accuracy in the target scene by 1.25%.
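As a rough illustration of the pipeline summarised above, the following NumPy sketch encodes each acquisition date with its own VAE encoder, concatenates the two latent codes, trains a classifier on source-scene labels, and then fine-tunes that classifier on a small labelled target sample while reusing the source encoders. It is not the authors' implementation and rests on several assumptions: each encoder is a single random projection (the decoder and ELBO training are omitted), the classifier is plain logistic regression producing a binary change/no-change map, and `VAEEncoder`, `fit_classifier`, `make_scene`, the layer sizes, and the synthetic data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BANDS, N_LATENT = 6, 4  # hypothetical: spectral bands per pixel, latent size


class VAEEncoder:
    """Encoder half of a VAE: q(z|x) = N(mu(x), diag(exp(logvar(x)))).

    Weights are random placeholders. A real model would learn them jointly
    with a decoder by maximising the ELBO; the decoder is omitted here.
    """

    def __init__(self, n_in, n_latent, rng):
        self.W_mu = rng.normal(0.0, 0.3, (n_in, n_latent))
        self.W_logvar = rng.normal(0.0, 0.1, (n_in, n_latent))
        self.rng = rng

    def encode(self, x, sample=False):
        mu = np.tanh(x @ self.W_mu)
        if not sample:
            return mu  # latent mean used as a deterministic feature vector
        logvar = x @ self.W_logvar
        eps = self.rng.standard_normal(mu.shape)
        return mu + np.exp(0.5 * logvar) * eps  # reparameterisation trick


def features(enc_t1, enc_t2, x_t1, x_t2):
    """Encode each date with its own VAE and concatenate the latent codes."""
    return np.concatenate([enc_t1.encode(x_t1), enc_t2.encode(x_t2)], axis=1)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def fit_classifier(F, y, w=None, b=0.0, lr=0.5, epochs=200):
    """Logistic-regression change/no-change classifier (gradient descent).

    Passing w and b from a source model warm-starts training, which plays
    the role of the fine-tuning step in the transfer-learning strategy.
    """
    w = np.zeros(F.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        err = sigmoid(F @ w + b) - y  # gradient of the log-loss w.r.t. logits
        w -= lr * F.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def make_scene(n, shift, rng):
    """Synthetic bi-temporal pixels; label 1 means 'changed between dates'."""
    x_t1 = rng.normal(size=(n, N_BANDS))
    y = (rng.random(n) < 0.3).astype(float)
    x_t2 = x_t1 + 0.05 * rng.normal(size=(n, N_BANDS)) + shift * y[:, None]
    return x_t1, x_t2, y


# Source scene: many labels, classifier trained from scratch.
enc_t1 = VAEEncoder(N_BANDS, N_LATENT, rng)
enc_t2 = VAEEncoder(N_BANDS, N_LATENT, rng)
xs1, xs2, ys = make_scene(1000, shift=1.0, rng=rng)
w_src, b_src = fit_classifier(features(enc_t1, enc_t2, xs1, xs2), ys)

# Target scene: few labels; reuse the source encoders, fine-tune the classifier.
xt1, xt2, yt = make_scene(50, shift=1.2, rng=rng)
w_tgt, b_tgt = fit_classifier(features(enc_t1, enc_t2, xt1, xt2), yt,
                              w=w_src, b=b_src, epochs=50)
change_prob = sigmoid(features(enc_t1, enc_t2, xt1, xt2) @ w_tgt + b_tgt)
```

Freezing the encoders and warm-starting only the classifier is just one possible fine-tuning choice; the encoder weights could also be updated on the target data when enough target labels are available.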

1. INTRODUCTION

The feature extraction step in the classification process can improve DL model performance in several fields (Hakak et al., 2021; Islam & Nahiduzzaman, 2022; Xiong & Zuo, 2022). In particular, convolutional neural networks (CNNs) have been employed efficiently to solve computer vision problems in a variety of fields, including industry, environment, and healthcare (Alzubaidi et al., 2021; Huang et al., 2022). Nevertheless, the performance of these algorithms depends on the datasets used. Furthermore, CNNs have shown low performance in classification tasks because of the high similarity and low dispersion of the input data. Recently, in the face of these challenges, the VAE has demonstrated good performance in classification tasks, as it relies on distribution-free assumptions and nonlinear approximation (Zerrouki et al., 2020; Ran et al., 2022). However, the periodicity of the input data reduces its efficiency and therefore prevents it from ensuring the temporal consistency of the extracted features (Zhao & Peng, 2022). Moreover, traditional DL models (e.g., CNN, VAE) cannot capture temporal information and thus have a limited capability to extract temporal features. To overcome this shortcoming, the Siamese structure, which is one of the best approaches for CD in bi-temporal images, can be a good solution. Siamese networks were first used for signature verification and were subsequently applied to feature matching, particularly between pairs of images (Ghosh et al., 2021; Zhang et al., 2022). Recent studies focusing on classification tasks have employed bi-temporal images for CD (Lee et al., 2021; Zheng et al., 2022). The CD process consists of identifying the differences between bi-temporal images of the same geographic location that is subject to anthropogenic and climatic factors. Exploring the generalization of Siamese DL models is a key challenge.
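To make the Siamese principle concrete: both acquisition dates pass through the same encoder with the same weights, so an unchanged pixel is mapped to the same embedding at both dates, and any difference between embeddings can only come from a change in the input. The NumPy sketch below shows this with a hypothetical, untrained two-layer encoder (`embed`) and a distance threshold. It is a generic distance-based Siamese detector, a common variant rather than the VAE-plus-classifier scheme proposed in this paper, and the weights, scene size, and threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# A single set of weights shared by both branches: the defining Siamese property.
W1 = rng.normal(0.0, 0.5, (6, 16))
W2 = rng.normal(0.0, 0.5, (16, 8))


def embed(x):
    """Shared (untrained) two-layer encoder applied identically to both dates."""
    hidden = np.maximum(x @ W1, 0.0)  # ReLU
    return np.tanh(hidden @ W2)


def change_map(x_t1, x_t2, threshold):
    """Per-pixel distance between the two embeddings, and a binary change mask."""
    dist = np.linalg.norm(embed(x_t1) - embed(x_t2), axis=1)
    return dist, dist > threshold


# Hypothetical scene: 100 pixels with 6 spectral bands each.
x_t1 = rng.normal(size=(100, 6))
x_t2 = x_t1.copy()
changed = np.zeros(100, dtype=bool)
changed[:20] = True
x_t2[changed] += 2.0  # simulate a land-cover change on the first 20 pixels

dist, pred = change_map(x_t1, x_t2, threshold=0.1)
```

In practice the threshold would be tuned on labelled data, or replaced by a trained classifier on the paired features, as in the supervised approach described in this paper.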
Discussing the TL capabilities of Siamese DL models is one of the most common analyses (Krishnamurthy et al., 2021; Abou Baker et al., 2022). TL aims to gain knowledge by solving one problem and to apply it to a different but related problem. In practice, TL transfers knowledge from a context with abundant labeled data to another with limited labels. Concretely, TL consists of reusing the weights of a model trained on source data while applying a fine-tuning approach to obtain a model adapted to the target data (Raffel et al., 2020; Shabbir et al., 2021; Toseef et al., 2022). By employing the pre-trained source model as the target scene adapter instead

