NETWORKS ARE SLACKING OFF: UNDERSTANDING GENERALIZATION PROBLEM IN IMAGE DERAINING

Abstract

Deep low-level networks are successful on laboratory benchmarks, but still suffer from severe generalization problems in real-world applications, especially for the deraining task. A widely held "acknowledgment" in deep learning drives researchers to use training data of higher complexity, expecting the network to learn richer knowledge and thereby overcome the generalization problem. Through extensive systematic experiments, we show that this approach fails to improve generalization ability and instead makes the networks overfit the degradations even more. Our experiments establish that a deraining network with better generalization can be trained by reducing the complexity of the training data. This is because networks are slacking off during training, i.e., learning the less complex element between the image content and the degradation in order to reduce the training loss. When the background images are less complex than the rain streaks, the network focuses on reconstructing the background without overfitting the rain patterns, thus achieving good generalization. Our research demonstrates strong application potential and provides a valuable perspective and research methodology for understanding the generalization problem in low-level vision.

1. INTRODUCTION

The whirlwind of progress in deep learning has produced a steady stream of promising low-level vision networks, which significantly outperform traditional methods on existing benchmark datasets. However, the intrinsic overfitting issue has kept these deep models from real-world applications, especially when the degradation differs substantially from the training data. We call this dilemma the generalization problem. Although important, this problem is not well studied in the low-level vision literature. We need more in-depth analysis and understanding before proposing effective solutions. Understanding generalization in low-level vision is by no means easy. It is not a naive extension of the generalization research in high-level vision. We need dedicated analysis tools to interpret new phenomena. In this paper, we hope to build a stepping stone towards a more in-depth understanding of this problem. To achieve this goal, we select a representative low-level vision task as the breakthrough point and design quantitative analysis methods for several controlling factors. The heart of our methodology is stated as follows.

Select deraining as the representative task. Low-level vision includes many tasks, such as image denoising and super-resolution, which have different characteristics. A general understanding of generalization across all low-level vision tasks cannot be built in a day. Thus, we choose image deraining as a representative. Image deraining aims to remove undesired rain streaks from an image. There are two considerations for selecting this task. First, as a typical decomposition problem, image deraining has a relatively simple degradation model (a linear superimposition model): the rainy image is the sum of a clean background and a rain layer. This facilitates our research and enables the use of many quantitative measurements. Second, the deraining task suffers from a severe generalization problem.
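As an illustration, the linear superimposition model mentioned above can be sketched in a few lines. The function name and toy arrays below are our own illustrative choices, not code from the paper:

```python
import numpy as np

def synthesize_rainy(background: np.ndarray, rain: np.ndarray) -> np.ndarray:
    """Additive rain model O = B + R, clipped to the valid intensity range.

    Both inputs are float arrays in [0, 1] with the same shape.
    """
    return np.clip(background + rain, 0.0, 1.0)

# Toy example: a flat gray background plus one bright horizontal "streak".
bg = np.full((4, 4), 0.5)
rain = np.zeros((4, 4))
rain[1, :] = 0.6
observed = synthesize_rainy(bg, rain)
# Streak pixels saturate at 1.0; all other pixels keep the background value.
```

Because the degradation is a simple linear superimposition, the ground-truth background and rain layer are known exactly for every synthetic sample, which is what makes the quantitative measurements in this kind of study tractable.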
Existing deraining models tend to do nothing about rain streaks that lie beyond their training distribution. See Figure 1 for an example. This phenomenon is very intuitive and easy to quantify.

Analyze from the perspective of training data. We argue that the generalization problem arises because the network overfits the degradation (the rain patterns in the deraining task). The main cause is an inappropriate training objective. We start our analysis with the most basic and indispensable factor in constructing the training objective: the training data. There have been many works trying to improve real-world performance by increasing the complexity of the training data. This stems from a natural but unproven "acknowledgment" in low-level vision that more training data can solve the generalization problem. This acknowledgment also influences the deraining community: when the network sees more (both background images and rain streaks), it should generalize better to more real-world scenarios. However, the generalization problem of deraining is NOT solved in this way. These methods still fail on rain patterns that have not yet been collected. We argue that when too much background data is provided for training, the model cannot learn to reconstruct the image content and can only overfit the degradation. Therefore, we propose to reduce the number of training background images in our study, rather than increase it further.

Our analysis methods. To systematically study the changes in model behavior brought about by changing the training objective, we construct a number of training sets consisting of different background images. We first investigate the effect of the number of training images by overfitting the model on very few (16 or even 8) images. By switching between different image categories, we study the network behavior when fitting images of different complexity levels.
We study the relationship between the complexity of the background image set and the generalization performance of deraining through extensive quantitative experiments. Beyond constructing training objectives, we also perform a fine-grained analysis of the model outputs. Previous works simply use overall image quality, e.g., PSNR, as the performance indicator. However, quality deterioration may stem either from the unsuccessful removal of rain streaks or from the poor reconstruction of the image background. We therefore decouple the deraining task into rain removal and background reconstruction, which are studied separately. Since the generalization problem in deraining is mainly related to the removal of rain streaks, this fine-grained analysis excludes the influence of other factors.

Our key findings. We find that deep networks are slacking off during training: they take shortcuts to reduce the loss, resulting in poor generalization performance. This is due to the inappropriate objective we set for training. Our key finding can be summarized as: between the image content and the additive degradation, deep networks tend to learn the less complex element of the separation task. Specifically, on common training data with high background complexity and low rain complexity, the network naturally learns to identify and separate rain streaks, because they are less complex and easier to learn. But when real rain deviates from the learned depiction, the network tends to ignore it, yielding poor generalization performance. On the contrary, when we train the model on a less complex background image set, it exhibits better generalization ability; see Figure 1 (e). The reason is that when the complexity of the training backgrounds is smaller than that of the rain patterns, the network again takes a shortcut to reduce the loss, i.e., it memorizes the reconstruction of the background instead of overfitting the rain streaks.
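The decoupled evaluation described above can be sketched as a PSNR computed separately over rain-covered and rain-free pixels. The helper names and the mask convention below are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def masked_psnr(pred: np.ndarray, target: np.ndarray,
                mask: np.ndarray, max_val: float = 1.0) -> float:
    """PSNR restricted to the pixels where `mask` is True."""
    err = (pred - target)[mask]
    mse = float(np.mean(err ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def decoupled_scores(pred, clean, rain_mask):
    """Score rain removal (over streak pixels) and background
    reconstruction (over the remaining pixels) separately."""
    return {
        "rain_removal": masked_psnr(pred, clean, rain_mask),
        "background": masked_psnr(pred, clean, ~rain_mask),
    }

# Toy case: the output leaves a faint residual streak but a perfect background.
clean = np.zeros((4, 4))
pred = clean.copy()
pred[1, :] = 0.1                       # residual rain, error only on the streak
rain_mask = np.zeros((4, 4), dtype=bool)
rain_mask[1, :] = True
scores = decoupled_scores(pred, clean, rain_mask)
```

On this toy case, a single overall PSNR would blur the two failure modes together, whereas the decoupled scores separate them: the rain-removal PSNR is finite while the background PSNR is infinite (perfect reconstruction).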
Besides the removal of rain, the performance of the model is also determined by the background reconstruction. Reducing the background complexity of the training data might be expected to produce unsatisfactory reconstruction results. However, our results show that the model trained on only 256 images can already handle



Figure 1: Existing deraining models suffer from severe generalization problems. After training with synthetic rainy images, when fed (a) an image with rain streaks different from those seen in training, the model's output (b) shows only a limited deraining effect. Two intuitive ways to improve generalization performance, (c) adding background images and (d) adding rain patterns, cannot effectively relieve the generalization issue. In this paper, we provide a new, counter-intuitive insight: (e) we improve the generalization ability of deraining networks by selecting far fewer background images for training, not more.

