CAFENET: CLASS-AGNOSTIC FEW-SHOT EDGE DETECTION NETWORK

Abstract

We tackle a novel few-shot learning challenge, few-shot semantic edge detection, which aims to localize the boundaries of novel categories using only a few labeled samples. Reliable boundary information has been shown to boost the performance of semantic segmentation and localization, while also playing a key role in its own right in object reconstruction, image generation and medical imaging. Few-shot semantic edge detection allows recovery of accurate boundaries with just a few examples. In this work, we present a Class-Agnostic Few-shot Edge detection Network (CAFENet) based on a meta-learning strategy. CAFENet employs a small-scale semantic segmentation module to compensate for the lack of semantic information in edge labels. The predicted segmentation mask is used to generate an attention map that highlights the target object region and makes the decoder module concentrate on that region. We also propose a new regularization method based on multi-split matching. In meta-training, the metric-learning problem with high-dimensional vectors is divided into smaller subproblems with low-dimensional sub-vectors. Since there are no existing datasets for few-shot semantic edge detection, we construct two new datasets, FSE-1000 and SBD-5^i, and evaluate the performance of the proposed CAFENet on them. Extensive simulation results confirm that the proposed CAFENet achieves better performance than baseline methods based on fine-tuning or few-shot segmentation.

1. INTRODUCTION

Semantic edge detection aims to identify pixels that belong to boundaries of predefined categories. Boundary information has been shown to be effective for boosting the performance of semantic segmentation (Bertasius et al., 2016; Chen et al., 2016) and localization (Yu et al., 2018a; Wang et al., 2015). It also plays a key role in applications such as object reconstruction (Ferrari et al., 2007; Zhu et al., 2018), image generation (Isola et al., 2017; Wang et al., 2018) and medical imaging (Abbass & Mousa, 2017; Mehena, 2019). Early edge detection algorithms treat the task as a low-level grouping problem, exploiting hand-crafted features and local information (Canny, 1986; Sugihara, 1986). Recently, there have been significant improvements in edge detection thanks to advances in deep learning. Moreover, going beyond plain boundary detection, category-aware semantic edge detection has become possible (Acuna et al., 2019; Hu et al., 2019; Yu et al., 2018b). However, training deep neural networks typically requires massive amounts of annotated data. To overcome the data scarcity issue in image classification, few-shot learning has been actively studied in recent years (Finn et al., 2017; Lifchitz et al., 2019). Few-shot learning algorithms train machines to learn previously unseen classification tasks using only a few relevant labeled examples. More recently, the idea of few-shot learning has been applied to computer vision tasks that require highly laborious and expensive data labeling, such as semantic segmentation (Dong & Xing, 2018; Wang et al., 2019) and object detection (Fu et al., 2019; Karlinsky et al., 2019). Based on meta-learning across varying tasks, the machines can adapt to unencountered environments and demonstrate robust performance in various computer vision problems. In this paper, we consider a novel few-shot learning challenge, few-shot semantic edge detection, which aims to detect semantic boundaries using only a few labeled samples.
Through experiments, we show that few-shot semantic edge detection cannot be solved simply by fine-tuning a pretrained semantic edge detector or by applying a nonparametric edge detector to the output of a few-shot segmentation model. To tackle this elusive challenge, we propose a class-agnostic few-shot edge detector (CAFENet) and present new datasets for evaluating few-shot semantic edge detection.

2.1. FEW-SHOT LEARNING

To tackle the few-shot learning challenge, many methods have been proposed based on meta-learning. Optimization-based methods (Finn et al., 2017; Ravi & Larochelle, 2016) train a meta-learner that updates the parameters of the actual learner so that the learner can easily adapt to a new task with only a few labeled samples. Metric-based methods (Vinyals et al., 2016; Snell et al., 2017; Yoon et al., 2019) train the feature extractor to cluster features from the same class together in the embedding space while keeping features from different classes far apart. Recent metric-based approaches propose dense classification (Hou et al., 2019; Kye et al., 2020). Dense classification trains an instance-wise classifier with a pixel-wise classification loss, which imposes coherent predictions over the spatial dimension and thereby prevents overfitting. Our model adopts the metric-based method for few-shot learning. Inspired by dense classification, we propose multi-split matching regularization, which divides the feature vector into sub-vector splits and performs split-wise classification for regularization in meta-learning.
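The split-wise classification idea described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `split_matching_loss`, the use of squared Euclidean distance as the metric, the equal-width splits, and the plain averaging of the per-split losses are all assumptions made for illustration.

```python
import numpy as np

def split_matching_loss(query_feats, prototypes, labels, num_splits):
    """Average cross-entropy over split-wise nearest-prototype classifiers.

    query_feats: (N, D) embeddings, prototypes: (C, D), labels: (N,) ints.
    Assumptions (not from the paper): squared Euclidean distance,
    equal-width splits, and an unweighted mean over splits.
    """
    n, d = query_feats.shape
    if d % num_splits != 0:
        raise ValueError("feature dimension must be divisible by num_splits")

    # Cut the embedding dimension into num_splits low-dimensional sub-vectors.
    q_splits = np.split(query_feats, num_splits, axis=1)
    p_splits = np.split(prototypes, num_splits, axis=1)

    losses = []
    for q, p in zip(q_splits, p_splits):
        # (N, C) squared distances between sub-vectors and sub-prototypes.
        dist = ((q[:, None, :] - p[None, :, :]) ** 2).sum(axis=-1)
        logits = -dist
        # Numerically stable log-softmax over classes.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        losses.append(-log_prob[np.arange(n), labels].mean())

    return float(np.mean(losses))
```

In this sketch, setting `num_splits=1` reduces to ordinary metric-based classification on the full vector; larger values require every sub-vector to be discriminative on its own, which is the intuition behind using the split-wise loss as a regularizer.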

2.2. FEW-SHOT SEMANTIC SEGMENTATION

The goal of few-shot segmentation is to perform semantic segmentation with only a few labeled samples based on meta-learning (Shaban et al., 2017; Dong & Xing, 2018; Wang et al., 2019). OSLSM (Shaban et al., 2017) adopts a two-branch structure: a conditioning branch that generates element-wise scale and shift factors using the support set, and a segmentation branch that performs segmentation with a fully convolutional network and task-conditioned features. Co-FCN (Rakelly et al., 2018) also utilizes a two-branch structure. The globally pooled prediction is generated using support set in



Figure 1: Architecture overview of the proposed CAFENet. The feature extractor (encoder) extracts features from the image, the segmentator generates a segmentation mask based on metric learning, and the edge detector detects semantic boundaries using the segmentation mask and query features.

Fig. 1 shows the architecture of the proposed CAFENet. Because edge labels are sparse and therefore carry little semantic information, the performance of the edge detector degrades severely when the training dataset is very small. To overcome this, we perform segmentation before edge detection, using downsized features and segmentation labels generated from the boundary labels. We utilize a simple metric-based segmentator that generates a segmentation mask through pixel-wise feature matching with class prototypes, which are computed by the masked average pooling of Zhang et al. (2018). The predicted segmentation mask provides semantic information to the edge detector. Multi-scale attention maps are generated from the segmentation mask and applied to the corresponding multi-scale features. The edge detector then predicts the semantic boundaries using the attended features. With this attention mechanism, the edge detector can focus on relevant regions while reducing the noise caused by external details.

For meta-training of CAFENet, we introduce a simple yet powerful regularization method, Multi-Split Matching Regularization (MSMR), which performs metric learning on multiple low-dimensional embedding sub-spaces during meta-training.

The main contributions of this paper are as follows. First, we introduce the few-shot semantic edge detection problem: performing semantic edge detection on previously unseen objects using only a few training examples. Second, we introduce two new datasets, SBD-5^i and FSE-1000, for few-shot edge detection. Third, we propose a few-shot edge detector, CAFENet, and validate the performance of the proposed method through experiments.
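The prototype-based segmentation and mask-driven attention described for Fig. 1 can be sketched roughly as follows. This is a single-scale illustration under stated assumptions, not the authors' code: the function names, the cosine-similarity matching, the `scale` temperature, and the multiplicative form of the attention are all hypothetical choices; the text only specifies masked average pooling, pixel-wise matching with prototypes, and attention maps derived from the segmentation mask.

```python
import numpy as np

def masked_average_pool(feat, mask, eps=1e-8):
    """Average the (D, H, W) features over pixels where the (H, W) mask is 1."""
    return (feat * mask[None]).sum(axis=(1, 2)) / (mask.sum() + eps)

def segment_by_prototypes(query_feat, fg_proto, bg_proto, scale=20.0, eps=1e-8):
    """Return an (H, W) foreground probability map for the query features.

    Assumption: each pixel is matched to the prototypes by cosine similarity,
    and `scale` is a hypothetical temperature applied before the softmax.
    """
    protos = np.stack([bg_proto, fg_proto])                      # (2, D)
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + eps)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + eps)
    sim = scale * np.einsum("cd,dhw->chw", p, q)                 # (2, H, W)
    sim = sim - sim.max(axis=0, keepdims=True)                   # stable softmax
    prob = np.exp(sim) / np.exp(sim).sum(axis=0, keepdims=True)
    return prob[1]

def apply_attention(feat, fg_prob):
    """Reweight (D, H, W) features by the foreground map (assumed multiplicative)."""
    return feat * fg_prob[None]
```

A typical use would compute `fg_proto` and `bg_proto` from the support features with the support mask and its complement, call `segment_by_prototypes` on the query features, and pass the attended features to the edge decoder. A multi-scale variant would resize the probability map to each feature resolution before applying it.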

