THE KFIOU LOSS FOR ROTATED OBJECT DETECTION

Abstract

Unlike the well-developed field of horizontal object detection, where the computation-friendly IoU-based loss is readily adopted and fits the detection metrics well, rotation detectors often rely on a more complicated loss based on SkewIoU, which is unfriendly to gradient-based training. In this paper, we propose an effective approximate SkewIoU loss based on Gaussian modeling and the Gaussian product, which consists mainly of two terms. The first term is a scale-insensitive center point loss, used to quickly narrow the distance between the center points of the two bounding boxes. In the distance-independent second term, the product of the Gaussian distributions is adopted to inherently mimic the mechanism of SkewIoU by its definition, and it shows trend-level alignment with the SkewIoU loss within a certain distance (i.e. within 9 pixels). This is in contrast to recent Gaussian modeling based rotation detectors, e.g. GWD loss and KLD loss, which involve a human-specified distribution distance metric that requires additional hyperparameter tuning that varies across datasets and detectors. The resulting new loss, called KFIoU loss, is easier to implement and works better than the exact SkewIoU loss, thanks to its full differentiability and its ability to handle non-overlapping cases. We further extend our technique to the 3-D case, which suffers from the same issues as the 2-D case. Extensive results on various datasets with different base detectors show the effectiveness of our approach.

1. INTRODUCTION

Rotated object detection is a relatively new but challenging area, owing to the difficulty of locating arbitrarily oriented objects and separating them effectively from the background, for example in aerial images (Yang et al., 2018a; Ding et al., 2019; Yang et al., 2018b) and scene text (Jiang et al., 2017; Zhou et al., 2017). Though considerable progress has recently been made, challenges remain in practical settings for rotated objects with large aspect ratios and dense distributions. The Skew Intersection over Union (SkewIoU) between large aspect ratio objects is sensitive to deviations in the object positions. As a result, the negative impact of the inconsistency between the metric (dominated by SkewIoU) and the regression loss (e.g. ℓ_n-norms), which is already common in horizontal detection, is further amplified in rotation detection. The red and orange arrows in Fig. 1 show the inconsistency between SkewIoU and the Smooth L1 loss. Specifically, when the angle deviation is fixed (red arrow), SkewIoU decreases sharply as the aspect ratio increases, while the Smooth L1 loss is unchanged (it comes mainly from the angle difference). Similarly, when SkewIoU does not change (orange arrow), the Smooth L1 loss increases as the angle deviation increases.

* Corresponding author is Junchi Yan, who is also affiliated with Shanghai AI Laboratory. The work was partly done when the first author Xue Yang was an intern at Huawei Cloud. The work was also in part supported by NSFC (62222607), Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102).

Solutions to the inconsistency between the metric and the regression loss have been extensively discussed in horizontal detection, using IoU loss and related variants such as GIoU loss (Rezatofighi et al., 2019) and DIoU loss (Zheng et al., 2020b).
However, applying these solutions to rotation detection is blocked because an analytical solution of the SkewIoU calculation process¹ is not easy to provide, owing to the complexity of the intersection between two rotated boxes (Zhou et al., 2019). In particular, some custom operations (computing the intersection of two edges, sorting the vertices, etc.) have derivative functions that are not implemented in existing deep learning frameworks (Abadi et al., 2016; Paszke et al., 2017; Hu et al., 2020). Besides, the calculation of SkewIoU is not differentiable when there are more than eight intersection points between two bounding boxes, i.e. when the two bounding boxes coincide completely or share a coincident edge, which prevents very accurate prediction results from being obtained. Thus, developing an easy-to-implement and fully differentiable approximate SkewIoU loss is meaningful, and several works (Chen et al., 2020; Zheng et al., 2020a; Yang et al., 2021c;d) have been proposed.

This paper aims to find an easy-to-implement and better-performing alternative. We design an alternative to the SkewIoU loss based on the Gaussian product, named KFIoU loss², which can be easily implemented with the existing operations of a deep learning framework, without the need for additional acceleration (e.g. C++/CUDA). Specifically, we convert the rotated bounding box into a Gaussian distribution, which avoids the well-known boundary discontinuity and square-like problems (Yang et al., 2021c) in rotation detection. Then we use a center point loss to narrow the distance between the centers of the two Gaussian distributions, followed by calculating the overlap area at the new position through the product of the Gaussian distributions.
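
The pipeline described above (rotated box to Gaussian, then overlap via the product of two Gaussians) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the covariance convention Σ = R diag(w²/4, h²/4) Rᵀ, the box "volume" 4·sqrt(det Σ), the IoU-style ratio, and the function names `box_to_gaussian`, `gaussian_volume` and `kfiou_overlap` are not given in this section and are chosen here for illustration. The center point term is omitted, so the centers play no role in the overlap value.

```python
import numpy as np

def box_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box (center, width, height, angle in radians) to (mu, Sigma).

    Assumed convention (not stated in this section):
    Sigma = R diag(w^2/4, h^2/4) R^T, with R the 2-D rotation matrix.
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    Lam = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])
    return np.array([cx, cy], dtype=float), R @ Lam @ R.T

def gaussian_volume(Sigma):
    # Under the convention above, 4 * sqrt(det Sigma) equals w * h.
    return 4.0 * np.sqrt(np.linalg.det(Sigma))

def kfiou_overlap(Sigma1, Sigma2):
    """IoU-style overlap from the covariance of the product of two Gaussians."""
    # Product covariance in Kalman-update form:
    # Sigma_kf = Sigma1 - Sigma1 (Sigma1 + Sigma2)^{-1} Sigma1
    K = Sigma1 @ np.linalg.inv(Sigma1 + Sigma2)
    Sigma_kf = Sigma1 - K @ Sigma1
    v1 = gaussian_volume(Sigma1)
    v2 = gaussian_volume(Sigma2)
    v3 = gaussian_volume(Sigma_kf)
    return v3 / (v1 + v2 - v3)

# Usage: the same box described with two different parameterizations.
_, S_a = box_to_gaussian(0.0, 0.0, 4.0, 2.0, 0.3)
_, S_b = box_to_gaussian(0.0, 0.0, 2.0, 4.0, 0.3 + np.pi / 2)
print(np.allclose(S_a, S_b))      # both descriptions give the same covariance
print(kfiou_overlap(S_a, S_b))    # overlap of a box with itself
```

Only matrix products, an inverse and a determinant are involved, all of which are differentiable in standard frameworks. Under the assumptions of this sketch, two identical boxes give an overlap of 1/3 rather than 1, so any rescaling to [0, 1] is a separate design choice that is not shown here.
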
By calculating the error variance and comparing the final performance of different methods, we find that trend-level alignment with the SkewIoU loss is critical for resolving the inconsistency between metric and loss, and for further improving performance. Furthermore, compared with best-tuned Gaussian distance metric based methods, our proposed method achieves more competitive performance without hyperparameter tuning. The highlights are as follows:

1) For rotation detection, instead of exactly computing the SkewIoU loss, which is tedious and unfriendly to differentiable learning, we propose an easy-to-implement approximate loss, named KFIoU loss, which works better since it is fully differentiable and able to handle non-overlapping cases. It follows the protocol of Gaussian modeling for objects, yet innovatively uses the Gaussian product to mimic SkewIoU's computing mechanism within a looser distance.

2) Compared with Gaussian-based losses (GWD loss, KLD loss) that try to approximate the SkewIoU loss by specifying a distance, which requires extra hyperparameter tuning and metric selection that vary across datasets and detectors, our mechanism-level simulation of SkewIoU is more interpretable and natural, and free from hyperparameter tuning.

3) We also show that KFIoU loss achieves better trend-level alignment with the SkewIoU loss within a certain distance than GWD loss and KLD loss, where the trend deviation is measured by our devised error variance. The effectiveness of such a trend-level alignment strategy is verified by comparing KFIoU loss with the ideal SkewIoU loss. On extensive benchmarks (aerial images, scene text, faces), our approach also outperforms other best-tuned SOTA alternatives.

4) We further extend the Gaussian modeling and KFIoU loss from 2-D to 3-D rotation detection, with notable improvement compared with baselines.
To the best of our knowledge, this is the first 3-D rotation detector based on Gaussian modeling which also verifies its effectiveness, which is in

¹ See an open-source version with thousands of lines of code for implementing the loss at https://github.com/open-mmlab/mmcv/pull/1854, while our new loss only costs tens of lines of code.

² We term our loss KFIoU because the product of Gaussians is an important step in Kalman filtering.



Figure 1: For rotation detection (Yang et al., 2021b), there is a notable inconsistency between the final detection metric i.e. mAP (largely depending on SkewIoU) and regression-based loss e.g. the popular Smooth L1. See Fig. 3(a) and Fig. 3(b) for more specific comparison.
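
The inconsistency summarized in the caption can be reproduced numerically with a small, dependency-free sketch. It computes SkewIoU by Sutherland-Hodgman clipping of one rotated box against the other, and a Smooth L1 loss on raw (x, y, w, h, θ) residuals. The helper names, the 10° deviation, the box sizes and the use of unencoded residuals are illustrative assumptions, not the setup used for Fig. 1.

```python
import math

def corners(cx, cy, w, h, theta):
    """Corner points of a rotated box in counter-clockwise order."""
    c, s = math.cos(theta), math.sin(theta)
    pts = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in pts]

def area(poly):
    # Shoelace formula (positive for counter-clockwise vertex order).
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))

def clip(subject, clipper):
    """Sutherland-Hodgman: clip a convex polygon by a convex CCW polygon."""
    out = subject
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not out:
            break
        inp, out = out, []
        # Signed side test: >= 0 means the point is on the inner side of edge a->b.
        side = lambda p: (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        for p, q in zip(inp, inp[1:] + inp[:1]):
            sp, sq = side(p), side(q)
            if sp >= 0:
                out.append(p)
            if sp * sq < 0:  # segment p->q crosses the clipping edge
                t = sp / (sp - sq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def skew_iou(box1, box2):
    p1, p2 = corners(*box1), corners(*box2)
    inter = area(clip(p1, p2))
    return inter / (area(p1) + area(p2) - inter)

def smooth_l1(x, beta=1.0):
    x = abs(x)
    return 0.5 * x * x / beta if x < beta else x - 0.5 * beta

dtheta = math.radians(10)  # fixed angle deviation (illustrative value)
for ratio in (1, 2, 5, 10):
    gt = (0.0, 0.0, 10.0 * ratio, 10.0, 0.0)
    pred = (0.0, 0.0, 10.0 * ratio, 10.0, dtheta)
    loss = sum(smooth_l1(p - g) for p, g in zip(pred, gt))
    print(ratio, round(skew_iou(gt, pred), 3), round(loss, 4))
```

With the angle deviation held fixed, the printed SkewIoU falls as the aspect ratio grows while the Smooth L1 value stays the same, which is the red-arrow case. The edge-intersection and vertex-ordering steps inside `clip` are also the kind of custom operations that make an exact SkewIoU loss awkward to differentiate.
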

