LEARNING BINARY NETWORKS ON LONG-TAILED DISTRIBUTIONS

Abstract

In deploying deep models to real-world scenarios, one faces a number of issues, including computational resource constraints and long-tailed data distributions. For the first time in the literature, we address the combined challenge of learning long-tailed distributions under the extreme resource constraint of using binary networks as backbones. Specifically, we propose a framework that calibrates off-the-shelf pretrained full-precision weights, learned on non-long-tailed distributions, when training binary networks on long-tailed datasets. Within the framework, we further propose a novel adversarial balancing scheme and a multi-resolution learning method for better generalization to diverse semantic domains and input resolutions. We conduct extensive empirical evaluations on 15 datasets, including long-tailed datasets newly derived from existing balanced datasets, which forms the largest benchmark in the literature. Our empirical studies show that our proposed method outperforms prior arts by large margins, e.g., by at least +14.33% on average.

1. INTRODUCTION

In recent years, there has been growing emphasis on resource constraints in learning deep models, especially for edge devices, leading to breakthroughs such as MobileNet (Howard et al., 2017) and YOLO-V7 (Wang et al., 2022) that consider not only accuracy but also computational cost. This has attracted attention to efficient deep learning in both the research and industrial communities. Moreover, long-tailed (LT) training data are frequently encountered in the wild (He & Garcia, 2009). Accordingly, many deep learning methods have been developed to combat the issue (Cui et al., 2021; He et al., 2021; Zhong et al., 2021). Yet, many of the everyday devices that would run these deep models lack sufficient computing power. Thus, for real-world deployment of such models, methodological advances in LT recognition should also account for resource constraints. Unfortunately, current LT recognition methods largely assume sufficient computing resources and are designed to work with a large number of full-precision parameters, i.e., floating-point (FP) weights. While recognition on LT distributions by FP models has improved significantly (Cui et al., 2021; He et al., 2021; Zhong et al., 2021), it is not clear whether these improvements would immediately translate to real-world scenarios, where resource constraints limit model selection to those with lower capacity. To this end, we argue that it is necessary to benchmark and improve the performance of long-tailed recognition with capacity-limited models. As binary networks sit at the extreme end of capacity-limited models (Rastegari et al., 2016), long-tailed recognition performance using 1-bit networks roughly corresponds to the 'worst-case scenario' for resource-constrained LT recognition. If we can show sufficient LT performance with binary networks, we can reasonably expect, at the very least, matching or better LT performance with N-bit models, where N > 1.
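To make the '1-bit' constraint concrete, the following is a minimal sketch of XNOR-Net-style weight binarization (Rastegari et al., 2016), in which a full-precision filter W is approximated by alpha * sign(W) with the scaling factor alpha = mean(|W|); the variable names here are illustrative, not from the paper's implementation.

```python
import numpy as np

def binarize_weights(w):
    """Approximate a full-precision weight tensor w by alpha * sign(w).

    alpha = mean(|w|) is the closed-form scale that minimizes the L2
    reconstruction error ||w - alpha * sign(w)||^2, so each weight costs
    a single bit plus one shared scalar per filter.
    """
    alpha = np.abs(w).mean()
    return alpha * np.sign(w), alpha

# Toy example: binarize a random 3x3 filter.
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))
w_bin, alpha = binarize_weights(w)
# Every entry of w_bin is +alpha or -alpha.
assert np.allclose(np.abs(w_bin), alpha)
```

Training such networks typically keeps latent full-precision weights and updates them with a straight-through estimator, which is why the capacity gap to FP networks noted below arises.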
Here, we take the initiative to benchmark and develop long-tailed recognition methods using binary networks as a challenging reference point. In the LT scenario, data scarcity in the tail classes is one of the major issues; it can cause problems such as class weights having widely varying magnitudes from head to tail classes (Kang et al., 2019; Alshammari et al., 2022), leading to disappointing performance. Prior methods (Liu et al., 2019; Kozerawski et al., 2020; Park et al., 2022) use cosine classifiers, which eliminate the effect of class weight norms as a discriminative statistic and improve accuracy. However, since binary networks lack learning capacity and exhibit worse generalization than FP networks (Rastegari et al., 2016; Courbariaux et al., 2016), we want to reduce the adverse effect of the uneven class

