LIFTPOOL: BIDIRECTIONAL CONVNET POOLING

Abstract

Pooling is a critical operation in convolutional neural networks for increasing receptive fields and improving robustness to input variations. Most existing pooling operations downsample the feature maps, which is a lossy process. Moreover, they are not invertible: upsampling a downscaled feature map cannot recover the information lost during downsampling. By adopting the philosophy of the classical Lifting Scheme from signal processing, we propose LiftPool, a pair of bidirectional pooling layers comprising LiftDownPool and LiftUpPool. LiftDownPool decomposes a feature map into several downsized sub-bands, each containing information at a different frequency. Since the pooling function in LiftDownPool is perfectly invertible, performing LiftDownPool backward yields a corresponding up-pooling layer, LiftUpPool, which generates a refined upsampled feature map from the detail sub-bands; this is useful for image-to-image translation tasks. Experiments show that the proposed methods achieve better results on image classification and semantic segmentation with various backbones. Moreover, LiftDownPool offers better robustness to input corruptions and perturbations.

1. INTRODUCTION

Spatial pooling has been a critical ConvNet operation since its inception (Fukushima, 1979; LeCun et al., 1990; Krizhevsky et al., 2012; He et al., 2016; Chen et al., 2018). It is crucial that a pooling layer maintain the most important activations for the network's discriminability (Saeedan et al., 2018; Boureau et al., 2010). Several simple operations, such as average pooling and max pooling, have been explored for aggregating features in a local area. Springenberg et al. (2015) employ a convolutional layer with increased stride to replace a pooling layer, which is equivalent to downsampling. While effective and efficient, simply keeping the average or maximum activation may ignore local structure. In addition, as these functions are not invertible, upsampling the downscaled feature maps cannot recover the lost information. Different from existing pooling operations, we propose in this paper a bidirectional pooling called LiftPool, including LiftDownPool, which preserves details when downsizing feature maps, and LiftUpPool, which generates finer upsampled feature maps. LiftPool is inspired by the classical Lifting Scheme (Sweldens, 1998) from signal processing, which is commonly used for information compression (Pesquet-Popescu & Bottreau, 2001), reconstruction (Dogiwal et al., 2014), and denoising (Wu et al., 2004). The perfect invertibility of the Lifting Scheme has also inspired work on invertible networks (Dinh et al., 2017; Jacobsen et al., 2018; Atanov et al., 2019; Izmailov et al., 2020). The Lifting Scheme decomposes an input signal into sub-bands of reduced size, and this decomposition is perfectly invertible. Applying this idea, LiftDownPool factorizes an input feature map into several downsized spatial sub-bands with different correlation structures. As shown in Figure 1, for an image feature map, the LL sub-band is an approximation with fine details removed.
The LH, HL and HH sub-bands capture details along the horizontal, vertical and diagonal directions. LiftDownPool allows preserving any sub-band(s) as the pooled result. Moreover, due to the invertibility of the pooling function, LiftUpPool is introduced for upsampling feature maps. Upsampling a feature map is more challenging, as seen for MaxUpPool (Badrinarayanan et al., 2017), which generates an output with many 'holes' (shown in Figure 1). LiftUpPool instead uses the recorded detail sub-bands to recover a refined output by performing LiftDownPool backwards.
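To make the underlying mechanism concrete, the following is a minimal sketch of one level of the classical Lifting Scheme on a 2D map, using the simple Haar-style predict/update pair rather than the learned filters of LiftDownPool; the function names (`lift_1d`, `lift_2d`, `inverse_lift_1d`) are illustrative and not from the paper. Lifting along rows and then columns yields the four half-size sub-bands LL, LH, HL, HH, and because each step is algebraically invertible, the input can be reconstructed exactly.

```python
import numpy as np

def lift_1d(x, axis):
    """One Haar-style lifting step along `axis`: split -> predict -> update."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    detail = odd - even          # predict: odd samples from their even neighbors
    approx = even + detail / 2   # update: approx becomes the local average
    return approx, detail

def lift_2d(x):
    """Decompose a 2D map into LL, LH, HL, HH sub-bands, each half-size."""
    lo, hi = lift_1d(x, axis=0)      # lift rows: low / high frequency bands
    ll, lh = lift_1d(lo, axis=1)     # lift columns of the low band
    hl, hh = lift_1d(hi, axis=1)     # lift columns of the high band
    return ll, lh, hl, hh

def inverse_lift_1d(approx, detail, axis):
    """Invert lift_1d exactly: undo update, undo predict, re-interleave."""
    even = approx - detail / 2
    odd = detail + even
    out_shape = list(even.shape)
    out_shape[axis] *= 2
    out = np.empty(out_shape, dtype=even.dtype)
    sl_even = [slice(None)] * out.ndim
    sl_even[axis] = slice(0, None, 2)
    sl_odd = [slice(None)] * out.ndim
    sl_odd[axis] = slice(1, None, 2)
    out[tuple(sl_even)] = even       # even samples back to even positions
    out[tuple(sl_odd)] = odd         # odd samples back to odd positions
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = lift_2d(x)          # four 2x2 sub-bands
lo = inverse_lift_1d(ll, lh, axis=1)
hi = inverse_lift_1d(hl, hh, axis=1)
x_rec = inverse_lift_1d(lo, hi, axis=0)
assert np.allclose(x, x_rec)         # lifting is perfectly invertible
```

Keeping only `ll` corresponds to a lossy 2x2 down-pooling; keeping the detail sub-bands as well is what lets the inverse pass reconstruct the input without 'holes'.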

