EFFICIENT ONLINE AUGMENTATION WITH RANGE LEARNING

Abstract

State-of-the-art automatic augmentation methods (e.g., AutoAugment and RandAugment) for visual recognition tasks diversify training data using a large set of augmentation operations. The range of magnitudes of many augmentation operations (e.g., brightness and contrast) is continuous. Therefore, to make search computationally tractable, these methods use fixed and manually-defined magnitude ranges for each operation, which may lead to sub-optimal policies. To answer the open question on the importance of magnitude ranges for each augmentation operation, we introduce RangeAugment, which allows us to efficiently learn the range of magnitudes for individual as well as composite augmentation operations. RangeAugment uses an auxiliary loss based on image similarity as a measure to control the range of magnitudes of augmentation operations. As a result, RangeAugment has a single scalar parameter for search, image similarity, which we simply optimize via linear search. RangeAugment integrates seamlessly with any model and learns model- and task-specific augmentation policies. With extensive experiments on the ImageNet dataset across different networks, we show that RangeAugment achieves competitive performance to state-of-the-art automatic augmentation methods with 4-5 times fewer augmentation operations. Experimental results on semantic segmentation and contrastive learning further show RangeAugment's effectiveness.

1. INTRODUCTION

Data augmentation is a widely used regularization method for training deep neural networks (LeCun et al., 1998; Krizhevsky et al., 2012; Szegedy et al., 2015; Perez & Wang, 2017; Steiner et al., 2021). These methods apply carefully designed augmentation (or image transformation) operations (e.g., color transforms) to increase the quantity and diversity of training data, which in turn helps improve the generalization ability of models. However, these methods rely heavily on expert knowledge and extensive trial-and-error experiments. Recently, automatic augmentation methods have gained attention because of their ability to search for an augmentation policy (e.g., combinations of different augmentation operations) that maximizes validation performance (Cubuk et al., 2019; 2020; Lim et al., 2019; Hataya et al., 2020; Zheng et al., 2021). In general, most augmentation operations (e.g., brightness and contrast) have two parameters: (1) the probability of applying them and (2) their range of magnitudes. Automatic augmentation methods take a set of augmentation operations with a fixed (often discretized) range of magnitudes as input, and produce a policy for applying some or all augmentation operations along with their parameters (Fig. 1). As an example, AutoAugment (Cubuk et al., 2019) discretizes the range of magnitudes and probabilities of 16 augmentation operations, and searches for sub-policies (i.e., compositions of two augmentation operations along with their probabilities and magnitudes) in a space of about $10^{32}$ possible combinations. These methods empirically show that automatic augmentation policies help improve the performance of downstream networks. For example, AutoAugment improves the validation top-1 accuracy of ResNet-50 (He et al., 2016) by about 1.3% on the ImageNet dataset (Deng et al., 2009). In other words, these methods underline the importance of automatic composition of augmentation operations in improving validation performance.
However, policies generated using these methods may be sub-optimal because they use hand-crafted magnitude ranges. The importance of magnitude ranges for each augmentation operation is still an open question. An obstacle in answering this question is that the range of magnitudes for most augmentation operations is continuous, which makes the search computationally intractable. This paper introduces RangeAugment, a simple and efficient method to learn the range of magnitudes for each augmentation operation. Inspired by image similarity metrics (Hore & Ziou, 2010), RangeAugment introduces an auxiliary augmentation loss that allows us to learn the range of magnitudes for each augmentation operation. We realize this by controlling the similarity between the input and the augmented image for a given model and task. Rather than directly specifying the parameters for each augmentation operation, RangeAugment takes a target image similarity value as an input. The loss function is then formulated as a combination of the empirical loss and an augmentation loss. The objective of the augmentation loss is to match the target image similarity value. Therefore, the search objective in RangeAugment is to find the target similarity value that provides a good trade-off between minimizing the augmentation loss (i.e., matching the target similarity value) and the empirical loss. As a result, augmentation policy learning in RangeAugment reduces to searching for a single scalar parameter, the target image similarity, that maximizes the downstream model's validation performance. We search for this target image similarity value via linear search. Empirically, we observe that this trade-off between the augmentation and empirical losses leads to better generalization ability of the downstream model.
Compared to existing automatic augmentation methods that require a large set of augmentation operations (usually 14-16 operations), RangeAugment is able to achieve competitive performance with only three simple operations (brightness, contrast, and additive Gaussian noise). Because RangeAugment's search space is independent of augmentation parameters and is fully differentiable (Fig. 1), it can be trained end-to-end with any downstream model to learn model- and task-specific policies (Fig. 2). We empirically demonstrate in Section 4 that RangeAugment allows us to learn model-specific policies when trained end-to-end with downstream models on the ImageNet dataset (Fig. 2a). Importantly, RangeAugment achieves competitive performance to existing automatic augmentation methods (e.g., AutoAugment) with 4 to 5 times fewer augmentation operations. In Section 5, we apply RangeAugment to semantic segmentation and contrastive learning to demonstrate its simplicity and seamless integration with different tasks. We further show that RangeAugment learns task-specific policies (Fig. 2b). To the best of our knowledge, RangeAugment is the first automatic augmentation method that learns the range of magnitudes for each augmentation operation.

2. RELATED WORK

Data augmentation combines different augmentation operations (e.g., random brightness, random contrast, random Gaussian noise, and data mixing) to synthesize additional training data. Traditional augmentation methods rely heavily on expert knowledge and extensive trial-and-error experiments. In practice, these manual augmentation methods have been used to train different models on a variety of datasets and tasks (e.g., Szegedy et al., 2015; He et al., 2016; Zhao et al., 2017; Howard et al., 2019). However, these policies may not be optimal for all models.
Motivated by neural architecture search (Zoph & Le, 2017), recent methods focus on finding optimal augmentation policies automatically from data. AutoAugment formulates automatic augmentation as a reinforcement learning problem, and uses a model's validation performance as a reward to find an augmentation policy leading to optimal validation performance. Because AutoAugment searches for several augmentation policy parameters, the search space is enormous and computationally intractable on large datasets and models. Therefore, in practice, policies found for smaller datasets are transferred to larger datasets. Since then, many follow-up works have focused on reducing the search space while delivering similar performance to AutoAugment (Ratner et al., 2017; Lemley et al., 2017; LingChen et al., 2020; Li et al., 2020; Zheng et al., 2021; Liu et al., 2021a). The first line of research reduces the search time by introducing different hyper-parameter optimization methods, including population-based training (Ho et al., 2019), density matching (Lim et al., 2019; Hataya et al., 2020), and gradient matching (Zheng et al., 2021). The second line of research reduces the search space by making practical assumptions (Cubuk et al., 2020; Müller & Hutter, 2021). For instance, RandAugment (Cubuk et al., 2020) applies two transforms randomly with uniform probability. With these assumptions, RandAugment reduces AutoAugment's search space from $10^{32}$ to $10^{2}$ while maintaining the performance of downstream networks.
One common characteristic among these different automatic augmentation methods is that they use a fixed and manually-defined range of magnitudes for different augmentation operations, and focus on diversifying training data by using a large set of augmentation operations (e.g., 14 transforms). This is because the range of magnitudes for most augmentation operations is continuous and large, and searching over this large range is practically infeasible. Unlike these works, RangeAugment focuses on learning the magnitude range of each augmentation operation (Figs. 1 and 2). We show in Section 4 that RangeAugment is able to learn model- and task-specific policies while delivering competitive performance to previous automatic augmentation methods across different models.
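As a rough check on the $10^{32}$ figure, the size of the space can be worked out by hand. The breakdown below follows the description commonly given for AutoAugment's search space; the counts of 10 magnitude levels, 11 probability levels, and 5 sub-policies per policy are not stated in this paper and should be read as assumptions. Only the 16 operations and the two operations per sub-policy come from the text above.

```latex
% Assumed: 16 operations, 10 magnitude levels, 11 probability levels,
% 2 operations per sub-policy, 5 sub-policies per policy.
\[
  \left(16 \times 10 \times 11\right)^{2 \times 5}
  = 1760^{10}
  \approx 2.9 \times 10^{32}
\]
```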

3. RANGEAUGMENT

Existing automatic augmentation methods search for composite augmentations over a large set of augmentation operations, but each augmentation operation has a manually-defined range of magnitudes. This paper introduces RangeAugment, a method for learning the range of magnitudes for each augmentation operation (Fig. 3). RangeAugment uses image similarity between the input and augmented image to learn the range of magnitudes for each augmentation operation. In the rest of this section, we first formulate the problem (Section 3.1) and then elaborate on RangeAugment's policy learning method (Section 3.2), followed by implementation details (Section 3.3).

3.1. PROBLEM FORMULATION

Let $\mathcal{T} = \{T_1, \cdots, T_N\}$ be a set of $N$ differentiable augmentation operations. Each augmentation operation $T \in \mathcal{T}$ is parameterized by a scalar magnitude parameter $m \in \mathbb{R}$ such that $T(\cdot; m) : \mathcal{X} \to \mathcal{X}$ is defined on the image space $\mathcal{X}$. Let $\pi_\phi$ be an augmentation policy that defines a distribution over sub-policies $S \sim \pi_\phi$ in RangeAugment such that $S = \{T_i(\cdot; m_i)\}_{i=1}^{N}$. A sub-policy $S$ applies augmentation operations to an input image $x$ with uniform probability as
$$S(x) := x^{(N)}, \quad x^{(i)} = T_i(x^{(i-1)}; m_i), \quad x^{(0)} = x. \tag{1}$$
For any given model and task, the goal of automatic augmentation is to find the augmentation policy $\pi_\phi$ that diversifies training data and helps improve the model's generalization ability the most. RangeAugment learns the range of magnitudes for each augmentation operation in $\mathcal{T}$. Formally, the policy parameters in RangeAugment are $\phi = \{(a_i, b_i)\}_{i=1}^{N}$, and the magnitude parameter for the $i$-th augmentation operation in a sub-policy $S = \{T_i(\cdot; m_i)\}_{i=1}^{N}$ is sampled uniformly, $m_i \sim U(a_i, b_i)$, where $a_i \in \mathbb{R}$ and $b_i \in \mathbb{R}$ are learned parameters.
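The sequential composition above can be illustrated with a minimal sketch on a flattened toy image. The brightness operation follows the form $T(x; m) = mx$ given in the Figure 4 caption; the `contrast` function, the range values in `phi`, and all function names are hypothetical choices made for illustration, not the authors' implementation.

```python
import random

def brightness(x, m):
    """Brightness in the form given in the Figure 4 caption: T(x; m) = m * x."""
    return [m * p for p in x]

def contrast(x, m):
    """Hypothetical contrast op: scale each pixel's deviation from the mean by m."""
    mean = sum(x) / len(x)
    return [mean + m * (p - mean) for p in x]

def sample_sub_policy(ops, ranges, rng):
    """Draw one magnitude m_i ~ U(a_i, b_i) for each operation T_i."""
    return [(op, rng.uniform(a, b)) for op, (a, b) in zip(ops, ranges)]

def apply_sub_policy(sub_policy, x):
    """x^(0) = x; x^(i) = T_i(x^(i-1); m_i); returns x^(N)."""
    for op, m in sub_policy:
        x = op(x, m)
    return x

rng = random.Random(0)
ops = [brightness, contrast]
phi = [(0.8, 1.2), (0.5, 1.5)]   # illustrative (a_i, b_i) values, not learned
image = [0.2, 0.4, 0.6, 0.8]     # toy "image" as a flat list of pixels

sub_policy = sample_sub_policy(ops, phi, rng)
augmented = apply_sub_policy(sub_policy, image)
```

In RangeAugment the pairs in `phi` would be learned rather than fixed by hand; here they are constants so that only the sampling and composition steps are shown.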

3.2. POLICY LEARNING

Diverse training data can be produced by using a wider range of magnitudes, $(a_i, b_i)$, for each augmentation operation. However, directly searching for the optimal values of $(a_i, b_i)$ for each model and dataset is challenging because the range is continuous. To address this, RangeAugment introduces an auxiliary loss which, in conjunction with the task-specific empirical loss, enables learning a model-specific range of magnitudes for each augmentation operation in an end-to-end fashion. Let $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a differentiable image similarity function that measures the similarity between the input and the augmented image. To control the range of magnitudes for each augmentation operation, RangeAugment minimizes the distance between the expected value of $d(x, S(x))$ and a target image similarity value $\Delta \in \mathbb{R}$ using an augmentation loss function $L_{ra}$ (e.g., smooth L1 loss or L2 loss). For example, $d$ can be PSNR and $\Delta$ a target PSNR value. When the target PSNR value is small, the difference between the input and the augmented image obtained after applying an augmentation operation (say, brightness) will be large. In other words, for a smaller target PSNR value, the range of magnitudes for the brightness operation will be wider, and vice versa. For a given value of $\Delta$ and a parameterized model $f_\theta$ with parameters $\theta$, the overall loss function to learn a model- and task-specific augmentation policy is a weighted sum of the augmentation loss $L_{ra}$ and the task-specific empirical loss $L_{task}$:
$$\theta^*, \phi^* = \arg\min_{\theta, \phi} \; \mathbb{E}_{(x, y) \sim D_{train}} \, \mathbb{E}_{S \sim \pi_\phi} \left[ L_{task}(f_\theta(S(x)), y) + \lambda L_{ra}(x, S(x); \Delta) \right], \tag{2}$$
where $\lambda$ and $D_{train}$ represent the weight term and the training set, respectively. Note that, in Eq. (2), the re-parameterization trick on uniform distributions (Kingma & Welling, 2013) is applied to backpropagate through the expectation over $S \sim \pi_\phi$. The $\Delta$ in Eq. (2) allows RangeAugment to control the diversity of augmented samples. Therefore, augmentation policy learning in RangeAugment reduces to searching for a single scalar parameter, $\Delta$, that maximizes the downstream model's validation performance. RangeAugment finds the optimal value of $\Delta$ using a linear search.
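To make the terms of Eq. (2) concrete, the following sketch computes the forward value of the combined loss for a single sampled augmentation. It is a simplified illustration under several assumptions: PSNR uses its standard definition with pixel values in [0, 1], the smooth L1 loss uses a threshold `beta` of 1, the expectation over sub-policies is replaced by one sample, and no gradients are computed (a real implementation would rely on an automatic differentiation framework). The default `lam=0.0015` is the value of λ reported in the ablations; all function names are hypothetical.

```python
import math
import random

def psnr(x, y, max_val=1.0):
    """Standard PSNR in dB between two equally sized pixel lists."""
    mse = sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)
    return 10.0 * math.log10(max_val ** 2 / max(mse, 1e-12))

def smooth_l1(value, target, beta=1.0):
    """Smooth L1 distance: quadratic near zero, linear for large differences."""
    diff = abs(value - target)
    return 0.5 * diff ** 2 / beta if diff < beta else diff - 0.5 * beta

def sample_magnitude(a, b, rng):
    """Re-parameterized uniform sample: m = a + (b - a) * u with u ~ U(0, 1).

    Written this way, m is a deterministic function of (a, b) given u,
    which is what lets gradients reach the learned range endpoints.
    """
    u = rng.random()
    return a + (b - a) * u

def total_loss(task_loss, x, x_aug, target_psnr, lam=0.0015):
    """Single-sample version of L_task + lambda * L_ra from Eq. (2)."""
    return task_loss + lam * smooth_l1(psnr(x, x_aug), target_psnr)

rng = random.Random(0)
x = [0.2, 0.4, 0.6, 0.8]
m = sample_magnitude(0.5, 1.5, rng)      # illustrative range (a, b)
x_aug = [m * p for p in x]               # brightness: T(x; m) = m * x
loss = total_loss(task_loss=1.0, x=x, x_aug=x_aug, target_psnr=10.0)
```

With a brightness magnitude of 1.5 on this toy image, the mean squared error is 0.075 and the PSNR is about 11.25 dB, so a target of 10 dB would push the learned range slightly wider.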

3.3. IMPLEMENTATION DETAILS

We use PSNR as the image similarity function $d$ in our experiments because it is (1) a standard image quality metric, (2) differentiable, and (3) fast to compute. Across different downstream networks, we observe a 0.5%-3% training overhead over the empirical risk minimization baseline. To find an optimal value of $\Delta$ in Eq. (2), we study two approaches: (1) a fixed target PSNR ($\Delta \in \{5, 10, 20, 30\}$) and (2) a target PSNR with a curriculum, where the value of $\Delta$ is annealed from 40 to $\delta$ and $\delta \in \{5, 10, 20, 30\}$. The learned ranges of magnitudes, $(a, b)$, can scale beyond the image space (e.g., negative values) and result in training instability. To prevent this, we clip the range of magnitudes if they are beyond the extreme bounds of augmentation operations. We choose these extreme bounds such that objects in an image are hardly identifiable at or beyond the extreme points of the bounds (see Fig. 4). Also, because the focus of RangeAugment is to learn the range of magnitudes for each augmentation operation, we apply all augmentation operations in $\mathcal{T}$ with uniform probability. To demonstrate the importance of magnitude ranges, we study RangeAugment with three basic operations (brightness, contrast, and additive Gaussian noise), and show empirically in Section 4 that RangeAugment can achieve competitive performance to existing methods with 4 to 5 times fewer augmentation operations.
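The search over candidate targets and the clipping of learned ranges can be sketched as follows. This is an illustrative outline only: `train_and_validate` stands for a full training run that returns validation accuracy and is a hypothetical callable, the toy scoring function is arbitrary and does not represent any result, and the brightness bounds of 0.1 and 10 are taken from the extremes shown in the Figure 4 caption.

```python
def linear_search(candidates, train_and_validate):
    """Try each candidate target value once and keep the best-scoring one."""
    best_delta, best_score = None, float("-inf")
    for delta in candidates:
        score = train_and_validate(delta)
        if score > best_score:
            best_delta, best_score = delta, score
    return best_delta, best_score

def clip_range(a, b, low, high):
    """Clamp a learned range (a, b) to [low, high] and keep it ordered."""
    a = min(max(a, low), high)
    b = min(max(b, low), high)
    return (min(a, b), max(a, b))

def toy_score(delta):
    """Arbitrary stand-in for a training run; NOT a measured accuracy."""
    return -(delta - 10) ** 2

best_delta, _ = linear_search([5, 10, 20, 30], toy_score)

# A learned brightness range that drifted outside the extreme bounds.
clipped = clip_range(-0.3, 12.0, low=0.1, high=10.0)
```

Because the candidate set has only four values, the cost of the search is a fixed number of training runs regardless of how many augmentation operations are used.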

4. EVALUATING RANGEAUGMENT ON IMAGE CLASSIFICATION

RangeAugment can learn model-specific augmentation policies. To evaluate this, we first study the importance of single and composite augmentation operations using ResNet-50 on the ImageNet dataset (Section 4.2). We then study model-level generalization of RangeAugment (Section 4.3).

4.1. EXPERIMENTAL SET-UP

Dataset For image classification, we use the ImageNet dataset, which has 1.28M training and 50k validation images spanning 1000 categories. We use top-1 accuracy to measure performance.
Baseline models To evaluate the effectiveness of RangeAugment, we study different CNN- and transformer-based models. We group these models into two categories based on their complexity: (1) mobile: MobileNetv1 (Howard et al., 2017), MobileNetv2 (Sandler et al., 2018), MobileNetv3 (Howard et al., 2019), and MobileViT (Mehta & Rastegari, 2021); and (2) non-mobile: ResNet-50 (He et al., 2016), ResNet-101, EfficientNet (Tan & Le, 2019), and SwinTransformer (Liu et al., 2021b). We implement RangeAugment using the CVNets library (Mehta et al., 2022)
effect of these variables on ResNet-50's performance on the ImageNet dataset. We can make the following observations:
1. Fig. 6a shows that a single augmentation operation (N = 1) with wider magnitude ranges (e.g., the range of magnitudes at a target PSNR of 5 is wider than the one at a target PSNR of 15) helps improve ResNet-50's validation accuracy and reduce its training accuracy, thereby improving its generalization capability (Fig. 5a). In particular, increasing the magnitude range of the contrast (or brightness) operation increases ResNet-50's validation performance over the baseline by 1.0% (or 0.7%) while decreasing the training performance by 2% (or 3%). This is likely because wider magnitude ranges of an augmentation operation increase the diversity of training data. On the other hand, the additive Gaussian noise operation with a wider magnitude range slightly reduces the validation accuracy, but still reduces the training accuracy. In other words, it narrows ResNet-50's generalization gap.
2. Composite augmentation operations (N > 1; Figs. 5b and 5c) reduce the training accuracy significantly while having a validation performance similar to a single augmentation operation. This is expected, as composite operations further increase training data diversity.
3. Fig. 6c shows that progressively learning to increase the diversity of augmented samples (i.e., moving from narrower to wider magnitude ranges; wider ranges produce more diverse augmented samples and vice versa, see Appendix D) using a cosine curriculum further improves the performance (Fig. 5c). A plausible explanation is that the magnitude ranges learned by RangeAugment with a fixed target PSNR may get stuck near poor solutions. Because the ranges of magnitudes are wider for these solutions (e.g., the range of magnitudes at a target PSNR value of 5 in Fig. 6a and Fig. 6b), RangeAugment samples more diverse data from the beginning of training, making training difficult. Gradually annealing the target PSNR from high to low (e.g., 40 to 5 in Fig. 6c) allows RangeAugment to increase the data diversity slowly, thereby helping the model learn better representations. Importantly, it also allows RangeAugment to identify useful ranges for each augmentation operation. For example, the range of magnitudes for the additive Gaussian noise operation is narrower when optimizing with a curriculum (e.g., annealing the target PSNR from 40 to 5; Fig. 6c) than when optimizing with a fixed target PSNR of 5 (Fig. 6b). This indicates that a noise operation with a narrow magnitude range is favorable for training ResNet-50 on the ImageNet classification task. This concurs with the results in Figs. 5a and 5b, where we observed that the noise operation does not improve ResNet-50's validation performance. Moreover, our findings with progressively increasing data diversity are consistent with previous works (e.g., Bengio et al., 2009; Tan & Le, 2021) that show that scheduling training samples from easy to hard helps improve a model's performance.
Interestingly, ResNet-50 with RangeAugment (N = 3) achieves comparable validation accuracy and a smaller generalization gap compared to state-of-the-art methods that use more augmentation operations (N > 14) to increase data diversity during training. We conjecture that the difference in the generalization gap between existing methods and RangeAugment is caused by insufficient policy search in existing methods, as they use manually-defined magnitude ranges for each augmentation operation during search.
Observation 1: Composite augmentation operations with wider magnitude ranges are important for improving a downstream model's generalization ability. In the rest of the experiments, we use all three augmentations (N = 3) with a cosine curriculum.

4.3. MODEL-LEVEL GENERALIZATION OF RANGEAUGMENT

Fig. 5 shows that RangeAugment is effective for ResNet-50. Natural questions that arise are:
1. Can RangeAugment be applied to other vision models? RangeAugment's seamless integration with little training overhead (0.5% to 3%) over the baseline model allows us to study the generalization capability of different vision models easily. Fig. 7 shows the performance of different models with RangeAugment. When data regularization is increased for mobile models by decreasing the target PSNR value from 40 to 5, both the training and the validation accuracy of different mobile models decrease significantly compared to the baseline. This is likely because of the limited capacity of these models. On the other hand, data regularization improves the performance of non-mobile models significantly. Consistent with our observations for ResNet-50 in Fig. 5, we found that non-mobile models trained with RangeAugment are able to achieve competitive performance to state-of-the-art automatic augmentation methods, such as AutoAugment (N = 16), but with fewer augmentations (N = 3).
2. Does RangeAugment learn architecture-specific augmentations? Fig. 2a shows the learned magnitude ranges for different augmentation operations for a transformer-based (SwinTransformer) and a CNN-based (EfficientNet) model. Though both of these architectures use the same curriculum in RangeAugment (i.e., the target PSNR is annealed from 40 to 5), they learn different magnitude ranges for each augmentation operation. This shows that RangeAugment is capable of learning model-specific magnitude ranges for each augmentation operation.
3. Does RangeAugment increase variance in model performance? Because of stochastic training and the presence of randomness during different stages of training, including RangeAugment, there may be some variability in a model's performance. To measure this variability, we run each experiment with three different random seeds. For different models, the standard deviation of the validation accuracy is between 0.01 and 0.2, and is in accordance with previous works on the ImageNet dataset (Radosavovic et al., 2020; Wightman et al., 2021). These results show that training models with RangeAugment leads to stable model performance.
Observation 2: Non-mobile models benefit from data regularization. On the ImageNet dataset, we recommend training non-mobile models using a curriculum that anneals $\Delta$ from high (e.g., target PSNR = 40) to low (e.g., target PSNR = 5 or 10) similarity between input and augmented images.

5. TASK-LEVEL GENERALIZATION OF RANGEAUGMENT

Section 4 shows the effectiveness of RangeAugment on different downstream models on the ImageNet dataset. However, one might ask whether RangeAugment can be used for tasks other than image classification. To evaluate this, we study RangeAugment with two tasks, semantic segmentation (Section 5.1) and contrastive image-language pre-training (Section 5.2).

5.1. SEMANTIC SEGMENTATION ON THE ADE20K DATASET

Dataset and baseline models We use the ADE20k dataset (Zhou et al., 2017), which has 20k training and 2k validation images across 150 semantic classes. We report the segmentation performance in terms of mean intersection over union (mIoU) on the validation set. We integrate mobile and non-mobile classification models with the DeepLabv3 segmentation head (Chen et al., 2018a) and finetune each model for 50 epochs. See Appendix F.1 for training details. We do not study SwinTransformer for semantic segmentation because it is not compatible with DeepLabv3's segmentation head design, which adjusts the atrous rate of convolutions to control the output stride of the backbone network.
Baseline augmentation methods For semantic segmentation, experts have hand-crafted augmentation policies, and these manual policies are used to train state-of-the-art semantic segmentation methods (e.g., Chen et al., 2017; Zhao et al., 2017; Xie et al., 2021; Liu et al., 2021b). For an apples-to-apples comparison, we train each segmentation model with three different random seeds and compare the following: (1) Baseline: standard pre-processing (randomly resizing the short image dimension, random horizontal flip, and random crop), (2) Manual: baseline pre-processing with hand-crafted augmentation operations (color jittering using photometric distortion, random rotation, and random Gaussian noise), and (3) RangeAugment: baseline pre-processing with a learnable range of magnitudes for brightness, contrast, and noise (i.e., N = 3). For reference, we include DeepLabv3 results (if available) from a popular segmentation library (MMSegmentation, 2020).
Results Table 1 shows that RangeAugment improves the performance of different models in comparison to other augmentation methods. Interestingly, for semantic segmentation, the magnitude range of additive Gaussian noise is wider than for image classification (Fig. 2b). This concurs with previous manual augmentation methods, which also found that Gaussian noise is important for semantic segmentation (Zhao et al., 2017; Asiedu et al., 2022). Overall, these results suggest that RangeAugment learns task-specific augmentation policies.
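For readers unfamiliar with the mIoU metric used above, a minimal sketch on flattened label maps is given below. It assumes that classes absent from both prediction and ground truth are skipped when averaging, and it scores a single toy example; benchmark implementations typically accumulate intersections and unions over the whole validation set, so this is an illustration rather than the evaluation code used in the experiments.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

pred = [0, 0, 1, 1, 2, 2]      # toy predicted labels, one per pixel
target = [0, 1, 1, 1, 2, 0]    # toy ground-truth labels
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5.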

5.2. CONTRASTIVE IMAGE-LANGUAGE PRE-TRAINING ON THE LAION-400M DATASET

Dataset and baseline models We crawl the LAION-400M dataset (Schuhmann et al., 2021) and download about 304M image-language pairs, which are then used for pre-training. We report zero-shot top-1 accuracy on ImageNet's validation set and use the same language prompts as OpenCLIP (Ilharco et al., 2021). We train CLIP (Radford et al., 2021) with RangeAugment from scratch (Appendix F.1). The model uses ViT-B/16 (Dosovitskiy et al., 2020) as its image encoder and a transformer as its text encoder, and minimizes a contrastive loss during training. We use the multi-scale sampler of Mehta et al. (2022) to make CLIP more robust to input scale changes. Because less data regularization is required at a scale of 100M+ samples (Radford et al., 2021; Zhai et al., 2022), we anneal the target PSNR in RangeAugment from 40 to 20. We compare the performance with CLIP and OpenCLIP.
Results Table 2 compares the zero-shot performance of different models. For the same training dataset and zero-shot language prompts, RangeAugment delivers 1.2% better performance than OpenCLIP at an inference resolution of 224 × 224.
Observation 3: RangeAugment learns model- and task-specific augmentation policies.
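The text above does not spell out the contrastive loss, so the sketch below shows the symmetric cross-entropy form that is commonly used for CLIP-style training, as an assumption rather than a description of the exact training code. The input is a square matrix of image-text similarities in which matching pairs lie on the diagonal; the temperature value of 0.07 and the function names are illustrative choices.

```python
import math

def cross_entropy(logits, label):
    """Numerically stable cross-entropy for one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]

def symmetric_contrastive_loss(sim, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy.

    sim[i][j] is the similarity between image i and text j; the
    matching text for image i is assumed to be at index i.
    """
    n = len(sim)
    logits = [[v / temperature for v in row] for row in sim]
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_image = sum(
        cross_entropy([logits[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)

aligned = [[1.0, 0.0], [0.0, 1.0]]      # matching pairs are most similar
uninformative = [[0.0, 0.0], [0.0, 0.0]]
loss_aligned = symmetric_contrastive_loss(aligned)
loss_uninformative = symmetric_contrastive_loss(uninformative)
```

When every similarity is equal, the loss equals the log of the batch size; well-aligned pairs drive it toward zero.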

6. CONCLUSION

This paper introduces an end-to-end method for learning model- and task-specific automatic augmentation policies with a constant search time. We demonstrated that RangeAugment delivers competitive performance to existing methods across different downstream models on the image classification task. This is despite the fact that RangeAugment uses only three basic augmentation operations, as opposed to the large set of complex augmentation operations in existing methods. These results underline the importance of the magnitude range of augmentation operations in automatic augmentation. We also showed that RangeAugment can be seamlessly integrated with other tasks and achieves similar or better performance than existing methods. In the future, we plan to apply RangeAugment to learn the range of magnitudes for complex augmentation operations (e.g., geometric transformations) using different image similarity functions (e.g., SSIM). In addition to learning the range of magnitudes of each augmentation operation, we plan to apply RangeAugment to learn how to compose different augmentation operations with a constant search time.

A COMPARISON WITH EXISTING METHODS

ImageNet classification State-of-the-art methods incorporate random erasing (Zhong et al., 2020) and mixup transforms (Zhang et al., 2017; Yun et al., 2019) in addition to automatic augmentation methods (e.g., RandAugment and AutoAugment). Table 3 shows that models trained with RangeAugment are able to achieve similar or better performance than existing automatic augmentation methods, which use 4-5× more augmentation operations.
Semantic segmentation on ADE20k Table 4 compares the performance of different segmentation architectures for the same backbone. Compared to the highly-tuned augmentation recipes of the MMSeg (MMSegmentation, 2020) and CSAIL (Zhou et al., 2017) segmentation libraries, RangeAugment achieves better performance consistently across different backbones. Furthermore, Table 5 shows that DeepLabv3 with a ResNet-101 backbone, when trained with RangeAugment, delivers the same performance as UPerNet (Xiao et al., 2018) with SwinTransformer (Liu et al., 2021b) and ConvNext (Liu et al., 2022) backbones while being 3× more FLOP-efficient. Overall, these segmentation results underline the effectiveness of RangeAugment.

B TRANSFERRING AUGMENTATION POLICY

Searching for model- and task-specific policies may be expensive. Therefore, a common practice is to transfer the policy found on one dataset to another. This section evaluates whether the augmentation curriculum of RangeAugment can be used across different tasks and datasets. We compare the accuracy of RangeAugment with publicly reproduced models, as their performance is often better than that reported in the original papers. For experiments in this section, we follow our observations in Section 5 and anneal the PSNR value $\Delta$ from 40 to 20.
Object detection on COCO Following previous works, we use ResNet-50 as a backbone and train Mask R-CNN on the COCO dataset (Lin et al., 2014). Following a standard convention for reporting the object detection performance of Mask R-CNN, we also report the number of optimization updates (or schedule). We use similar hyper-parameters, including the learning rate, as Detectron2 (Wu et al., 2019) and MMDetection (Chen et al., 2019).
Semantic segmentation on PASCAL VOC 2012 Following previous segmentation methods, we use ResNet-101 as a backbone and train DeepLabv3 on the PASCAL VOC 2012 dataset (Everingham et al., 2012). Table 7 shows that DeepLabv3 with RangeAugment attains the best performance.

C ABLATIONS ON THE IMAGENET DATASET

In this section, we study different components of RangeAugment using ResNet-50. To learn the augmentation policy, we anneal the target image similarity (PSNR) value $\Delta$ from 40 to 5.

Effect of different curriculum

We trained RangeAugment with two curricula: (1) linear and (2) cosine. We found that the cosine curriculum delivers 0.1-0.2% better performance than the linear one. Therefore, we use the cosine curriculum.
Effect of λ The weight term, $\lambda$, in Eq. (2) allows RangeAugment to balance the trade-off between the augmentation loss $L_{ra}$ and the empirical loss $L_{task}$. To study its impact, we vary the value of $\lambda$ from 0.0 to 0.15. Empirical results in Fig. 8 show that a good range for $\lambda$ is between 0.0006 and 0.002. In our experiments, we use $\lambda = 0.0015$.
Effect of joint vs. independent optimization An expected behavior for learning a model-specific augmentation policy using RangeAugment is that the task-specific loss $L_{task}$ in Eq. (2) should contribute to policy learning. To validate this, ResNet-50 is trained independently (i.e., the augmented image is detached before it is fed to the model) as well as jointly on the ImageNet dataset. We found that the top-1 accuracy of ResNet-50 dropped by about 1% when it is trained independently. This is likely because independent training allowed RangeAugment to produce augmented images with more additive Gaussian noise (Fig. 9), resulting in a performance drop. This concurs with our observations in Section 4, especially Figs. 5 and 6, where we found that sampling augmented images from a wider magnitude range for the additive Gaussian noise operation reduced ResNet-50's performance on the ImageNet dataset. A plausible explanation is that PSNR is more sensitive to the noise operation (Hore & Ziou, 2010), allowing RangeAugment to learn wider magnitude ranges for the noise operation when trained independently than when trained jointly. Overall, these results suggest that joint training helps in learning a model-specific policy.
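The two curricula compared above can be sketched as schedules for the target PSNR. The exact functional forms are not given in the text, so the versions below (a straight-line interpolation and a half-cosine decay, both indexed by training step) are assumptions made for illustration; the default endpoints of 40 and 5 follow the annealing range stated at the start of this section.

```python
import math

def linear_schedule(step, total_steps, start=40.0, end=5.0):
    """Target PSNR decreasing at a constant rate from start to end."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress

def cosine_schedule(step, total_steps, start=40.0, end=5.0):
    """Target PSNR following a half-cosine from start to end."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))

# Compare the two schedules at a few points of a 100-step run.
checkpoints = [0, 25, 50, 75, 100]
linear_values = [linear_schedule(s, 100) for s in checkpoints]
cosine_values = [cosine_schedule(s, 100) for s in checkpoints]
```

Under these assumed forms, the cosine schedule keeps the target PSNR higher than the linear one during the first half of training, so augmented images stay closer to the inputs for longer before diversity increases.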



As noted in Section 2, most previous works have focused on reducing the search time of AutoAugment while achieving similar performance (e.g., ResNet-50 on ImageNet: 77.6% (AutoAugment), 77.6% (RandAugment), and 77.6% (Fast AutoAugment)). Also, many state-of-the-art models (e.g., EfficientNet and SwinTransformer) have used either RandAugment or AutoAugment for data regularization. Therefore, to demonstrate the effectiveness of RangeAugment, we choose AutoAugment and RandAugment as baseline methods.
Wider magnitude ranges produce diverse augmented samples and vice versa (Appendix D).
The augmented image is detached before feeding to the model.



Figure 1: Comparison between RangeAugment and standard automatic augmentation methods. RangeAugment's search space is independent of augmentation parameters, allowing us to learn model- and task-specific policies in a constant time.

Figure 3: RangeAugment: End-to-end learning of augmentation policy with downstream model.

Figure 4: Example outputs of the brightness operation, T(x; m) = mx, at different values of the magnitude parameter m. At the extremes (i.e., m = 0.1 or 10), the bus is hardly identifiable.
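The effect of the magnitude parameter on image similarity can be checked numerically. The sketch below applies T(x; m) = mx to a handful of hypothetical pixel intensities in [0, 1] and measures PSNR against the original. Clipping the result to [0, 1] and the helper names `brightness` and `psnr` are assumptions made for illustration, not details taken from the paper.

```python
import math


def brightness(pixels, m):
    """Apply T(x; m) = m * x, clipping to the valid [0, 1] range (assumed)."""
    return [min(1.0, max(0.0, m * p)) for p in pixels]


def psnr(reference, distorted, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel intensities standing in for an image.
pixels = [0.1, 0.3, 0.5, 0.7, 0.9]

# PSNR of the brightened image for mild and extreme magnitudes.
psnr_by_magnitude = {m: psnr(pixels, brightness(pixels, m))
                     for m in (0.1, 0.9, 1.0, 1.1, 10.0)}
```

Magnitudes close to 1 leave the image nearly unchanged and yield a high PSNR, while the extremes m = 0.1 and m = 10 yield a much lower PSNR, which is the quantity a target PSNR value constrains.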

Figure 5: The performance of ResNet-50 on the ImageNet dataset when data diversity is increased by learning the range of magnitudes for single (N = 1) and composite augmentation operations (N > 1) using RangeAugment. For curriculum learning, the target PSNR value is annealed from 40 to the value mentioned in the legend.
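Annealing the target PSNR from 40 to a final value can be sketched as a simple schedule. The formulas below are the standard linear and cosine interpolations and are an assumption, since the exact curriculum equations are not given in this section. The function name `target_psnr` and its parameters are hypothetical.

```python
import math


def target_psnr(step, total_steps, end, start=40.0, schedule="cosine"):
    """Target PSNR at a training step, annealed from `start` to `end`."""
    # Fraction of training completed, clamped to [0, 1].
    t = min(max(step / total_steps, 0.0), 1.0)
    if schedule == "linear":
        return start + (end - start) * t
    if schedule == "cosine":
        return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t))
    raise ValueError(f"unknown schedule: {schedule!r}")


# Example: anneal from 40 to 30 over 100 steps under both curricula.
linear_values = [target_psnr(s, 100, end=30.0, schedule="linear") for s in (0, 25, 50, 100)]
cosine_values = [target_psnr(s, 100, end=30.0, schedule="cosine") for s in (0, 25, 50, 100)]
```

Under these assumed formulas, both schedules share the same start, midpoint, and end values, but the cosine schedule keeps the target PSNR higher during the first half of training, so the augmentation stays milder for longer before diversity increases.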

Does RangeAugment increase variance in model performance? Because of stochastic training and the presence of randomness at different stages of training, including RangeAugment, there may be some variability in a model's performance. To measure this variability, we run each experiment with three different random seeds. For different models, the standard deviation of validation accuracy is between 0.01 and 0.2.

Figure 7: Performance of different models on the ImageNet dataset using RangeAugment. Here, the target PSNR value is annealed from 40 to the value mentioned in the legend.

Figure 8: Effect of weight term, λ, on ResNet-50's performance on the ImageNet dataset.

Figure 9: The effect of learning magnitude ranges by jointly optimizing the loss terms L_ra and L_task (top row) compared to only optimizing L_ra (bottom row). Training ResNet-50 with the joint loss leads to smaller magnitudes of noise, and improves validation accuracy by approximately 1% on the ImageNet dataset.

Semantic segmentation on the ADE20k dataset.
MobileNetv1   38.77 ± 0.20  38.12 ± 0.16  36.46 ± 0.19  38.20 ± 0.26  38.96 ± 0.34  39.37 ± 0.19  -
MobileNetv2   37.74 ± 0.29  37.10 ± 0.27  35.58 ± 0.30  37.43 ± 0.73  38.06 ± 0.17  38.23 ± 0.36  34.08
MobileNetv3   37.58 ± 0.67  36.68 ± 0.34  34.80 ± 0.16  36.54 ± 0.12  37.77 ± 0.24  38.10 ± 0.13
                .29 ± 0.17  44.04 ± 0.52  43.22 ± 0.52  43.89 ± 0.43  44.77 ± 0.28  43.95 ± 0.31  44.08
EfficientNet  40.86 ± 0.55  41.15 ± 0.65  39.42 ± 0.29  40.39 ± 0.48  41.43 ± 0.36  41.08 ± 0.36  -

Zero-shot performance on ImageNet. Each entry of CLIP with RangeAugment is the same model, but evaluated at different resolutions. † Results of CLIP with the same language prompts as OpenCLIP.

Accuracy comparison of different models trained with different methods on the ImageNet validation set. RangeAugment, with simple and 4-5× fewer transforms, delivers performance similar to or better than state-of-the-art methods with complex automatic augmentation policies. For mobile models, we decay ∆ from 40 to 30, while for non-mobile models, we decay ∆ from 40 to 5 (as per observations in Section 4.1). Methods whose performance is within the standard deviation range of ±0.2 of the best model are highlighted in bold. Note that RandAugment in TIMM is a custom implementation that delivers better performance than the RandAugment of Cubuk et al. (2020), and is widely used for training recent classification networks on ImageNet, including SwinTransformers. Here, N denotes the number of augmentation operations.

Comparison between different state-of-the-art segmentation methods for the same backbone. Models trained with RangeAugment are able to deliver better performance than highly-tuned augmentation pipelines in popular segmentation libraries (CSAIL (Zhou et al., 2017) and MMSeg (MMSegmentation, 2020)).

DeepLabv3 with RangeAugment delivers similar performance to UPerNet while being 3× more FLOP efficient. RangeAugment significantly improved the segmentation accuracy of ResNet-101 with DeepLabv3, delivering performance competitive with the state-of-the-art segmentation model, UPerNet, with recent backbones (SwinTransformer and ConvNext).

Table 6 shows that RangeAugment significantly improves the detection accuracy of Mask R-CNN. Enhanced object detection results of Mask R-CNN with RangeAugment on COCO.

Comparison with state-of-the-art semantic segmentation methods with ResNet-101 backbone on the PASCAL VOC validation set. We do not use multi-scale evaluation. The results of different segmentation models are from MMSegmentation (2020). Also, our training recipes, including batch size and learning rate, are similar to MMSegmentation (2020).

Figure panels: Brightness, Contrast, Noise.

