LATENTAUGMENT: DYNAMICALLY OPTIMIZED LATENT PROBABILITIES OF DATA AUGMENTATION

Abstract

Although data augmentation is a powerful technique for improving the performance of image classification tasks, it is difficult to identify the best augmentation policy. The optimal augmentation policy, which is a latent variable, cannot be directly observed. To address this problem, this study proposes LatentAugment, which estimates the latent probability of optimal augmentation. The proposed method is appealing in that it can dynamically optimize the augmentation strategies for each input and model parameter over the learning iterations. Theoretical analysis shows that LatentAugment is a general model that includes other augmentation methods as special cases, and that it is simple and computationally efficient in comparison with existing augmentation methods. Experimental results show that the proposed LatentAugment achieves higher test accuracy than previous augmentation methods on the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.



Data augmentation is a widely used technique for generating additional data to improve the performance of computer vision tasks (Shorten & Khoshgoftaar, 2019). Although data augmentation performs well in experimental studies, designing data augmentations requires human expertise with prior knowledge of the dataset, and it is often difficult to transfer augmentation strategies across different datasets (Krizhevsky et al., 2012). Recent studies on data augmentation consider an automated process of searching for augmentation strategies from a dataset. For example, AutoAugment, proposed by Cubuk et al. (2018), uses reinforcement learning to automatically explore data augmentation policies using smaller network models and reduced datasets. Although AutoAugment shows great improvement on image classification tasks across different datasets, it requires thousands of GPU hours to search for augmentation strategies. Furthermore, the data augmentation operations optimized for reduced datasets using smaller network models may not be optimal for full datasets using larger network models. To address this problem, this study proposes LatentAugment, which estimates the latent probability of the optimal augmentation customized to each input image and network model. An optimal augmentation policy exists for each input image and network model; however, this optimal policy is a latent variable and cannot be directly observed. Although the latent variable itself cannot be observed, we can estimate the probability that it is the optimal augmentation policy. LatentAugment applies Bayes' rule to estimate the conditional probability of each augmentation policy given the input data and network parameters. Figure 1 shows the concept of the proposed latent augmentation method. Following the Bayesian data augmentation proposed by Tran et al.
(2017), LatentAugment uses the expectation-maximization (EM) algorithm to update the model parameters. In the expectation (E)-step, the expectation of the weighted loss function is calculated using the conditional probabilities of the latent augmentation policies. In the maximization (M)-step, the expected loss function is minimized using standard stochastic gradient descent. The conditional probability that an augmentation policy yields the highest loss is calculated using the loss function with the updated parameters and the input data, and the unconditional probabilities of the augmentation policies are generated by a moving average of the conditional probabilities. Note that the conditional probabilities of the latent augmentation policies are dynamically optimized for the input and the updated model parameters over the iterations of the EM algorithm. The contributions of this study can be summarized as follows:

• It provides a theoretical model for LatentAugment. This study shows that LatentAugment can dynamically optimize the augmentation methods for each input and model parameter in the learning iterations by calculating the conditional probabilities of the latent augmentation policies. Furthermore, it shows that LatentAugment is a general augmentation model that includes other augmentation methods, such as Adversarial AutoAugment (Zhang et al., 2019) and uncertainty-based sampling (Wu et al., 2020), as special cases.

• LatentAugment is simple and computationally efficient. It does not require the augmentation policies to be searched before training. Adversarial AutoAugment applies a generative adversarial network (GAN) (Goodfellow et al., 2014) to maximize the minimum loss function, which incurs an additional training cost for the adversarial network. In contrast, the proposed LatentAugment solves this problem using simple stochastic gradient descent without an adversarial network.

• Experimental results show that the proposed LatentAugment improves test accuracy on the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets. For example, it achieves a test accuracy of 98.72% with PyramidNet+ShakeDrop (Han et al., 2017; Yamada et al., 2018) on CIFAR-10, which is significantly better than previous augmentation methods.
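To make the update loop concrete, the following is a minimal, self-contained sketch of one LatentAugment-style EM iteration on a toy problem. The specific choices here are illustrative assumptions, not the paper's implementation: the scalar "model", the quadratic per-policy losses, the softmax-of-losses weighting (so that higher-loss policies receive higher conditional probability), and the moving-average rate `ema_rate` are all hypothetical stand-ins.

```python
import math

# Hypothetical sketch of LatentAugment-style EM updates (all names illustrative).
# K candidate augmentation policies; pi[k] is the unconditional probability that
# policy k is optimal, maintained as a moving average of conditional probabilities.
K = 4
pi = [1.0 / K] * K          # unconditional policy probabilities (uniform prior)
ema_rate = 0.9              # moving-average rate (assumed hyperparameter)
theta = 0.0                 # stand-in scalar "model parameter"
lr = 0.1                    # SGD learning rate

TARGETS = [1.0, 2.0, 0.5, 1.5]  # toy: each policy pulls theta toward a target

def loss(theta, k):
    # Stand-in for the model's loss on an input augmented by policy k.
    return (theta - TARGETS[k]) ** 2

def grad(theta, k):
    # Gradient of the toy loss with respect to theta.
    return 2.0 * (theta - TARGETS[k])

for step in range(100):
    losses = [loss(theta, k) for k in range(K)]

    # E-step: conditional probabilities h[k] of each policy given the input and
    # current parameters -- here a pi-weighted softmax over the losses, so
    # policies with higher loss are weighted more heavily (an assumption
    # consistent with the paper's highest-loss weighting).
    weights = [pi[k] * math.exp(losses[k]) for k in range(K)]
    z = sum(weights)
    h = [w / z for w in weights]

    # M-step: one SGD step on the expected (h-weighted) loss.
    g = sum(h[k] * grad(theta, k) for k in range(K))
    theta -= lr * g

    # Update the unconditional probabilities by a moving average of h.
    pi = [ema_rate * pi[k] + (1.0 - ema_rate) * h[k] for k in range(K)]

print(theta, pi)
```

In this sketch, `theta` settles near the loss-weighted mean of the policy targets, while `pi` drifts toward the policies that keep producing the highest losses; in the actual method the scalar update would be replaced by a mini-batch gradient step on the network parameters.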



Figure 1: An overview of the proposed LatentAugment. The loss functions under the augmentation policies are calculated using the input data and the unconditional probabilities of the augmentation policies. The model parameters are updated by the EM algorithm. In the E-step, the expectation of the weighted loss function is calculated using the conditional probability of the highest loss. In the M-step, the expected loss function is minimized using standard stochastic gradient descent. The conditional probabilities of the highest loss are calculated using the loss function with the updated parameters and the input data. The unconditional probabilities of the augmentation policies are generated by a moving average of the conditional probabilities.

