DySR: ADAPTIVE SUPER-RESOLUTION VIA ALGORITHM AND SYSTEM CO-DESIGN

Abstract

Super-resolution (SR) is a promising approach for improving the quality of low-resolution streaming services on mobile devices. On mobile devices, the available computing and memory resources change dynamically depending on other running applications. Due to the high computation and memory demands of SR models, it is essential to adapt the model according to available resources to extract the best possible model performance while maintaining quality of service (QoS), such as meeting a minimum frame rate and avoiding interruptions. Nevertheless, no existing SR model or machine learning system supports adaptive SR, and enabling adaptive SR on mobile devices is challenging because adapting the model can cause a significant frame-rate drop or even service interruption. To address this challenge, we take an algorithm and system co-design approach and propose a Dynamic Super-Resolution framework called DySR that maintains QoS while maximizing model performance. During the training stage, DySR employs an adaptation-aware one-shot Neural Architecture Search to produce sub-graphs that share kernel operation weights for low model adaptation overhead while striking a balance between performance and frame rate. During the inference stage, an incremental model adaptation method further reduces the model adaptation overhead. We evaluate DySR on a diverse set of hardware and datasets to show that it can generate models close to the Pareto frontier while maintaining a steady frame-rate throughput with a memory footprint around 40% smaller than assembled baseline methods.

1. INTRODUCTION

Deep super-resolution (SR) has been widely used in applications such as medical imaging (Li et al. (2021)), satellite imaging (Shermeyer & Van Etten (2019)), and image restoration (Qiu et al. (2019)). SR has attracted much attention in recent years due to the surging demand for mobile services such as video conferencing, content sharing, and video streaming, where it helps provide high-resolution visual content from low-resolution data sources (Zhang et al. (2020); Li et al. (2020; 2021)). SR models are resource demanding (Li et al. (2021); Lu & Hu (2022)) and need to meet Quality of Service (QoS) standards to provide a good user experience in visual services. Examples of QoS requirements include meeting a minimum frame rate and avoiding interruptions so that users perceive smooth motion. This, however, is challenging for mobile devices, where computing and memory resources are limited and their availability also depends on other running applications. To meet QoS for different mobile devices, existing works either design models for specific devices (Liu et al. (2021b); Lee et al. (2019); Ayazoglu (2021)) or use Neural Architecture Search (NAS) (Chu et al. (2021); Guo et al. (2020); Huang et al. (2021)) to generate multiple hardware-tailored models. However, none of these approaches considers the fluctuating resource environment of mobile devices, which often leads to poor QoS. One potential way to achieve good QoS is to dynamically adapt the model based on available resources. The challenges are twofold: first, how to design an adaptive model; second, how to enable model adaptation in a live inference system. To obtain an adaptive model, we employ NAS to generate a set of models of different sizes, so that the most profitable model is used under each resource-availability situation to ensure a steady frame rate while maximizing model performance. Unfortunately, no existing machine learning framework supports live model adaptation. To enable model adaptation in a real system, we explore two ideas.
The first idea is an ensemble method that keeps all models loaded in the system at all times to avoid model-switching overhead. However, this results in a significantly larger memory footprint, which is unsuitable for mobile devices. The second idea is to load a single model at a time, but then the model-switching overhead is high: each switch interrupts the streaming for 1-3 seconds, leading to even worse QoS. To achieve low resource consumption while minimizing the model-switching overhead, we propose DySR, an algorithm and system co-design approach for adaptive SR. To keep a small memory footprint and minimize adaptation overhead, DySR employs an adaptation-aware one-shot NAS approach, where a large meta-graph is trained in one shot such that sub-graphs share kernel operation weights while exploring the best tradeoffs between performance and frames per second (FPS). During inference, the meta-graph is fully loaded in memory and operations are dynamically adapted to the real-time resource availability in an incremental manner, i.e., only affected operations are swapped or rerouted. Since we do not need to load new models from the hard disk, there is no data transfer overhead. We evaluate DySR against baselines across a wide variety of hardware (from powerful GPUs to low-end mobile processors) using image and video SR datasets (e.g., Urban100 (Huang et al. (2015)) and Vimeo90k (Xue et al. (2019))). Results show that DySR can generate models close to the Pareto frontier of the performance vs. FPS tradeoff while maintaining a steady frame-rate throughput with a low memory footprint (40% less than the ensemble method).
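The adaptation policy sketched above, i.e., pick the most profitable sub-graph that meets the current FPS target, then reroute only the operations that changed, can be illustrated in a few lines of Python. This is a minimal sketch, not DySR's implementation: the candidate table, operation names, and PSNR/FPS numbers are invented for the example.

```python
# Illustrative sketch of incremental sub-graph adaptation. Each candidate
# sub-graph of the meta-graph is a per-block operation choice plus its
# profiled quality (PSNR, dB) and measured throughput (FPS); all values
# below are hypothetical.
CANDIDATES = [
    {"ops": ["conv3", "conv3", "conv3", "conv3"], "psnr": 32.1, "fps": 18.0},
    {"ops": ["conv3", "conv3", "sep3",  "conv3"], "psnr": 31.6, "fps": 27.0},
    {"ops": ["sep3",  "sep3",  "conv3", "sep3"],  "psnr": 30.9, "fps": 41.0},
    {"ops": ["sep3",  "sep3",  "sep3",  "sep3"],  "psnr": 30.2, "fps": 55.0},
]

def select_subgraph(target_fps):
    """Pick the highest-quality sub-graph whose measured FPS meets the target."""
    feasible = [c for c in CANDIDATES if c["fps"] >= target_fps]
    if not feasible:  # no candidate meets the target: degrade to the fastest
        return max(CANDIDATES, key=lambda c: c["fps"])
    return max(feasible, key=lambda c: c["psnr"])

def incremental_swap(current_ops, new_ops):
    """Return indices of blocks whose operation actually changes; only these
    are rerouted -- unchanged blocks (and all shared weights) stay in memory."""
    return [i for i, (a, b) in enumerate(zip(current_ops, new_ops)) if a != b]

current = select_subgraph(target_fps=15)   # ample resources: best PSNR wins
new = select_subgraph(target_fps=30)       # resources shrank: need a faster model
changed = incremental_swap(current["ops"], new["ops"])
# -> blocks [0, 1, 3] are rerouted; block 2 keeps running untouched
```

Because only the changed blocks are swapped and no weights are fetched from disk, the adaptation cost scales with the size of the diff rather than the size of the model.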



Code available at https://github.com/syed-zawad/srnas




2. RELATED WORK

SR. Dong et al. (2014) is among the first works to employ deep learning models for super-resolution. Since then, deeper and more complex models such as (Soh et al. (2019); Nazeri et al. (2019)) have been proposed for better performance. Generative Adversarial Networks (GANs) (Creswell et al. (2018); Wang et al. (2019); Ahn et al. (2018); Wang et al. (2018a)) and their variations (Prajapati et al. (2021); Guo et al. (2020); Shahsavari et al. (2021)) have been shown to be highly effective at this task. Attention mechanisms have been introduced to SR as well (Zhao et al. (2020); Mei et al. (2021); Chen et al. (2021)). Methods such as network pruning, knowledge distillation, and quantization have been applied to reduce the computational overhead of existing SR models (Jiang et al. (2021); Zhang et al. (2021b); Hong et al. (2022); Wang et al. (2021)). However, all of the above efforts focus on building a single model for each hardware target and do not consider the dynamic resource environment of mobile devices, and thus fall short of meeting streaming QoS.

NAS. Earlier neural architecture search methods rely on Reinforcement Learning (RL) (Zoph & Le (2016)) and evolutionary algorithms (Lu et al. (2018); van Wyk & Bosman (2019)) for architecture engineering. However, these methods are extremely resource demanding and often require thousands of GPU hours. Later works such as (Wen et al. (2020); Jin et al. (2019)) introduce performance prediction, shared-weight training, and proxy training to speed up the architecture engineering process. DARTS (Liu et al. (2018)) and its follow-ups (Chen et al. (2019); Wu et al. (2019)) adopt a differentiable architecture search paradigm. The once-for-all work (Cai et al. (2019)) proposes generating a single model for multiple hardware deployments, though pruning and model swapping are needed for each deployment. One-shot NAS (Bender et al. (2018)) and its variations (Huang & Chu (2021); Zhang et al. (2021a); Zhao et al. (2021)) can generate models with few search iterations and have been explored for SR (Chu et al. (2021); Guo et al. (2020); Liu et al. (2021a); Zhan et al. (2021)), but existing works only focus on designing a single model and do not consider QoS for streaming on mobile devices. In Section 5, we compare DySR with existing SR models; the results show that our model achieves Pareto-optimal performance while meeting QoS.
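The kernel-weight sharing that makes one-shot NAS cheap can be illustrated with a minimal NumPy sketch: sub-graphs of different widths take slices of one shared kernel tensor, in the spirit of elastic-width sharing from once-for-all (Cai et al. (2019)). All shapes and names here are illustrative assumptions, not the meta-graph used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared kernel for the meta-graph's widest convolution:
# (max_out_channels, in_channels, k, k). Sub-graphs reuse leading slices.
SHARED_W = rng.standard_normal((64, 3, 3, 3))

def subgraph_weights(width):
    """A width-`width` sub-graph takes a view of the shared kernel;
    basic slicing makes no copy, so switching sub-graphs allocates
    no new parameters."""
    return SHARED_W[:width]

w_small, w_large = subgraph_weights(16), subgraph_weights(64)
# Both views alias the same storage: updating the small sub-graph's
# weights also updates the corresponding slice of the large one.
assert np.shares_memory(w_small, w_large)
```

This aliasing is what keeps the memory footprint close to that of a single large model even though many sub-graphs are available at inference time.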

