DEEP REINFORCEMENT LEARNING FOR WIRELESS SCHEDULING WITH MULTICLASS SERVICES

Abstract

In this paper, we investigate the problem of scheduling and resource allocation over a time-varying set of clients with heterogeneous demands. This problem arises when service providers must serve traffic generated by users with different classes of requirements, allocating bandwidth resources over time to satisfy these demands efficiently within a limited time horizon. This is a highly intricate problem, and solutions may involve tools stemming from diverse fields such as combinatorics and optimization. Recent work has successfully proposed Deep Reinforcement Learning (DRL) solutions, although not yet for heterogeneous user traffic. We propose a deep deterministic policy gradient algorithm combining state-of-the-art techniques, namely Distributional RL and Deep Sets, to train a model for heterogeneous traffic scheduling. We test it on diverse scenarios with different time-dependence dynamics, user requirements, and available resources, demonstrating consistent results. We evaluate the algorithm in a wireless communication setting and show significant gains against state-of-the-art conventional algorithms from combinatorics and optimization.

1. INTRODUCTION

User scheduling (i.e., which user is served when) and the associated resource allocation (i.e., which and how many resources should be assigned to scheduled users) are two long-standing fundamental problems in communications, which have recently attracted considerable attention in the context of next generation communication systems (5G and beyond). The main reason is the heterogeneity of users' traffic and the diverse Quality of Service (QoS) requirements it imposes. The goal of this paper is to design a scheduler and resource assigner that takes as input the specific constraints of the traffic/service class each user belongs to, in order to maximize the number of satisfied users.

This problem is hard to solve due to at least two main technical challenges: (i) except for some special cases, there is no simple closed-form expression for the problem, and a fortiori none for its solution; (ii) the problem-solving algorithm has to be scalable with the number of users. Current solutions rely on combinatorial approaches or suboptimal heuristics, which work satisfactorily in specific scenarios but fail to perform well when the number of active users is large. This motivates the quest for alternative solutions; we propose to resort to Deep Reinforcement Learning (DRL) to tackle this problem.

In the context of DRL, we combine several ingredients to solve this challenging problem. In particular, we leverage the theory of Deep Sets to design permutation-equivariant and permutation-invariant models, which resolves the scalability issue: the number of users can be increased without increasing the number of parameters. We also stabilize the learning process by incorporating the distributional dimension in a new way, marrying it with Dueling Networks to "center the losses". Finally, we compare the proposed DRL-based algorithm with conventional solutions based on combinatorial or suboptimal optimization approaches.
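To illustrate the Deep Sets idea, the following is a minimal sketch of a permutation-invariant scorer over a set of users, assuming each user is described by a fixed-size feature vector. Layer sizes and all names (e.g., `DeepSetScorer`, `phi`, `rho`) are illustrative assumptions, not taken from the paper; the point is that summing per-user embeddings makes the output order-independent while keeping the parameter count fixed as the number of users grows.

```python
import torch
import torch.nn as nn

class DeepSetScorer(nn.Module):
    """Permutation-invariant network: rho(sum_i phi(x_i))."""

    def __init__(self, user_dim: int, hidden: int = 64):
        super().__init__()
        # phi: applied independently to every user's feature vector
        self.phi = nn.Sequential(
            nn.Linear(user_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        # rho: applied to the pooled (summed) set representation
        self.rho = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, users: torch.Tensor) -> torch.Tensor:
        # users: (batch, n_users, user_dim). Summing over the user axis
        # makes the output invariant to user ordering, and the number of
        # parameters does not depend on n_users.
        pooled = self.phi(users).sum(dim=1)
        return self.rho(pooled)

# Sanity check: shuffling the users leaves the output unchanged.
net = DeepSetScorer(user_dim=8)
x = torch.randn(1, 5, 8)
perm = x[:, torch.randperm(5), :]
assert torch.allclose(net(x), net(perm), atol=1e-5)
```

Replacing the sum with a mean or max pool preserves the same invariance; per-user (equivariant) outputs are obtained by feeding each `phi(x_i)` alongside the pooled vector instead of pooling alone.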
Our experiments and simulation results clearly show that our DRL method significantly outperforms conventional state-of-the-art algorithms.
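For the distributional component, a generic sketch of a dueling head over a categorical value distribution (C51-style atoms) is shown below. The paper's specific way of "centering the losses" is its own contribution and is not reproduced here; this only illustrates the standard dueling decomposition V + (A - mean(A)) applied per atom, with all class and parameter names (`DuelingDistHead`, `n_atoms`, etc.) being illustrative assumptions.

```python
import torch
import torch.nn as nn

class DuelingDistHead(nn.Module):
    """Dueling head producing a categorical distribution per action."""

    def __init__(self, feat_dim: int, n_actions: int, n_atoms: int = 51):
        super().__init__()
        self.n_actions, self.n_atoms = n_actions, n_atoms
        self.value = nn.Linear(feat_dim, n_atoms)            # state-value logits
        self.adv = nn.Linear(feat_dim, n_actions * n_atoms)  # per-action advantage logits

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        v = self.value(feats).unsqueeze(1)                        # (B, 1, atoms)
        a = self.adv(feats).view(-1, self.n_actions, self.n_atoms)
        # Center the advantage stream across actions, then combine with
        # the value stream, atom by atom.
        logits = v + a - a.mean(dim=1, keepdim=True)
        return torch.softmax(logits, dim=-1)                      # (B, actions, atoms)

head = DuelingDistHead(feat_dim=32, n_actions=4)
probs = head(torch.randn(2, 32))
assert probs.shape == (2, 4, 51)
# Each per-action distribution sums to 1 over the atoms.
assert torch.allclose(probs.sum(-1), torch.ones(2, 4), atol=1e-5)
```

The expected Q-value for each action is recovered by taking the inner product of each distribution with the fixed atom support, as in standard categorical DRL.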

