FLOW NEURAL NETWORK FOR TRAFFIC FLOW MODELLING IN IP NETWORKS

Abstract

This paper presents and investigates a novel and timely application domain for deep learning: sub-second traffic flow modelling in IP networks. Traffic flows are the most fundamental components of an IP-based networking system, and accurately modelling their generative patterns is crucial for many practical network applications. However, the high nonlinearity and dynamics of both the traffic and the network conditions make this task challenging, particularly at sub-second time granularity. In this paper, we cast the problem as a representation learning task that models the intricate patterns in data traffic according to the structure and working mechanism of the IP network. Accordingly, we propose a customized Flow Neural Network (FlowNN), which works in a self-supervised way to extract the domain-specific data correlations. We report state-of-the-art performance on both synthetic and realistic traffic patterns across multiple practical network applications, which attests to the strength of our approach.

1. INTRODUCTION

Deep Learning (DL) has gained substantial popularity in light of its applicability to real-world tasks across computer vision, natural language processing (Goodfellow et al., 2016), protein structure prediction (Senior et al., 2020) and challenging games such as Go (Silver et al., 2017). Typically, the data for these learning tasks takes the form of grids, sequences, graphs, or combinations thereof. The tremendous efforts devoted to customizing neural network structures (Krizhevsky et al., 2012; Kiros et al., 2015; Hochreiter & Schmidhuber, 1997) and learning strategies (Sermanet et al., 2018; Oord et al., 2019) to exploit data-specific properties underpin the success of modern DL in these domains. Following the same design philosophy, we wish to capitalize on these advancements to develop a customized neural network and self-supervised learning strategy to tackle the crucial and timely challenge of traffic flow modelling in IP networks.

1.1. TRAFFIC FLOW MODELLING IN IP NETWORKS

An IP network is a communication network that uses the Internet Protocol (IP) to send and receive messages between devices such as computers and mobile phones. The messages could be general application data such as video and emails, or control signals of any connected devices. When sending messages from a source to a destination, the source device encapsulates the bit chunks of the encoded messages into a set of IP packets. The packets then travel sequentially through the communication links and routers or switches along a given routing path, thus forming the traffic flows in an IP network (Hunt, 1992). As one of the most commonly used global networks, the IP network provides the majority of such data transmission services to support today's Internet applications such as video streaming, voice-over-IP, and the Internet of Things. Therefore, a good understanding of the behavioral patterns of the underlying traffic flows plays a crucial role in network planning, traffic management, and the optimization of Quality of Service (QoS, e.g., transmission rate and delay). This challenge is termed traffic flow modelling and is fundamental to IP networking research and practice. However, the high nonlinearity, randomness and complicated self-similarity (Leland et al., 1994) of this traffic defeat many traditional analytical and learning models, particularly at fine-grained time scales such as the sub-second level.

Consider the illustrative example in Fig. 1, which depicts multiple packet flows with shared forwarding nodes and links in their routing paths. The sender of each flow streams data packets to the receiver at a dynamic sending rate, which is determined by many factors such as its rate demand, the existing traffic loads, and the available link bandwidth. The packets usually experience various delays on the journey due to forwarding processing, link transmission, and packet queueing.
For example, when the aggregate sending rate of Senders 2 and 3 exceeds 10 Gbps, routers R2-R4 hold the arriving packets in their buffers until the links from R2 to Receiver 1 become free, causing what is known as queueing delay. The extent of these delays depends on multiple factors, including the volume of ongoing traffic, the capacity of the router's output queue, and the link bandwidth. The random establishment, interaction and termination of massive numbers of flow connections give rise to network dynamics. This illustrates the complexity of traffic flow modelling in an IP network even for this simple example. The challenge is exacerbated in practice, where traffic loads run at over 100 Gbps and networks are significantly larger.
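To make the queueing-delay mechanism concrete, the following is a minimal sketch of a single output queue under a fluid (rate-based) approximation. It is an illustration only and not part of the proposed method: the function name `queueing_delay`, the 1 Gbit buffer, the 1 ms step and the rate profile are all assumptions chosen for the example; only the 10 Gbps link rate comes from the text.

```python
def queueing_delay(arrival_gbps, link_gbps=10.0, buffer_gbit=1.0, dt=0.001):
    """Fluid model of one router output queue, stepped every dt seconds.

    arrival_gbps: aggregate arrival rate (Gbps) in each time step.
    Returns (per-step queueing delay in seconds, total dropped traffic in Gbit).
    The buffer size and step length are illustrative assumptions.
    """
    backlog = 0.0            # traffic waiting in the buffer, in Gbit
    delays, dropped = [], 0.0
    for rate in arrival_gbps:
        # The backlog grows when arrivals exceed the link rate and drains otherwise.
        backlog = max(backlog + (rate - link_gbps) * dt, 0.0)
        if backlog > buffer_gbit:
            # Anything beyond the buffer capacity is lost.
            dropped += backlog - buffer_gbit
            backlog = buffer_gbit
        # A packet arriving now waits until the backlog ahead of it is sent.
        delays.append(backlog / link_gbps)
    return delays, dropped


# Hypothetical profile: 9 Gbps for 100 ms, a 13 Gbps burst for 100 ms, then 8 Gbps.
profile = [9.0] * 100 + [13.0] * 100 + [8.0] * 200
delays, dropped = queueing_delay(profile)
print(f"peak queueing delay: {max(delays) * 1e3:.1f} ms, dropped: {dropped:.3f} Gbit")
```

With this profile the backlog builds only during the 100 ms burst and the delay peaks at about 30 ms before draining, which shows how a short overload, invisible at coarse time scales, dominates the delay seen by packets at the sub-second level.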

1.2. MOTIVATING FLOWNN BASED TRAFFIC FLOW MODELLING

A flow pattern can be defined as anything that follows a trend and exhibits some kind of regularity, e.g., in its distribution or periodicity. Traffic flow patterns can be modelled mathematically or with data-driven learning algorithms. We argue that developing a customized FlowNN for IP traffic flow modelling is important in two respects: 1) improving the performance of the supported network applications through accurate modelling of the behavioral patterns of traffic flows in IP networks, particularly at sub-second time scales; 2) providing an exciting new "playground" and neural network model for the DL community to solve real-world-motivated research challenges by tightly integrating the structure and working mechanisms of IP networks. Next, we make the following two clarifications.

Why not use traditional mathematical models? The past decades have seen numerous traffic models proposed to mathematically characterize network traffic (Gebali, 2015). For example, extensive studies use the Poisson model, which assumes that packet arrivals follow a Poisson process. Considering the heavy-tailed distribution and burstiness of data-center traffic, Benson et al. (2010) model the traffic arrival pattern as a log-normal process. To capture temporal patterns and make predictions accordingly, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is used by Ergenc & Ertan (2019) to model the traffic time series. These analytical models may generate outputs that are easier to interpret, but they are bound to specific operating conditions and assumptions. More importantly, these statistical models function at coarse time scales of hours and assume relatively smooth traffic patterns. However, as reported in many practical traffic measurements, e.g., Benson et al. (2010; 2011) and Greenberg et al. (2009), most flows last less than 1 minute.
This implies that tasks requiring traffic models at finer-grained time scales are beyond the capability of these traditional models. Fig. 2 plots the traffic traces we collected from a practical backbone network, WIDE¹, showing the realistic traffic patterns when the packet flows are sampled at two different time scales. The long time-scale plot in Fig. 2b shows a clear "tide effect" associated with daily human activities. By contrast, the traces in Fig. 2a are much noisier, and obvious patterns are difficult to recognize when packets are counted per millisecond.
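The effect of the time scale can be illustrated with a small synthetic experiment. The sketch below is not the measurement procedure behind Fig. 2 and uses no real trace: it draws independent inter-arrival gaps from an exponential distribution (a Poisson process) and from a log-normal distribution with the same mean, counts packets per bin at two bin widths, and compares the coefficient of variation of the counts. The 1 ms mean gap, the shape parameter 1.5, the bin widths and all function names are assumptions made for illustration.

```python
import math
import random
import statistics

MEAN_GAP_S = 0.001  # assumed mean inter-arrival gap of 1 ms (illustrative)
SIGMA = 1.5         # assumed log-normal shape parameter (illustrative)


def arrival_times(n, draw_gap):
    """Cumulative arrival times built from n independently drawn gaps."""
    t, times = 0.0, []
    for _ in range(n):
        t += draw_gap()
        times.append(t)
    return times


def counts_per_bin(times, bin_s):
    """Packets per bin of width bin_s; the trailing partial bin is dropped."""
    n_bins = int(times[-1] / bin_s)
    counts = [0] * n_bins
    for t in times:
        i = int(t / bin_s)
        if i < n_bins:
            counts[i] += 1
    return counts


def coeff_var(xs):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.pstdev(xs) / statistics.mean(xs)


def compare(n=50_000, seed=0):
    """Coefficient of variation of packet counts per model and bin width."""
    rng = random.Random(seed)
    # The log-normal mean is exp(mu + sigma^2 / 2); pick mu to match MEAN_GAP_S.
    mu = math.log(MEAN_GAP_S) - SIGMA ** 2 / 2
    models = {
        "poisson": lambda: rng.expovariate(1.0 / MEAN_GAP_S),
        "lognormal": lambda: rng.lognormvariate(mu, SIGMA),
    }
    result = {}
    for name, draw in models.items():
        times = arrival_times(n, draw)
        result[name] = {bin_s: coeff_var(counts_per_bin(times, bin_s))
                        for bin_s in (0.01, 1.0)}
    return result


for model, by_bin in compare().items():
    print(model, {f"{b * 1e3:.0f} ms": round(cv, 3) for b, cv in by_bin.items()})
```

Under these assumptions, the relative fluctuation of the packet counts is larger in the 10 ms bins than in the 1 s bins for the Poisson model, and the heavy-tailed gaps are burstier than Poisson gaps at the fine scale. Independent gaps do not reproduce the self-similarity of measured traffic, so real traces are typically harder still to model.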



¹ http://mawi.wide.ad.jp/~agurim/index.html



Figure 1: Traffic flows in IP networks.

