PREDICTION OF TOURISM FLOW WITH SPARSE DATA INCORPORATING TOURIST GEOLOCATIONS

Anonymous

Abstract

Modern tourism in the 21st century is facing numerous challenges. One of these challenges is the rapidly growing number of tourists in space-limited regions such as historical city centers, museums, or geographical bottlenecks like narrow valleys. In this context, an accurate prediction of tourism volume and tourism flow within a certain area is critical for visitor management tasks such as sustainable treatment of the environment and prevention of overcrowding. Static flow control methods like conventional low-level controllers or limiting access to overcrowded venues have not yet solved the problem. In this paper, we empirically evaluate the performance of state-of-the-art deep-learning methods such as RNNs, GNNs, and Transformers as well as the classic statistical ARIMA method. Granular, limited data supplied by a tourism region is extended by exogenous data such as geolocation trajectories of individual tourists, weather and holidays. In the field of visitor flow prediction with sparse data, we are thereby able to increase the accuracy of our predictions by incorporating modern input feature handling and by mapping geolocation data on top of discrete POI data.

1. INTRODUCTION

With increasing population and travel capacities (e.g. easy access to international flights), cultural tourism destinations have seen a rise in visitor counts. In addition, recent needs for social distancing and attendance limitations due to the global COVID-19 pandemic have confronted tourism destinations with significant challenges, e.g. establishing sustainable treatment of both the urbanised and the natural environment or preventing overcrowded waiting lines. The perceptions of tourists regarding health hazards, safety and unpleasant tourism experiences may be influenced by social distance and better physical separation Sigala (2020). As far as the United Nations' 2030 Agenda for Sustainable Development UNWTO (2015) is concerned, tourism has not only the potential to contribute to several of the 17 Sustainable Development Goals (SDGs), but moreover an obligation to do so. Only by establishing sustainable tourism will it be possible to create

• sustainable cities and communities (Goal 11)
• responsible consumption and production (Goal 12)
• decent work and economic growth (Goal 8)

Therefore, future-oriented tourism regions aim to first understand and then control visitor flows in order to

• preserve and protect their natural landmarks
• reduce emissions and waste as a result of overcrowding, e.g. in parks or narrow city centers
• establish sustainable energy consumption within tourist attractions
• create harmony between residents and tourists
• and maximise tourist satisfaction, which is directly connected to the economic wealth of the specific tourism region.

Unfortunately, many real-world problems suffer from sparse data availability due to data compliance issues, lack of data collection or even lack of data transfer between stakeholders. In the end, there are not enough datasets to properly train state-of-the-art machine learning models.
On the other hand, there are datasets available for which ethical considerations have to be made on whether they fulfil data privacy policies and comply with the rights of tourists. In our research we use non-personal data collected by POIs, tourist or tourist-related facilities as well as anonymized digital device data. Although tourists agree on e.g. sharing entry timestamps or locations, the collection of this data is often a side product of services such as ticket sales, travelcards or digital apps. The anonymized digital device data is the most controversial dataset used in this research, since it is location data collected by mobile phone apps. The dataset can be compared to mobile phone data collected by mobile network operators, which displays locations of devices throughout a specific time period. This common practice of data collection is entirely profit-oriented, since the companies collecting this data specifically aim to sell it. Apart from the fact that such data can help to improve scientific research and overcome real-life problems, it has to be discussed whether people are aware of what they are sharing by using these services, even if these datasets do not contain direct personal data. The question of how to improve awareness of data shared by such apps or services is not answered in this research. This scientific work focuses on what is possible to achieve in the given environment, considering the given data and data history, with regard to tourist flow prediction, since sparse data is a widespread generic problem.

The first step in order to control tourist flows is to predict authentic movement and behavior patterns. However, since the tourist visitor flow is affected by many factors such as the weather, cultural events, holidays, and regional traffic and hotspots throughout a specific day, it is a very challenging task to accurately predict the future flow Liu et al. (2018). Due to the availability of large datasets and computational resources, deep neural networks became the state-of-the-art methods in the task of forecasting time-series data Pan et al. (2021), including tourism flow applications Prilistya et al. (2020).

In this work, we focus on tourist flow prediction based on a local dataset from the visitors of the tourist attractions of the city of Salzburg as well as third-party geolocation data of individual tourists. After data preprocessing and dataset preparation, we compare the performance of different deep-learning based methods for time-series prediction with ARIMA, a traditional statistics-based method. According to Li & Cao (2018), ARIMA is the most popular classical time-series forecasting method based on exponential smoothing; it became popular in the 1970s when Ahmed & Cook (1979) proposed it for short-term freeway traffic predictions. Deep neural networks (DNNs) are proven to work very well on large datasets. However, their performance can degrade when trained on limited data, resulting in poor predictions on the test set. Since limited data is a common problem in tourism time-series forecasting, we perform a comprehensive comparison of DNNs and traditional techniques on a small dataset to reveal the shortcomings and point out necessary future improvements.
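To make the classical baseline discussed above more concrete, the following minimal sketch shows the idea behind an ARIMA(1,1,0)-style forecast: the series is differenced once (the integrated part) and an AR(1) model is fitted to the differences by ordinary least squares. This is not the configuration evaluated in this paper; the function names, the model order, and the daily visitor counts are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def difference(series: List[float]) -> List[float]:
    """First-order differencing: the 'I' (d = 1) step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs: List[float]) -> Tuple[float, float]:
    """Least-squares fit of d_t = c + phi * d_{t-1}; returns (c, phi)."""
    x = diffs[:-1]  # lagged differences
    y = diffs[1:]   # current differences
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0.0:
        # Constant differences: no autoregressive signal, keep only the drift.
        return mean_y, 0.0
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series: List[float], steps: int) -> List[float]:
    """Forecast `steps` future values by integrating predicted differences."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations to fit the sketch model")
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level = series[-1]
    last_diff = diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # AR(1) step on the differenced series
        level = level + last_diff        # undo the differencing
        predictions.append(level)
    return predictions


if __name__ == "__main__":
    # Hypothetical daily visitor counts for one point of interest (made up).
    daily_visitors = [120.0, 135.0, 128.0, 150.0, 162.0, 158.0, 171.0]
    print(forecast(daily_visitors, steps=3))
```

In practice, a library implementation with order selection and a likelihood-based fit would be preferable; the sketch only makes the differencing and autoregression steps explicit.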

based models. Recurrent Neural Networks (RNNs) are the state-of-the-art models for learning time-series datasets. RNNs equip neural networks with memory, making them successful at predicting sequence-based data. The introduction of gating mechanisms to RNNs led to the great performance of LSTM Hochreiter & Schmidhuber (1997) and GRU Chung et al. (2014). of continuous-time networks are NeuralODEs Chen et al. (2018) that define the hidden state of the network as a solution to an ordinary differential equation. Some limitations of NeuralODEs, such as non-intersecting trajectories, can be alleviated by using augmentation strategies, leading to Augmented-NeuralODEs (ANODEs) Dupont et al. (2019). Continuous-time models share some favourable properties: adaptive computation, as they can be implemented with numerical ordinary differential equation (ODE) solvers, and training with constant memory cost by using the adjoint sensitivity method Chen et al. (2018). In addition, they can be statistically verified

