CAUSAL PROBABILISTIC SPATIO-TEMPORAL FUSION TRANSFORMERS IN TWO-SIDED RIDE-HAILING MARKETS

Abstract

Achieving accurate spatio-temporal predictions in large-scale systems is extremely valuable in many real-world applications, such as weather forecasting, retail forecasting, and urban traffic forecasting. So far, most existing methods for multi-horizon, multi-task, and multi-target prediction select important predictors via their correlations with the responses of interest, so many forecasting models generated by those methods are likely not causal, leading to poor interpretability. The aim of this paper is to develop a collaborative causal spatio-temporal fusion transformer, named CausalTrans, to establish the collaborative causal effects of predictors on multiple forecasting targets, such as supply and demand in ride-sharing platforms. Specifically, we integrate causal attention with the Conditional Average Treatment Effect (CATE) estimation method from causal inference. Moreover, we propose a novel and fast multi-head attention derived from a Taylor expansion instead of softmax, reducing time complexity from O(V^2) to O(V), where V is the number of nodes in a graph. We further design a spatial graph fusion mechanism to significantly reduce the number of parameters. We conduct a wide range of experiments to demonstrate the interpretability of causal attention, the effectiveness of various model components, and the time efficiency of CausalTrans. As shown in these experiments, our CausalTrans framework can achieve up to 15% error reduction compared with various baseline methods.

1. INTRODUCTION

This paper is motivated by a collaborative probabilistic forecasting problem for both supply and demand in two-sided ride-hailing platforms, such as Uber and DiDi. Collaborative supply and demand relationships are common in various two-sided markets, such as Amazon, Airbnb, and eBay. We consider two-sided ride-hailing platforms as an example. In this case, we define supply and demand as the number of online drivers and the number of call orders, respectively, on the platform at a specific time in a city. Some major factors for demand include rush hours, weekdays, weather conditions, the transportation network, points of interest, and holidays. For instance, if it rains during peak hours on weekdays, demand will increase dramatically and remain high for a certain time period. In contrast, some major factors for supply include weather, holidays, traffic conditions, weekdays, and the platform's dispatching and repositioning policies. Moreover, supply tends to gradually cover areas with many unsatisfied orders; that is, the distribution of supply tends to match that of demand. We are interested in establishing collaborative causal forecasting models for demand and supply by using various predictors (or covariates). Although many learning methods have been developed to address various collaborative prediction tasks, such as spatio-temporal traffic flow prediction (Zhu & Laptev, 2017; Du et al., 2018; Zhang et al., 2019b; Ermagun & Levinson, 2018; Luo et al., 2019), multivariate prediction (Bahadori et al., 2014; Liang et al., 2018), multi-task prediction (Tang et al., 2018; Chen et al., 2018; Chandra et al., 2017), multi-view prediction (Yao et al., 2018), and multi-horizon prediction (Lim et al., 2019; Yu et al., 2020), these existing methods primarily select important predictors via their correlations with responses, leading to many forecasting models with poor interpretability.
In contrast, we propose CausalTrans, a Collaborative Spatio-temporal Fusion Transformer that generates causal probabilistic multi-horizon forecasts. To the best of our knowledge, this is the first work that captures collaborative causal effects of external covariates on multiple forecasting targets. Building such models is not only essential to enhancing forecasting performance, but also helps the platform use various platform policies to match the distribution of supply with that of demand in two-sided markets. In the CausalTrans framework, our major contributions are summarized as follows:
• We design the causal attention based on double machine learning (Chernozhukov et al., 2018) with two-layer fully connected neural networks, and successfully apply it to various large-scale time series forecasting problems. We conduct a wide range of experiments on real-world datasets with multiple covariates and demonstrate that CausalTrans with causal attention outperforms many baseline models in various ride-hailing scenarios.
• We propose a spatial fusion mechanism based on graph attention networks (GAT) (Veličković et al., 2017) to gather local regions and enhance robustness, as adjacent regions always share similar supply and demand patterns.
• We propose an approximate, time-efficient Taylor expansion attention to replace softmax in the multi-head attention of Transformers (Vaswani et al., 2017), such that time complexity reduces from O(V^2) to O(V). We carry out two groups of experiments, with three and five attention heads, to verify this efficiency improvement.
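
The complexity reduction named in the last contribution can be illustrated with a minimal sketch. It assumes a first-order expansion, exp(q·k) ≈ 1 + q·k, applied to L2-normalized queries and keys so that every weight stays non-negative; the expansion order and normalization actually used by CausalTrans are not specified in this section, and all function names below are hypothetical. The quadratic version builds every pairwise weight explicitly, while the linear version sums over nodes once and reuses those sums for each query.

```python
import math


def l2_normalize(rows):
    """Scale each row to unit L2 norm so every dot product lies in [-1, 1]."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard all-zero rows
        out.append([x / norm for x in row])
    return out


def taylor_attention_quadratic(Q, K, vals):
    """Reference version: forms all V x V weights w_ij = 1 + q_i . k_j."""
    out = []
    for q in Q:
        w = [1.0 + sum(a * b for a, b in zip(q, k)) for k in K]
        z = sum(w)  # normalizer for this query
        out.append([sum(wj * v[c] for wj, v in zip(w, vals)) / z
                    for c in range(len(vals[0]))])
    return out


def taylor_attention_linear(Q, K, vals):
    """Same output with cost linear in the number of nodes V.

    Uses sum_j (1 + q.k_j) v_j = sum_j v_j + q . (K^T vals), so the sums
    over nodes are computed once rather than once per query.
    """
    n, d, m = len(K), len(K[0]), len(vals[0])
    v_sum = [sum(v[c] for v in vals) for c in range(m)]      # sum_j v_j
    k_sum = [sum(k[a] for k in K) for a in range(d)]         # sum_j k_j
    kv = [[sum(k[a] * v[c] for k, v in zip(K, vals))         # K^T vals (d x m)
           for c in range(m)] for a in range(d)]
    out = []
    for q in Q:
        z = n + sum(q[a] * k_sum[a] for a in range(d))       # sum_j (1 + q.k_j)
        out.append([(v_sum[c] + sum(q[a] * kv[a][c] for a in range(d))) / z
                    for c in range(m)])
    return out
```

Both functions return the same values up to floating-point rounding; only the order of summation differs. Under this assumed form, the normalizer can reach zero only when every key is the exact negative of the query, which a practical implementation would need to guard against.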

2. RELATED WORK

There is a large body of literature on vehicle flow forecasting (Zhu & Laptev, 2017; Bahadori et al., 2014; Tang et al., 2018; Lim et al., 2019; Yao et al., 2018). We selectively review several major methods as follows. Zhu & Laptev (2017) treat the time series forecasting task as a two-step procedure comprising offline pre-training and online forecasting.
The offline pre-training step is an encoder-decoder framework for compressing sequential features and extracting principal components, whereas the second step gives explainable prediction changes under external variables. Bahadori et al. (2014) proposed a unified low-rank tensor learning framework for multivariate spatio-temporal analysis by combining various attributes of spatio-temporal data, including spatial clustering and shared variable structure. For multi-step traffic flow prediction, Tang et al. (2018) proposed a spatio-temporal multi-task collaborative learning model to extract and learn shared information among multiple prediction tasks collaboratively. For example, such a model combines spatial features collected from offline observation stations and inherent information between blended time granularities. Lim et al. (2019) proposed a temporal fusion transformer (TFT) to capture temporal correlations at each position, which is similar to the self-attention mechanism and is expected to capture both long-term and short-term dependencies. Yao et al. (2018) proposed a deep multi-view spatio-temporal network (DMVST-Net), including a speed viewpoint (modeling the correlation between historical and future demand by LSTM (Gers & Schmidhuber, 2001)), a spatial viewpoint (modeling local spatial correlation by CNN), and a contextual viewpoint (modeling regional correlations in local temporal patterns). Overall, all of the above methods improve time series fitting by learning and predicting correlations across multiple spatio-temporal perspectives, targets, and tasks. However, those methods lack a convincing account of "how and to what extent external variables affect supply and demand". Achieving good demand forecasting involves not only historical demand targets, but also various current external variables (e.g., weather conditions, traffic conditions, holidays, and driver repositioning).
Those historical demand observations were themselves affected by historical external factors, so demand forecasting based only on correlations between variables is hardly convincing. Furthermore, supply forecasting

