CCIL: CONTEXT-CONDITIONED IMITATION LEARNING FOR URBAN DRIVING

Abstract

Imitation learning is a promising solution to the challenging autonomous urban driving task, as experienced human drivers can effortlessly tackle highly complex driving scenarios. Behavior cloning is the most widely applied imitation learning approach in autonomous driving due to its exemption from potentially risky online interactions, but it suffers from the covariate shift issue. To mitigate this problem, we propose a context-conditioned imitation learning approach that learns a policy to map the context state to the ego vehicle's state, instead of the typical formulation that maps both the ego and context states to the ego action. In addition, to make full use of the spatial and temporal relations in the context to infer the ego vehicle's future states, we design a novel policy network based on the Transformer, whose attention mechanism has demonstrated excellent performance in capturing relations. Finally, during evaluation, a linear quadratic controller is employed to produce smooth plans based on the states predicted by the policy network. Experiments on the real-world large-scale Lyft and nuPlan datasets demonstrate that our method significantly outperforms state-of-the-art methods.

1. INTRODUCTION

Planning a safe, comfortable, and efficient trajectory in a complex urban environment for a self-driving vehicle (SDV) is an important and challenging task in autonomous driving (Yurtsever et al., 2020). Unlike highway driving (Henaff et al., 2019), urban driving requires handling more varied road geometry, such as roundabouts and intersections, while interacting with traffic lights, pedestrians, and other vehicles. Classic manually-designed rule-based approaches (Fan et al., 2018) have achieved some success in industry but demand tedious human engineering to cope with diverse real-world cases. Meanwhile, the rapid development of deep learning techniques motivates researchers (Bojarski et al., 2016; Pan et al., 2020) to employ deep neural networks to model the complicated driving policy. To learn such a policy, imitation learning (IL) from human drivers' demonstrations is a promising solution, as experienced drivers can tackle even extremely challenging situations, and their driving data can be collected at scale. The simplest IL algorithm is the behavior cloning (BC) method, which has wide applications in autonomous driving (Pomerleau, 1988; Bojarski et al., 2016; Codevilla et al., 2018b) due to its exemption from potentially dangerous online interactions. It learns a policy in a supervised fashion by minimizing the difference between the learner's action and the expert's action over the expert's state distribution. However, the BC method suffers from the covariate shift issue (Ross et al., 2011), i.e., the states induced by the learner's policy cumulatively deviate from the expert's distribution. To overcome the covariate shift obstacle, existing methods such as DAgger (Ross et al., 2011) and DART (Laskey et al., 2017) query supervisor corrections at the learner's states or perturbed expert's states.
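The covariate shift phenomenon described above can be sketched in a few lines. The following is a minimal illustration, not the paper's method: a hypothetical 1-D integrator system with an expert that steers the state toward zero. BC recovers the expert's linear gain from demonstrations, but a small residual policy error (here an explicit bias term standing in for approximation error) compounds over a closed-loop rollout and pushes the learner's states away from the expert's distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_policy(s):
    # Expert steers the 1-D state toward zero.
    return -0.5 * s

# Collect expert demonstrations: (state, action) pairs along expert rollouts.
states, actions = [], []
for s0 in rng.uniform(-5, 5, size=50):
    s = s0
    for _ in range(20):
        a = expert_policy(s)
        states.append(s)
        actions.append(a)
        s = s + a  # simple integrator dynamics: s' = s + a

X = np.array(states)
Y = np.array(actions)

# Behavior cloning as supervised regression: least-squares fit of a
# linear policy a = k * s on the expert's state distribution.
k = (X @ Y) / (X @ X)

bias = 0.2  # hypothetical residual approximation error of the learned policy

def bc_policy(s):
    return k * s + bias

# Closed-loop rollouts: the learner's small per-step error compounds,
# so its states drift away from the expert's (covariate shift).
s_expert, s_bc = 4.0, 4.0
for _ in range(20):
    s_expert = s_expert + expert_policy(s_expert)
    s_bc = s_bc + bc_policy(s_bc)

drift = abs(s_bc - s_expert)
```

On the demonstration data the regression recovers the expert gain almost exactly, yet the rollout with the perturbed policy settles at a nonzero fixed point while the expert converges to zero; the open-loop (per-step) error is tiny, but the closed-loop deviation is not, which is exactly the failure mode DAgger- and DART-style corrections target.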
Since human supervision is hard to collect, recent works like GAIL (Ho & Ermon, 2016) seek to provide feedback from a neural network-based discriminator to recover from out-of-distribution states generated by the learner's policy. However, these data augmentation methods need either expert supervision or rollouts of their policy in the real world or a realistic simulator, both of which are impractical in autonomous driving. Instead, some researchers attempt to constrain the learned policy formulation to ensure its robustness to policy errors by incorporating control-theoretic prior knowledge. For example, Palan et al. (2020); Havens & Hu (2021) pose Kalman or linear matrix inequality constraints on the learned linear policy to guarantee its closed-loop stability in a linear

