TRANSCENDENTAL IDEALISM OF PLANNER: EVALUATING PERCEPTION FROM THE PLANNING PERSPECTIVE FOR AUTONOMOUS DRIVING

Anonymous authors
Paper under double-blind review

Abstract

Evaluating the performance of the perception module in autonomous driving is one of the most critical tasks in developing these complex intelligent systems. While module-level unit tests adopted from traditional computer vision tasks are viable to a certain extent, it remains far less explored how to evaluate, in a consistent and holistic manner, the impact of changes in a perception module on the planning of an autonomous vehicle. In this work, we propose a principled framework that provides a coherent and systematic understanding of how perception modules affect the planning that actually controls the vehicle. Specifically, planning is formulated as an expected utility maximisation problem, where all input signals from upstream modules jointly provide a world state description, and the planner seeks the optimal action to execute by maximising the expected utility determined by both the world state and the action. We show that, under some mild conditions, the objective function can be represented as an inner product between the world state description and the utility function in a Hilbert space. This geometric interpretation enables a novel way to analyse how noise in the world state estimation affects the solution to the problem, and leads to a universal quantitative metric for this purpose. The whole framework resembles the idea of transcendental idealism in the classical philosophy literature, which gives our approach its name.

1. INTRODUCTION

Autonomous driving has recently risen as a fast-advancing field in both industry and academia, receiving a surge of interest from the engineering and scientific communities (Yurtsever et al., 2020; Sun et al., 2020). As an intricate system, an autonomous driving vehicle consists of numerous hardware components and interactive onboard modules. As one such core component, the onboard perception module serves as the major source of real-time characterisation of the dynamic environment an autonomous vehicle (AV) navigates through. To evaluate and improve the perception module, conventional perception tasks (such as detection, segmentation, and tracking) have been well defined, and corresponding performance measures have been established in computer vision to benchmark perception algorithms (Lin et al., 2014). Despite their great success in driving the development of advanced perceptual information processing modules, almost all such metrics focus exclusively on perception-level performance in a deployment-agnostic fashion, for instance, how close a detected object is to the ground truth, while ignoring the actual impact of the result on the entire AV system. Indeed, not all perception errors translate equally to the planning of an AV: missing a vehicle in front of an AV is obviously far more serious than missing one far behind. This problem is further compounded by the heterogeneity of perception errors, which share little common semantics ("How does an error of 5 m/s in velocity compare to a size error of 25%?"), so intuitive manual engineering is widely used instead (Caesar et al., 2020). Although these issues are typically addressed by integrating real-world road tests, that process is extremely costly and time-consuming (Wachenfeld and Winner, 2016; Åsljung et al., 2017). As a result, tools are in great demand to effectively and efficiently measure the impact of perception on the whole autonomous driving system before on-road deployment.
Unfortunately, such solutions remain far less explored in the research literature. Most recently, the community has started to approach this problem with some initial efforts (Sun et al., 2020; Philion et al., 2020; Ivanovic and Pavone, 2021; Deng et al., 2021). Despite some success, these preliminary solutions only exploit certain aspects of the problem, either implicitly relying on a weak correlation between behaviour change and driving cost (Philion et al., 2020), inferring the holistic cost from local properties (Ivanovic and Pavone, 2021), or operating only at coarse levels (Sun et al., 2020). In this work, we propose a principled and universal framework to quantify how noise in the perception input affects AV planning.
This is achieved by explicitly analysing the process of AV planning in the context of expected utility maximisation (Osborne and Rubinstein, 1994), and evaluating how the utility values critical to the AV's reasoning change subject to input perception errors. Under some mild conditions (Section 3.3), we show that this planning process can be formulated as an optimisation problem with a linear objective function in a Hilbert space, where the utility to optimise is the inner product of an action-wise utility function and the world state distribution represented by perception. This geometric interpretation reveals many natural and insightful properties of the problem; for example, any input error can be decomposed into two parts: one that does not affect the utility comparison (planning-invariant error) and one that directly changes the planning problem (planning-critical error). Based on this novel insight, we derive a metric that quantifies how a perception error changes the planning process. We want to emphasise the necessity of understanding the impact of perception errors on an autonomous driving system via the process of planning, rather than purely from the final result (i.e., the AV behaviour, or the trajectory output by the planning module), as proposed by previous works (Philion et al., 2020). This is because the final planning result does not necessarily reflect how an AV evaluates the situation, reasons about the environment, and assesses the costs of actions. In fact, the correlation between behaviour change and the actual consequence is weak, or even negative, in many common cases, as illustrated in Figure 1. Moreover, most works implicitly or explicitly integrate some a priori knowledge of the consequences of perception errors into their metric design. The complexity of such impact on autonomous driving, however, is far beyond hand-crafted rules, defeating their purpose despite tremendous amounts of manual effort; e.g., Deng et al. (2021) assume that the severity of an error should be weighted proportionally to the reciprocal of its cubed Manhattan distance to the AV, regardless of its position relative to the AV (in front of or behind it). In contrast, we make no such presumptions and rely fully on the planning process to infer the error consequence in a fully transparent way, which enables our solution to capture many critical cases. In this regard, the core principle of our design resembles the idea in the philosophical system of transcendental idealism, proposed by Immanuel Kant in his classical work Critique of Pure Reason (Kant, 1998), which argues
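The inner-product view and the error decomposition described above can be illustrated with a minimal numerical sketch. Everything below is an illustrative assumption rather than the paper's implementation: a hypothetical discretised world with 4 states and 3 candidate actions, where each action's expected utility is the inner product of its utility vector with the perceived state distribution, and a perception error is split into the component lying in the span of pairwise utility differences (planning-critical) and its orthogonal complement (planning-invariant).

```python
import numpy as np

# Hypothetical discretised world: 4 states, 3 candidate actions.
# utilities[a, s] = utility of taking action a when the world is in state s.
utilities = np.array([
    [1.0, 0.2, 0.0, 0.5],
    [0.3, 0.9, 0.1, 0.4],
    [0.2, 0.1, 0.8, 0.6],
])

def plan(belief):
    """Expected utility maximisation: each action's expected utility is
    the inner product <u_a, belief>; the planner returns the argmax."""
    return int(np.argmax(utilities @ belief))

belief_gt = np.array([0.7, 0.1, 0.1, 0.1])    # ground-truth state distribution
error = np.array([0.05, -0.05, 0.02, -0.02])  # hypothetical perception noise

# Only the projection of the error onto the span of the pairwise utility
# differences (u_a - u_b) can flip a utility comparison between actions;
# the orthogonal remainder leaves every comparison intact.
diffs = np.array([utilities[a] - utilities[b]
                  for a in range(3) for b in range(a + 1, 3)])
projector = diffs.T @ np.linalg.pinv(diffs.T)  # projector onto span of diffs
critical = projector @ error                   # planning-critical component
invariant = error - critical                   # planning-invariant component

# The invariant component is orthogonal to every u_a - u_b, so it cannot
# change which action attains the maximum expected utility.
assert plan(belief_gt + invariant) == plan(belief_gt)
```

The key design point this sketch exposes is that the decomposition depends only on the geometry of the utility functions, not on any hand-crafted severity rule: an arbitrarily large error component in the planning-invariant subspace leaves the planner's decision problem unchanged.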



Figure 1: Illustration of behaviour change vs. driving cost (best viewed in colour). The change in AV behaviour due to a perception error is not always correlated with the cost of its consequence. In (a) the AV has to circumvent the erroneously perceived cone by making a large detour. In (b) the AV only needs to make a slight detour to the right, yet it inevitably hits the cone; although the behaviour change is far smaller than in (a), the consequence is significantly worse ("hitting an object" vs. "making a large detour"). In (c) either way forward is indifferent to the AV in terms of consequence, yet the change in behaviour is considerable in terms of spatiotemporal motion. In (d), two falsely detected cones lie on both sides, close to the AV as it passes by; despite the absence of collision, the AV decides to maintain the same motion as in the ground-truth case. The final behaviour thus does not change given the perception error, but the cost of passing two close objects already changes the planning process, which is missed by metrics that only look at the AV behaviour or the planning result.

