MASTERING SPATIAL GRAPH PREDICTION OF ROAD NETWORKS

Abstract

Figure 1 : Our agent interacts with the currently generated spatial graph by proposing new edges to be added. A tree-based search produces a sequence of actions that maximizes a reward function based on complex geometrics priors.

1. INTRODUCTION

Road layout modelling from satellite images constitutes an important task of remote sensing, with numerous applications in and navigation. The vast amounts of data available from the commercialization of geospatial data, in addition to the need for accurately establishing the connectivity of roads in remote areas, have led to an increased interest in the precise representation of existing road networks. By nature, these applications require structured data types that provide efficient representations to encode geometry, in this case, graphs, a de facto choice in domains such as computer graphics, virtual reality, gaming, and the film industry. These structured-graph representations are also commonly used to label recent road network datasets (Van Etten et al., 2018) and map repositories (OpenStreetMap contributors, 2017) . Based on these observations, we propose a new method for generating predictions directly as spatial graphs, allowing us to explicitly incorporate geometric constraints in the learning process, encouraging predictions that better capture higher-level dataset statistics. In contrast, existing methods for road layout detection, mostly rely on pixel-based segmentation models that are trained on masks produced by rasterizing ground truth graphs. Performing pixelwise segmentation, though, ignores structural features and geometric constraints inherent to the problem. As a result, minimum differences in the pixel-level output domain can have significant consequences in the proposed graph, in terms of connectivity and path distances, as manifested by the often fragmented outputs obtained after running inference on these models. In order to address these significant drawbacks, we propose a new paradigm where we: (i) directly generate outputs as spatial graphs and (ii) formalize the problem as a game where we sequentially construct the output by adding edges between key points. These key points can in principle come from any off-the-shelf detector that identifies road pieces with sufficient accuracy. Our generation process avoids having to resort to cumbersome post-processing steps (Batra et al., 2019; Montoya-Zegarra et al., 2015) or optimize some surrogate objectives (Máttyus & Urtasun, 2018; Mosinska et al., 2018) whose relation to the desired qualities of the final prediction is disputed. Concurrently, the sequential decision-making strategy we propose enables us to focus interactively on different parts of the image, introducing the notion of a current state and producing reward for a succession of actions. In essence, our method can be considered as a generalization of previous refinement techniques (Batra et al., 2019; Li et al., 2019b) with three major advantages: (i) removal of the requirement for greedy decoding, (ii) ability to attend globally to the current prediction and selectively target parts of the image, and (iii) capacity to train based on demanding task-specific metrics. More precisely, our contributions are the following: • We propose a novel generic strategy for training and inference in autoregressive models that removes the requirement of decoding according to a pre-defined order and refines initial sampling probabilities via a tree search. • We create a new synthetic benchmark dataset of pixel-level accurate labels of overhead satellite images for the task of road network extraction. This gives us the ability to simulate complex scenarios with occluded regions, allowing us to demonstrate the improved robustness of our approach. We plan to release this dataset publicly. • We confirm the wide applicability of our approach by improving the performance of existing methods on the popular SpaceNet and DeepGlobe datasets. Initial attempts to extract road networks mainly revolved around handcrafted features and stochastic geometric models of roads (Barzohar & Cooper, 1996) . Road layouts have specific characteristics, regarding radiometry and topology e.g. particular junction distribution, certain general orientation, and curvature (see Fig. 2 ), that enable their detection even in cases with significant occlusion and uncertainty (Hinz & Baumgartner, 2003) . Modern approaches mostly formulate the road extraction task as a segmentation prediction task (Lian et al., 2020; Mattyus et al., 2015; Audebert et al., 2017) by applying models such as Hourglass (Newell et al., 2016) or LinkNet (Chaurasia & Culurciello, 2017) . This interpretation has significant drawbacks when evaluated against structural losses, because of discontinuities in the predicted masks. Such shortcomings have been addressed by applying some additional post-processing steps, such as high-order conditional random fields (Niemeyer et al., 2011; Wegner et al., 2013) or by training additional models that refine these initial predictions (Máttyus et al., 2017; Batra et al., 2019) . 



Figure 2: Typical road network features (SpaceNet dataset). From left to right: (a) the distribution of angles between road segments leading to the same intersection is biased towards 0 and 90 degrees, parallel or perpendicular roads. The same also holds (b) for angles between random road pieces within a ground distance of 400 meters. (c) Most road vertices belong to a single road piece, with a degree of 2. (d) The average number of intersections for areas of 400×400 meters by ground distance.

Other common techniques include the optimization of an ensemble of losses. Chu et al. (2019) rely on a directional loss and use non-maximal suppression as a thinning layer, while Batra et al. (2019) calculate orientations of road segments. Although such auxiliary losses somewhat improve the output consistency, the fundamental issue of producing

