GEOVEX: GEOSPATIAL VECTORS WITH HEXAGONAL CONVOLUTIONAL AUTOENCODERS

Abstract

We introduce GeoVeX, a new geospatial representation model that learns global vectors for all geographical locations on the Earth's land cover. GeoVeX is built on a novel model architecture named Hexagonal Convolutional Autoencoders (HCAE) combined with a Zero-Inflated Poisson (ZIP) reconstruction layer, applied to a grid of Uber's H3 hexagons, each described by a histogram of OpenStreetMap (OSM) geographical tag occurrences. GeoVeX is novel in two respects: first, it produces pre-trained, task-agnostic geospatial vectors with H3 and OSM that are, for the first time, contextualized on the features of neighboring hexagons, by applying a hexagonal convolutional autoencoder to an H3/OSM grid centered on the location to embed; second, it introduces a zero-inflated Poisson autoencoder reconstruction layer, adapting a standard autoencoder network to train on sparse geographical count data distributed over a hexagonal grid. Experiments demonstrate that GeoVeX embeddings improve upon two state-of-the-art geospatial location representation models, Hex2Vec and Space2Vec, on two different downstream tasks: worldwide listing price prediction in the travel industry, and hyperlocal interpolation of climate data from weather stations. A qualitative analysis of the latent representation structures learnt by GeoVeX showcases the higher quality of the geographical structures captured by its geographically contextualized embeddings.

1. INTRODUCTION

Entity embedding is ubiquitous in a variety of Machine Learning tasks thanks to its many advantages: it captures the semantics of each entity in the context of a given domain; it enables transfer learning to related tasks; and it reduces the sparsity of the entity representation and compresses the feature space. In the NLP domain, global word embedding models such as Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014) and BERT (Devlin et al., 2019) have been successful at capturing the word semantics of large open-source corpora (e.g. Wikipedia, Gigaword) and are used to transfer learning to multiple downstream tasks, such as sentiment analysis (Tang et al., 2014; Deho et al., 2018; Alamoudi & Alghamdi, 2021), question retrieval (Zhou et al., 2015), and medical semantics (Wang et al., 2018). Similar approaches inspired by NLP have since proven useful in many industrial domains, where multiple models have been proposed to learn the latent representations of entities specific to an industry, such as Product2Vec (Biswas et al., 2017) and User2Vec (Hallac et al., 2019) in e-commerce, or Wave2Vec (Baevski et al., 2020) in speech representation, just to name a few. In comparison, in the field of Geographic Information Science (GIS), a global set of task-agnostic embeddings for geographical space representation can benefit multiple domains and use cases, such as: price prediction for houses (Wang et al., 2021), hotel rooms (Kisilevich et al., 2013), and vacation homes (Islam et al., 2022; Pradip & Suthar, 2022); interpolation of climate variables such as temperature and pressure (Wu & Li, 2013); and computer vision tasks with geo-located images (Berg et al., 2014).
These tasks have in common the application of some transformation to the spatial coordinates, but they do not leverage the spatial distribution of geographical entities (such as parks, water, beaches, buildings, streets, bars, etc.), which conveys much richer information about the geographical context. Moreover, in terms of modelling, previous approaches to learning geospatial embeddings have a set of limitations, such as being non-contextual, task-specific and/or region-specific (Sec. 2), which we address with a novel model architecture and loss function formulation (Sec. 3.6). Our new approach to learning the geospatial embedding of each location on Earth, based on nearby geographical entities, aims to pre-train a finite set of embeddings covering the whole Earth, so that they can be stored and used with ease as extra features in any Machine Learning task where entities have latitude and longitude coordinates. The general workflow is summarized in Fig. 1. To achieve the goal of wide adoption across downstream tasks, we leverage Uber's Hexagonal Hierarchical Spatial Index grid system, H3¹, to spatially index the data coordinates into small regions of approximately the same size, since H3 minimizes map distortion. A pair of coordinates (i, j) is thus represented by a unique H3 hexagon id for which we learn a GeoVeX embedding. To learn GeoVeX embeddings with geographical semantics, we associate each H3 hexagon with the geographical tags of the entities obtained from OpenStreetMap (OSM)². OSM is a project that creates and distributes free worldwide geographic data and, as of January 2022, has ≈7 billion nodes and ≈4 million map changes per day. This makes it the equivalent of Wikipedia for word embeddings: a massive, scalable, and information-rich global open dataset for creating and updating global embeddings.
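To make the H3/OSM pairing concrete, the per-hexagon tag aggregation can be sketched as a toy Bag-Of-Words histogram (the tags, counts, and vocabulary below are hypothetical; the real pipeline intersects OSM geometries with H3 cells, e.g. via the h3 library):

```python
from collections import Counter

# Hypothetical OSM tags of the entities intersecting one H3 hexagon.
entity_tags = [
    "amenity:bar", "amenity:bar", "highway:motorway",
    "natural:forest", "amenity:bar", "highway:residential",
]

# Fixed tag vocabulary (a subset of all OSM tags, of size K).
vocab = ["amenity:bar", "highway:motorway", "highway:residential",
         "natural:forest", "leisure:park"]

counts = Counter(entity_tags)
# K-dimensional histogram vector for the hexagon; the zeros for absent
# tags are why the data is sparse and zero-inflated.
histogram = [counts.get(tag, 0) for tag in vocab]
print(histogram)  # [3, 1, 1, 1, 0]
```

In the real vocabulary K is much larger, so most entries of each histogram are zero, which motivates the zero-inflated reconstruction layer introduced below.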
In particular, OSM contains nodes, ways and relations, which together can be transformed into points, lines and polygons, each characterized by a set of semantic tags, such as amenity:bar, highway:motorway, natural:forest. By intersecting these OSM entities with the H3 hexagons, and by using a Bag-Of-Words (BOW) model on their tags, each hexagon can be sparsely described by a K-dimensional histogram vector, where K is the size of a subset of the vocabulary of OSM geographical tags and each element represents the number of times an entity with the respective tag is contained in or intersects the hexagon itself. This information then needs to be properly aggregated to produce an embedding while, at the same time, taking into account the information from neighboring hexagons, which provide the geographical context. This concept follows the first law of geography: everything is related to everything else, but near things are more related than distant things (Tobler, 1970). The convolution operation, borrowed from Computer Vision, presents two challenges in this domain: 1) hexagonal rather than square grids, and 2) a different distribution for each "channel": highly sparse counts rather than dense pixel values. The GeoVeX model aims to bridge the gap in convolutional neural network usage on such hexagonal grids described by sparse counts. In summary, the contributions of our work are: 1. the GeoVeX architecture design to learn task-agnostic pre-trained location embeddings with H3 and OSM that are, for the first time, contextualized on the neighboring hexagons. We demonstrate their expressive power qualitatively, through an analysis of cosine similarities (Sec. 4.1), and quantitatively, by adding the embeddings to the feature set of two downstream tasks: price prediction in the worldwide travel industry (Sec. 4.2) and temperature interpolation of climate data from weather stations (Sec. A.6);



2. the novel Zero-Inflated Poisson (ZIP) probabilistic decoder block for autoencoders, trained with a spatially contextual loss function, which adapts the standard reconstruction layer of autoencoder networks to the zero-inflated spatial contextual count data produced by the H3 grid and the OSM entity tag counts (Sec. 3.6).

¹ https://eng.uber.com/h3
² https://www.openstreetmap.org/
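For reference, the ZIP distribution the decoder parameterizes can be sketched as follows (a minimal stand-alone implementation, not the paper's code; π is the zero-inflation probability and λ the Poisson rate, both of which the decoder would predict per tag):

```python
import math

def zip_log_pmf(k: int, lam: float, pi: float) -> float:
    """Log-probability of count k under a Zero-Inflated Poisson:
    P(0) = pi + (1 - pi) * exp(-lam)
    P(k) = (1 - pi) * lam**k * exp(-lam) / k!   for k > 0
    """
    if k == 0:
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return math.log(1.0 - pi) - lam + k * math.log(lam) - math.lgamma(k + 1)

# Sanity check: the probabilities sum to ~1 over a truncated support.
total = sum(math.exp(zip_log_pmf(k, lam=2.0, pi=0.3)) for k in range(60))
print(total)  # close to 1.0
```

The extra π mass at zero is what lets the model fit histograms where most tag counts are exactly zero; the reconstruction loss is then the negative of this log-probability, summed over tags and hexagons in the spatial context.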



Figure 1: Workflow for using GeoVeX embeddings in downstream tasks: the task features are expanded by simply concatenating the GeoVeX embedding associated with the H3 hexagon corresponding to the latitude (lat) and longitude (lng) coordinates of each task item, without retraining the embeddings.
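The lookup-and-concatenate step of this workflow can be sketched as below (the hexagon id, embedding values, and zero-vector fallback are all illustrative assumptions; in practice the id would come from the H3 library and the table from the pre-trained GeoVeX vectors):

```python
# Hypothetical pre-trained embedding table: H3 cell id -> GeoVeX vector.
geovex_embeddings = {
    "871f1d489ffffff": [0.12, -0.53, 0.08, 0.91],  # made-up id and values
}

def expand_features(task_features, h3_id, table, dim=4):
    """Concatenate the GeoVeX embedding for h3_id to the task features.
    Unknown hexagons fall back to a zero vector (one possible choice)."""
    return list(task_features) + table.get(h3_id, [0.0] * dim)

# e.g. a listing with two task features (nightly price, room count):
row = expand_features([120.0, 2.0], "871f1d489ffffff", geovex_embeddings)
print(row)  # [120.0, 2.0, 0.12, -0.53, 0.08, 0.91]
```

Because the embeddings are pre-trained and task-agnostic, this expansion is a pure table lookup: no gradient flows back into the GeoVeX vectors from the downstream model.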

