TEMPORALLY-WEIGHTED SPIKE ENCODING FOR EVENT-BASED OBJECT DETECTION AND CLASSIFICATION

Anonymous

Abstract

Event-based cameras exhibit high dynamic range and temporal precision that could make them ideal for detecting objects with high speeds and low relative luminance. These properties have made event-based cameras especially interesting for space domain awareness tasks, such as detecting dim, artificial satellites against bright backgrounds using ground-based optical sensors; however, the asynchronous nature of event-based data presents new challenges to performing object detection. While spiking neural networks (SNNs) have been shown to naturally complement the asynchronous and binary properties of event-based data, they also present a number of challenges in their training, such as the vanishing spike problem and the large number of timesteps required to maximize classification and detection accuracy. Furthermore, the extremely high sampling rate of event-based sensors and the density of noisy, space-based data collections can result in excessively large event streams within a short window of recording. We present a temporally-weighted spike encoding that greatly reduces the number of spikes derived from an event-based data stream, enabling the training of larger SNNs with fewer timesteps for maximal accuracy. We propose using this spike encoding with a variant of a convolutional SNN trained with surrogate spiking neuron gradients and backpropagation-through-time (BPTT) for both classification and object detection tasks, with an emphasis on space domain awareness. To demonstrate the efficacy of our encoding and SNN approach, we present competitive classification accuracies on the benchmark datasets N-MNIST (99.7%), DVS-CIFAR10 (74.0%), and N-Caltech101 (72.8%), as well as state-of-the-art object detection performance on event-based satellite collections.

1. INTRODUCTION

In recent years, the number of resident space objects (RSOs) in low-Earth orbit (LEO) and geosynchronous-Earth orbit (GEO) has steadily grown, driving greater interest in the detection and tracking of such targets using ground-based optical telescopes. Tracking RSOs, such as satellites or space debris, presents a unique challenge in that these targets often have very few features distinguishing them from their surroundings and are difficult to image at high speeds. Furthermore, such targets are often far dimmer than ambient lighting, especially in cis-lunar orbits and during daytime viewing. These challenges motivate the need for new hardware sensors and computer vision techniques that can be easily integrated with existing ground-based detection schemes. Event-based cameras, or dynamic vision sensors, are one attractive technology that presents a solution to imaging RSOs. These cameras operate without a global clock, allowing each individual pixel to asynchronously emit events based on detected changes in illuminance at high frequency. Each pixel exhibits a logarithmic response to illuminance changes, giving such cameras a large dynamic range. Furthermore, since pixels respond only to changes in illuminance, the data produced is far sparser than that of a conventional sensor sampling at comparable rates. Of perhaps crucial importance for space-based detection tasks, the operation of event-based pixels also prevents them from saturating, which could prove incredibly useful for imaging near the Moon or in daylight. These qualities suggest that event-based cameras could be ideal for the detection of dim, high-speed RSOs that are generally too challenging for conventional CCD sensors. However, the asynchronous nature of event-based data also poses a challenge to performing object detection effectively and efficiently.
A naive approach to working with event-based data is to integrate over a pre-defined window of time to produce conventional images. Since each pixel generates events asynchronously, events are given both an (x, y) location and a timestamp t that corresponds to the time of event generation relative to a recorded starting time. Events are also given a polarity flag, p ∈ {+1, -1}, that denotes whether the event was generated by an increase in illuminance (+1) or a decrease in illuminance (-1). For integration, these events, of the form e = (x, y, t, p), are accumulated over some window ∆t at their respective (x, y) locations to form an equivalent image. Such integrated frames can then be used with any conventional object detection method; however, this approach loses much of the temporal information present in the original event stream. Spiking neural networks (SNNs) differ from conventional neural networks in much the same way that event-based data differs from conventional images. SNNs function asynchronously, with each neuron of the network generating spikes only when its inputs cause it to exceed a pre-defined threshold, mimicking the function of biological neurons. The sparsity of SNN activation makes spiking networks exceptionally energy efficient compared to conventional neural networks of comparable size. However, the binary nature of spiking neuron output, and the subsequent nondifferentiability of their activation, makes supervised training of such networks a challenging task. Furthermore, SNNs are also plagued by the vanishing spike propagation issue, where decreasing spiking activity in successive layers causes significant performance loss in larger networks (Panda et al. (2020)). Nonetheless, the unique properties of SNNs naturally complement the data produced by event-based cameras, and multiple works have already shown the potential for classification and object detection on event-based data.
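The naive integration scheme above can be sketched in a few lines. The following is a minimal illustration, where the sensor resolution, the two-channel (per-polarity) frame layout, and the function name are assumptions for the sake of the example:

```python
import numpy as np

def integrate_events(events, t_start, dt, height, width):
    """Accumulate events e = (x, y, t, p), with p in {+1, -1}, that fall
    within the window [t_start, t_start + dt) into a 2-channel frame:
    channel 0 counts positive-polarity events, channel 1 negative."""
    frame = np.zeros((2, height, width), dtype=np.int32)
    for x, y, t, p in events:
        if t_start <= t < t_start + dt:
            channel = 0 if p == 1 else 1
            frame[channel, y, x] += 1
    return frame

# Three events inside a 10 ms window, one outside it
events = [(3, 2, 0.001, 1), (3, 2, 0.002, 1), (5, 1, 0.003, -1), (0, 0, 0.02, 1)]
frame = integrate_events(events, t_start=0.0, dt=0.01, height=4, width=8)
print(frame[0, 2, 3])  # -> 2 (two positive events at the same pixel)
print(frame[1, 1, 5])  # -> 1
```

As the text notes, any timestamp information within the window is discarded here: two events at the start of the window and two at its end produce identical frames.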
In this work, we present a temporal-weight encoding that greatly decreases the number of spikes derived from an event data stream while maintaining overall spiking behavior and preserving temporal information. This encoding scheme is also shown to reduce the number of timesteps required to maximize classification and object detection accuracy in spiking neural networks. We also propose a pseudo-spiking behavior for conventional, convolutional neural networks that removes the need for temporal credit assignment but preserves some temporal information. This pseudo-spiking behavior is readily integrated with encoded, event-based data and enables the training of comparatively deeper models than true spiking networks. We evaluate detection results using both simulated and real space-based data collections and demonstrate competitive performance on publicly available event-based classification and object detection datasets.
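To give a rough intuition for temporal weighting, one common way to retain timing within an integration window is to weight each event by its normalized arrival time rather than counting all events uniformly. The sketch below is an illustrative assumption only, not necessarily the paper's exact encoding scheme:

```python
import numpy as np

def temporally_weighted_frame(events, t_start, dt, height, width):
    """Hypothetical temporal weighting (illustration, not the paper's exact
    scheme): each event e = (x, y, t, p) contributes a weight proportional to
    its normalized position in the window, so later events count more and
    relative timing is partially preserved in the accumulated frame."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, p in events:
        if t_start <= t < t_start + dt:
            w = (t - t_start) / dt          # weight in [0, 1)
            channel = 0 if p == 1 else 1
            frame[channel, y, x] += w
    return frame

# Two positive events at the same pixel, early and late in a 10 ms window
events = [(1, 1, 0.0025, 1), (1, 1, 0.0075, 1)]
frame = temporally_weighted_frame(events, t_start=0.0, dt=0.01, height=2, width=2)
print(float(frame[0, 1, 1]))  # -> 1.0 (weights 0.25 + 0.75)
```

Unlike plain event counting, this weighted accumulation distinguishes an early burst of events from a late one, which is the kind of temporal information the proposed encoding aims to preserve while reducing the total spike count.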

2. RELATED WORK

The following sections briefly explore some of the most notable works in each area touched upon in our own work.

2.1. SPACE DOMAIN AWARENESS

As previously mentioned, the detection of dim, high-speed RSOs is an already challenging task that is made even more difficult by conditions such as daylight, moonlight, and atmospheric turbulence. Traditionally, small targets are detected using specialized radar or laser equipment, though ground-based optics have become an attractive alternative due to their power efficiency and cost effectiveness. However, optical charge-coupled device (CCD) sensors often struggle with high amounts of background noise as well as with long exposure times that can complicate the detection of fast-moving objects (Kong et al. (2019)). As an alternative optical device, event-based cameras could be ideal for replacing or complementing conventional CCD sensors for RSO detection. Recent work has already shown the use of event-based cameras for daytime imaging of objects in LEO (Cohen et al. (2019)), and simulated work has investigated star tracking using event-based data (Chin et al. (2019)). These successes, in addition to the successful application of object detection models such as YOLOv3 to space imaging datasets (Fletcher et al. (2019)), have motivated our work in investigating space object detection with event-based cameras. Furthermore, recent advances in space scene simulation have improved the ability to experiment with high-fidelity, optical space collections. In this work, we make use of the SatSim simulator to generate the large number of samples necessary for model training (Cabello & Fletcher (2022)).

