ONLINE LEARNING FOR OBSTACLE AVOIDANCE

Abstract

We approach the fundamental problem of obstacle avoidance for robotic systems through the lens of online learning. In contrast to prior work that either assumes a worst-case realization of uncertainty in the environment or a given stochastic model of uncertainty, we propose a method that is efficient to implement and provably yields instance-optimal perturbations of trajectories generated by an open-loop planner, in the sense of minimizing worst-case regret. The resulting policy thus adapts online to realizations of uncertainty and provably compares well with the best obstacle avoidance policy in hindsight from a rich class of policies. The method is validated in simulation on a dynamical system environment and compared to baseline open-loop planning and robust Hamilton-Jacobi reachability techniques.

1. INTRODUCTION

The problem of obstacle avoidance in motion planning is a fundamental and challenging task at the core of robotics and robot safety. Successfully solving the problem requires dealing with environments that are inherently uncertain and noisy: a robot must take into account uncertainty in its own dynamics, e.g., due to external disturbances or unmodeled effects, and in the dynamics of other agents in the environment, e.g., humans or other robots. Approaches for tackling the obstacle avoidance problem in robotics typically fall into two categories: (i) methods that attempt to construct stochastic models of uncertainty in the dynamics of the robot and other agents and use the resulting probabilistic models for planning, and (ii) methods that construct plans that account for worst-case behavior. In Sec. 2 we give a more detailed overview of both classes of approaches. In this paper, we are motivated by Vapnik's principle: "when solving a given problem, try to avoid solving an even harder problem as an intermediate step". Constructing accurate models of external disturbances and of the dynamics of humans or other agents is arguably harder than the task of obstacle avoidance in motion planning itself. Moreover, the uncertainty in these dynamics rarely conforms to the assumptions made by the two classes of approaches highlighted above: external disturbances and human motion are typically neither fully stochastic nor fully adversarial. This motivates the need for online learning methods for adaptive non-stochastic control that avoid obstacles as they are perceived by the robot's sensors.

Statement of Contributions.

In this work, we pose the problem of obstacle avoidance in a regret minimization framework and build on techniques from non-stochastic control. Our primary contribution is a gradient-based online learning algorithm for the task of obstacle avoidance, coupled with provable regret bounds showing that our obstacle avoidance policy is comparable to the best policy in hindsight from a given class of closed-loop policies. This type of theoretical performance guarantee is nonstandard, and allows us to flexibly adapt to the behavior of the uncertainty in any instance of the obstacle avoidance problem without making a priori assumptions about whether the uncertainty is stochastic or adversarial. In addition, the resulting method is computationally efficient. We apply the method in experiments with complex (and dynamic) obstacle environments, demonstrating improved performance in instances where open-loop planners and overly robust methods struggle.
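To make the regret-minimization viewpoint concrete, the following is a minimal, self-contained sketch of online gradient descent (OGD), the basic template that non-stochastic control methods build on. This is not the paper's algorithm: the quadratic per-round losses, the drifting target sequence, and all names here are hypothetical, chosen only to illustrate how regret against the best fixed decision in hindsight is measured.

```python
import numpy as np

def ogd(grad, x0, T, eta):
    """Online gradient descent: at each round t, play x_t, then take a
    gradient step on the round-t loss. Returns the sequence of iterates."""
    x = np.asarray(x0, dtype=float)
    xs = []
    for t in range(T):
        xs.append(x.copy())
        x = x - eta * grad(t, x)  # descend the loss revealed at round t
    return np.array(xs)

# Hypothetical per-round losses: squared distance to a drifting target u_t,
# standing in (loosely) for a reference trajectory perturbed by disturbances.
rng = np.random.default_rng(0)
T = 200
targets = np.cumsum(0.05 * rng.standard_normal((T, 2)), axis=0)

def loss(t, x):
    return 0.5 * np.sum((x - targets[t]) ** 2)

def grad(t, x):
    return x - targets[t]

xs = ogd(grad, np.zeros(2), T, eta=0.5)

# Regret: cumulative loss of the online algorithm minus that of the best
# fixed comparator in hindsight (for these quadratics, the target mean).
alg_loss = sum(loss(t, xs[t]) for t in range(T))
best_fixed = targets.mean(axis=0)
best_loss = sum(loss(t, best_fixed) for t in range(T))
regret = alg_loss - best_loss
```

Because the losses drift over time, the adaptive iterates can even outperform the best fixed comparator; richer comparator classes (e.g., closed-loop policies, as in this work) make the benchmark correspondingly stronger.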

2. RELATED WORK

Our results apply techniques from online learning to the problem of obstacle avoidance in order to provide efficiently computable safety guarantees. Both online learning and safe motion planning

